[
https://issues.apache.org/activemq/browse/CAMEL-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=56667#action_56667
]
Ashwin Karpe edited comment on CAMEL-1472 at 12/28/09 6:57 AM:
---------------------------------------------------------------
Hi Claus, Jon & Hadrian,
I have created a new Apache Lucene Component & Query processor and have
attached a patch along with a zip file containing the code for your review. I
have also added the requisite unit tests and ensured that the code undergoes
checkstyle validation.
The component works as follows:
Lucene Producer: Index Creation example
----------------------------------------------------------
context.addRoutes(new RouteBuilder() {
    public void configure() {
        from("direct:start").
            to("lucene://stdQuotesIndex?analyzerRef=#stdAnalyzer&indexDir=#std&srcDir=#load_dir").
            to("mock:result");
    }
});
where each URI parameter does the following:
- analyzerRef: a reference to any valid Lucene Analyzer implementation
(StandardAnalyzer, WhitespaceAnalyzer, StopAnalyzer, etc.)
- indexDir: a reference to the directory in which the Lucene index is stored
- srcDir: an optional reference to a directory of text or XML documents to
load when the endpoint (and its Lucene index) is created.
Once the index is created, the producer can take any exchange body and store
its contents in the index.
Important Note: Lucene stipulates that the index be created up front and then
used in read-only mode for any later querying. Hence the index cannot be in
flux during query processing. This requires the Lucene Producer to have
received its payloads and created the index before any queries are run
against it.
Since these values cannot be passed directly in the URI, I pass them using the
JNDI registry associated with the Default Component (example shown below).
Example: Providing values for the Lucene URI
--------------------------------------------------------------
@Override
protected JndiRegistry createRegistry() throws Exception {
    JndiRegistry registry = new JndiRegistry(createJndiContext());
    registry.bind("std", new File("target/stdindexDir"));
    registry.bind("load_dir", new File("src/test/resources/sources"));
    registry.bind("stdAnalyzer", new StandardAnalyzer(Version.LUCENE_CURRENT));
    return registry;
}
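
To illustrate how the #-prefixed values in the endpoint URI relate to the
registry bindings, here is a minimal, standalone sketch. It is not the
component's code and does not use the Camel API: a plain Map stands in for the
JNDI registry, and the class and method names (RegistryRefSketch, parseQuery,
resolve) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: a Map stands in for the JNDI registry.
public class RegistryRefSketch {

    /** Splits the query part of a URI ("a=1&b=2") into an ordered name/value map. */
    public static Map<String, String> parseQuery(String uri) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q < 0 || q == uri.length() - 1) {
            return params; // no query part
        }
        for (String pair : uri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** Looks up a "#name" value in the registry; any other value is returned unchanged. */
    public static Object resolve(String value, Map<String, Object> registry) {
        if (!value.startsWith("#")) {
            return value;
        }
        String name = value.substring(1);
        Object bean = registry.get(name);
        if (bean == null) {
            throw new IllegalArgumentException("No registry entry named: " + name);
        }
        return bean;
    }
}
```

With a map containing "std" bound to new File("target/stdindexDir"), resolving
the indexDir value "#std" from the URI above returns that File object.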
I have also added a Query Processor that can run any query (including
wildcards, etc.) against a Lucene document index and present the results in a
schema-driven XML format (example provided below).
Example: Query Processor for Lucene called LuceneSearcher
-------------------------------------------------------------------------------------
context.addRoutes(new RouteBuilder() {
    public void configure() {
        from("direct:start").
            setHeader("QUERY", constant("Rodney Dangerfield")).
            process(new LuceneSearcher("target/stdindexDir", analyzer, null, 20)).
            to("mock:searchResult");
    }
});
Example: Search Results presentation Format
----------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<hits xmlns="http://camel.apache.org/lucene/SearchData">
    <numberOfHits>2</numberOfHits>
    <hit>
        <number>1</number>
        <hitLocation>15</hitLocation>
        <score>0.9453935</score>
        <data>I worked in a pet store and people kept asking how big I'd get. - Rodney Dangerfield</data>
    </hit>
    <hit>
        <number>2</number>
        <hitLocation>13</hitLocation>
        <score>0.8272193</score>
        <data>I tell ya when I was a kid, all I knew was rejection. My yo-yo, it never came back. - Rodney Dangerfield</data>
    </hit>
</hits>
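
For anyone consuming this output downstream, here is a minimal sketch of
reading the result XML with the JDK's built-in DOM parser. It assumes only the
element names and namespace shown in the sample above; the SearchResultReader
and Hit classes are hypothetical and are not part of the patch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for the search result format shown in the sample.
public class SearchResultReader {

    static final String NS = "http://camel.apache.org/lucene/SearchData";

    /** One <hit> element from the result document. */
    public static final class Hit {
        public final int number;
        public final int hitLocation;
        public final float score;
        public final String data;

        Hit(int number, int hitLocation, float score, String data) {
            this.number = number;
            this.hitLocation = hitLocation;
            this.score = score;
            this.data = data;
        }
    }

    /** Parses the result XML and returns its hits in document order. */
    public static List<Hit> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // elements live in the SearchData namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(NS, "hit");
            List<Hit> hits = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element hit = (Element) nodes.item(i);
                hits.add(new Hit(
                        Integer.parseInt(text(hit, "number")),
                        Integer.parseInt(text(hit, "hitLocation")),
                        Float.parseFloat(text(hit, "score")),
                        text(hit, "data")));
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse search results", e);
        }
    }

    /** Returns the trimmed text of the first child element with the given local name. */
    private static String text(Element parent, String localName) {
        return parent.getElementsByTagNameNS(NS, localName).item(0).getTextContent().trim();
    }
}
```

Calling SearchResultReader.parse(xml) on the sample document would yield one
Hit per <hit> element, with the score available as a float for sorting or
filtering.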
I used Lucene 3.0, the latest version, for the implementation, but this can
easily be moved up over time since I have no hard restrictions on Lucene
versions. The APIs could be different moving backwards, though; I have not
verified this. Lucene seems to have undergone a lot of change in each
subsequent version :). The good news is that for the most part they offer
backward compatibility for the APIs.
Please find attached the patch as well as a zip file containing the code.
Could you please review it and let me know what you think? I would be happy to
update the documentation once I get your feedback, and to make any needed
changes.
Cheers,
Ashwin...
> Lucene Component
> ----------------
>
> Key: CAMEL-1472
> URL: https://issues.apache.org/activemq/browse/CAMEL-1472
> Project: Apache Camel
> Issue Type: New Feature
> Reporter: Claus Ibsen
> Assignee: Ashwin Karpe
> Fix For: Future
>
> Attachments: camel-lucene-20091227.patch, camel-lucene.zip
>
>
> We should add a new component for Apache Lucene integration