[ https://issues.apache.org/activemq/browse/CAMEL-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=56667#action_56667 ]
Claus Ibsen edited comment on CAMEL-1472 at 12/28/09 2:33 PM:
--------------------------------------------------------------

Hi Claus, Jon & Hadrian,

I have created a new Apache Lucene Component & Query processor and have attached a patch along with a zip file containing the code for your review. I have also added the requisite unit tests and ensured that the code passes checkstyle validation.

The component works as follows.

Lucene Producer: Index Creation example
----------------------------------------------------------
{code}
context.addRoutes(new RouteBuilder() {
    public void configure() {
        // Each "#name" value is looked up in the registry (see the registry example)
        from("direct:start").
            to("lucene://stdQuotesIndex?analyzerRef=#stdAnalyzer&indexDir=#std&srcDir=#load_dir").
            to("mock:result");
    }
});
{code}

where each URI parameter setting does the following:
- analyzerRef: can be any valid implementation of a Lucene Analyzer (StandardAnalyzer, WhitespaceAnalyzer, StopAnalyzer, etc.)
- srcDir: an optional directory location for loading text or XML documents at endpoint or Lucene index creation. Once created, the index can take any exchange body and store its contents in the index.

Important Note: Lucene stipulates that the index be created upfront and then used in read-only mode later for any querying. Hence the index cannot be in flux during query processing. This requires the Lucene Producer to have received its payloads upfront and created the index before any queries can be run against it.

Since the URI settings cannot be passed directly, I pass them using the JNDI registry associated with the Default Component (example shown below).

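To make the "#name" lookup concrete before the registry example, here is a minimal plain-Java sketch of the idea: each URI parameter whose value starts with "#" is swapped for the object bound under that name. This is only an illustration and not Camel's actual resolution code; the class RefParamSketch, its resolve method, and the Map standing in for the JNDI registry are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: NOT Camel's resolution code. All names here are hypothetical.
class RefParamSketch {

    // Splits the query part of an endpoint URI and replaces every "#name"
    // value with the object bound under "name" in the registry map.
    static Map<String, Object> resolve(String uri, Map<String, Object> registry) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q < 0 || q == uri.length() - 1) {
            return resolved; // no parameters to resolve
        }
        for (String pair : uri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip entries that have no value
            }
            String key = pair.substring(0, eq);
            String value = pair.substring(eq + 1);
            if (value.startsWith("#")) {
                String name = value.substring(1);
                if (!registry.containsKey(name)) {
                    throw new IllegalArgumentException("No registry entry for " + value);
                }
                resolved.put(key, registry.get(name));
            } else {
                resolved.put(key, value); // plain values pass through unchanged
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Object> registry = new HashMap<>();
        registry.put("std", new java.io.File("target/stdindexDir"));
        registry.put("load_dir", new java.io.File("src/test/resources/sources"));
        // A String stands in for a Lucene Analyzer to keep the sketch dependency-free.
        registry.put("stdAnalyzer", "analyzer placeholder");

        Map<String, Object> params = resolve(
            "lucene://stdQuotesIndex?analyzerRef=#stdAnalyzer&indexDir=#std&srcDir=#load_dir",
            registry);
        System.out.println(params);
    }
}
```

The point of the indirection is that objects such as a File or an Analyzer cannot be written into a URI string, so the URI carries only their registry names.
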
Example: Providing values for the Lucene URI
--------------------------------------------------------------
{code}
@Override
protected JndiRegistry createRegistry() throws Exception {
    JndiRegistry registry = new JndiRegistry(createJndiContext());
    // The bound names match the "#std", "#load_dir" and "#stdAnalyzer" references in the URI
    registry.bind("std", new File("target/stdindexDir"));
    registry.bind("load_dir", new File("src/test/resources/sources"));
    registry.bind("stdAnalyzer", new StandardAnalyzer(Version.LUCENE_CURRENT));
    return registry;
}
{code}

I have also added a Query Processor that is fully capable of running any queries (including wildcards etc.) against a Lucene document index and presenting the results in a schema-driven XML format (example provided below).

Example: Query Processor for Lucene called LuceneSearcher
-------------------------------------------------------------------------------------
{code}
context.addRoutes(new RouteBuilder() {
    public void configure() {
        // The query string is read from the "QUERY" header
        from("direct:start").
            setHeader("QUERY", constant("Rodney Dangerfield")).
            process(new LuceneSearcher("target/stdindexDir", analyzer, null, 20)).
            to("mock:searchResult");
    }
});
{code}

Example: Search Results presentation Format
----------------------------------------------------------------
{code:xml}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<hits xmlns="http://camel.apache.org/lucene/SearchData">
    <numberOfHits>2</numberOfHits>
    <hit>
        <number>1</number>
        <hitLocation>15</hitLocation>
        <score>0.9453935</score>
        <data>I worked in a pet store and people kept asking how big I'd get. - Rodney Dangerfield</data>
    </hit>
    <hit>
        <number>2</number>
        <hitLocation>13</hitLocation>
        <score>0.8272193</score>
        <data>I tell ya when I was a kid, all I knew was rejection. My yo-yo, it never came back. - Rodney Dangerfield</data>
    </hit>
</hits>
{code}

I used the latest Lucene release, version 3.0, for the implementation, but this can be moved up easily over time since I have no hard restrictions on Lucene versions. The APIs could be different moving backwards, though.
I have not verified this. Lucene seems to have undergone a lot of change in each subsequent version :). The good news is that for the most part they offer backward compatibility for APIs.

Please find attached the patch as well as a zip file containing the code. Could you please review it and let me know what you think? I would be happy to update the documentation once I get your feedback and to make any needed changes.

Cheers,

Ashwin...

edit: updated to use code snippets (original author: akarpe)

> Lucene Component
> ----------------
>
>                 Key: CAMEL-1472
>                 URL: https://issues.apache.org/activemq/browse/CAMEL-1472
>             Project: Apache Camel
>          Issue Type: New Feature
>            Reporter: Claus Ibsen
>            Assignee: Ashwin Karpe
>             Fix For: Future
>
>         Attachments: camel-lucene-20091227.patch, camel-lucene.zip
>
>
> We should add a new component for Apache Lucene integration

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.