[jira] Issue Comment Edited: (CAMEL-1472) Lucene Component

Ashwin Karpe (JIRA) Sun, 27 Dec 2009 22:48:11 -0800

    [ 
https://issues.apache.org/activemq/browse/CAMEL-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=56667#action_56667
 ]


Ashwin Karpe edited comment on CAMEL-1472 at 12/28/09 6:47 AM:
---------------------------------------------------------------

Hi Claus, Jon & Hadrian, 

I have created a new Apache Lucene Component & Query processor and have 
attached a patch along with a zip file containing the code for your review.  I 
have also added the requisite unit tests and ensured that the code undergoes 
checkstyle validation.

The component works as follows:

Lucene Producer: Index Creation example
----------------------------------------------------------
       context.addRoutes(new RouteBuilder() {
            public void configure() {
                from("direct:start").
                    to("lucene://stdQuotesIndex?analyzerRef=#stdAnalyzer&indexDir=#std&srcDir=#load_dir").
                    to("mock:result");

            }
        });

where each URI parameter does the following:
       - analyzerRef: a reference to any valid Lucene Analyzer implementation 
(StandardAnalyzer, WhitespaceAnalyzer, StopAnalyzer, etc.)
       - srcDir: an optional directory from which text or XML documents are 
loaded when the endpoint or Lucene index is created.

Since these settings cannot be passed directly in the URI, I pass them through 
the JNDI registry associated with the Default Component (example shown 
below).

Example: Providing values for the Lucene URI
--------------------------------------------------------------
    @Override
    protected JndiRegistry createRegistry() throws Exception {
        JndiRegistry registry = new JndiRegistry(createJndiContext());
        registry.bind("std", new File("target/stdindexDir"));
        registry.bind("load_dir", new File("src/test/resources/sources"));
        registry.bind("stdAnalyzer", new StandardAnalyzer(Version.LUCENE_CURRENT));
        return registry;
    }
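
To make the "#name" convention concrete, here is a minimal, library-free sketch 
of how such a reference could be resolved against a registry. It is only an 
illustration: the class RegistryRefSketch, its methods parseQuery and resolve, 
and the plain Map standing in for the JNDI registry are all hypothetical names, 
and this is not how Camel itself implements the lookup.

```java
import java.io.File;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: a Map stands in for the JNDI registry.
public class RegistryRefSketch {

    /** Splits the query part of an endpoint URI into name/value pairs. */
    static Map<String, String> parseQuery(String uri) {
        Map<String, String> params = new LinkedHashMap<>();
        // Plain string handling is used on purpose: java.net.URI would treat
        // the first '#' as the start of a fragment.
        int q = uri.indexOf('?');
        if (q < 0 || q == uri.length() - 1) {
            return params;
        }
        for (String pair : uri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** Looks up "#name" values in the registry; other values are returned as-is. */
    static Object resolve(String value, Map<String, Object> registry) {
        if (value.startsWith("#")) {
            String key = value.substring(1);
            Object bean = registry.get(key);
            if (bean == null) {
                throw new IllegalArgumentException("No registry entry for: " + key);
            }
            return bean;
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> registry = new HashMap<>();
        registry.put("std", new File("target/stdindexDir"));
        registry.put("load_dir", new File("src/test/resources/sources"));
        // A String placeholder stands in for a real Lucene Analyzer instance.
        registry.put("stdAnalyzer", "analyzer placeholder");

        String uri = "lucene://stdQuotesIndex?analyzerRef=#stdAnalyzer"
                + "&indexDir=#std&srcDir=#load_dir";
        for (Map.Entry<String, String> e : parseQuery(uri).entrySet()) {
            System.out.println(e.getKey() + " -> " + resolve(e.getValue(), registry));
        }
    }
}
```

The point of the indirection is that objects such as a File or an Analyzer 
cannot be written as URI text, so the URI carries only the name under which 
the object was bound.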

I have also added a Query Processor that can run any query (including 
wildcards, etc.) against a Lucene document index and presents the results in a 
schema-driven XML format (example provided below).

Example:  Query Processor for Lucene called LuceneSearcher
-------------------------------------------------------------------------------------
       context.addRoutes(new RouteBuilder() {
            public void configure() {
                
                from("direct:start").
                    setHeader("QUERY", constant("Rodney Dangerfield")).
                    process(new LuceneSearcher("target/stdindexDir", analyzer, null, 20)).
                    to("mock:searchResult");
            }
        });  

Example: Search Results presentation Format
----------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<hits xmlns="http://camel.apache.org/lucene/SearchData">
      <numberOfHits>2</numberOfHits>
      <hit>
             <number>1</number>
             <hitLocation>15</hitLocation>
             <score>0.9453935</score>
             <data>I worked in a pet store and people kept asking how big I'd get. - Rodney Dangerfield</data>
      </hit>
      <hit>
             <number>2</number>
             <hitLocation>13</hitLocation>
             <score>0.8272193</score>
             <data>I tell ya when I was a kid, all I knew was rejection. My yo-yo, it never came back. - Rodney Dangerfield</data>
      </hit>
</hits>
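
For anyone consuming this output downstream, a result document in the format 
above could be read back with the JDK's built-in DOM parser, roughly as 
sketched here. The class SearchResultReader, its nested Hit type and the 
readHits method are hypothetical names used for illustration and are not part 
of the attached patch; the sketch assumes every hit carries number, score and 
data elements as in the sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the search result format shown above.
public class SearchResultReader {

    /** Simple value holder for one hit element. */
    static final class Hit {
        final int number;
        final float score;
        final String data;

        Hit(int number, float score, String data) {
            this.number = number;
            this.score = score;
            this.data = data;
        }
    }

    /** Parses the XML and returns the hits in document order. */
    static List<Hit> readHits(String xml) {
        try {
            // The default factory is not namespace-aware, so elements in the
            // default namespace can be matched by their plain tag names.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("hit");
            List<Hit> hits = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element hit = (Element) nodes.item(i);
                hits.add(new Hit(
                        Integer.parseInt(text(hit, "number")),
                        Float.parseFloat(text(hit, "score")),
                        text(hit, "data")));
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read search results", e);
        }
    }

    /** Returns the trimmed text of the first child element with the given tag. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
                + "<hits xmlns=\"http://camel.apache.org/lucene/SearchData\">"
                + "<numberOfHits>1</numberOfHits>"
                + "<hit><number>1</number><hitLocation>15</hitLocation>"
                + "<score>0.9453935</score><data>sample quote</data></hit>"
                + "</hits>";
        for (Hit hit : readHits(xml)) {
            System.out.println(hit.number + " " + hit.score + " " + hit.data);
        }
    }
}
```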

I used the latest version of Lucene (3.0) for the implementation, but this can 
easily be moved up over time since I have no hard restrictions on Lucene 
versions. The APIs could differ when moving to earlier versions, though; I 
have not verified this. Lucene seems to have undergone a lot of change in each 
subsequent version :). The good news is that, for the most part, they offer 
backward compatibility for APIs.

Please find attached the patch as well as a zip file containing the code.

Could you please review it and let me know what you think? I would be happy 
to update the documentation once I get your feedback, and to make any needed 
changes.

Cheers,

Ashwin...

> Lucene Component
> ----------------
>
>                 Key: CAMEL-1472
>                 URL: https://issues.apache.org/activemq/browse/CAMEL-1472
>             Project: Apache Camel
>          Issue Type: New Feature
>            Reporter: Claus Ibsen
>            Assignee: Ashwin Karpe
>             Fix For: Future
>
>         Attachments: camel-lucene-20091227.patch, camel-lucene.zip
>
>
> We should add a new component for Apache Lucene integration

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
