[jira] Issue Comment Edited: (CAMEL-1472) Lucene Component

Claus Ibsen (JIRA) Mon, 28 Dec 2009 06:34:08 -0800

    [ 
https://issues.apache.org/activemq/browse/CAMEL-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=56667#action_56667
 ]


Claus Ibsen edited comment on CAMEL-1472 at 12/28/09 2:33 PM:
--------------------------------------------------------------

Hi Claus, Jon & Hadrian, 

I have created a new Apache Lucene Component & Query processor and have 
attached a patch along with a zip file containing the code for your review.  I 
have also added the requisite unit tests and made sure that the code passes 
checkstyle validation.

The component works as follows:

Lucene Producer: Index Creation example
----------------------------------------------------------
{code}
       context.addRoutes(new RouteBuilder() {
            public void configure() {
                from("direct:start").
                    to("lucene://stdQuotesIndex?analyzerRef=#stdAnalyzer&indexDir=#std&srcDir=#load_dir").
                    to("mock:result");

            }
        });
{code}

where each URI parameter does the following:
       - analyzerRef: any valid Lucene Analyzer implementation 
(StandardAnalyzer, WhitespaceAnalyzer, StopAnalyzer, etc.)
       - indexDir: the directory that holds the index (target/stdindexDir in 
the registry example below)
       - srcDir: an optional directory from which text or XML documents are 
loaded when the endpoint or Lucene index is created.

Once created, the index can take any exchange body and store its contents.

Important Note: Lucene stipulates that the index be created upfront and then 
used in read-only mode for any later querying. Hence the index cannot be in 
flux during query processing. This requires the Lucene Producer to have 
received its payloads and created the index before any queries are run 
against it.
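
The build-first, query-later ordering in the note above can be pictured with a 
toy index. This is only a sketch in plain Java and does not use Lucene or the 
patch; the class BuildThenQuerySketch and its add/seal/search methods are 
made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy stand-in for an index, NOT Lucene: shows only the build-then-query lifecycle.
final class BuildThenQuerySketch {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();
    private boolean sealed;

    /** Index phase: writes are accepted only until seal() is called. */
    void add(String text) {
        if (sealed) {
            throw new IllegalStateException("index is sealed; no further writes");
        }
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != id) {
                ids.add(id);
            }
        }
    }

    /** Ends the index phase; after this the index is read-only. */
    void seal() {
        sealed = true;
    }

    /** Query phase: single-term lookup, allowed only once the index is sealed. */
    List<String> search(String term) {
        if (!sealed) {
            throw new IllegalStateException("seal the index before querying");
        }
        List<String> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptyList())) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        BuildThenQuerySketch index = new BuildThenQuerySketch();
        index.add("My yo-yo, it never came back. - Rodney Dangerfield");
        index.seal();
        System.out.println(index.search("Dangerfield"));
    }
}
```

In a route this would mean sending all payloads to the producer endpoint 
first and only then starting the route that runs the queries.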

Since these values cannot be passed directly in the URI, I pass them using the 
JNDI registry associated with the Default Component (example shown below).  

Example: Providing values for the Lucene URI
--------------------------------------------------------------
{code}
    @Override
    protected JndiRegistry createRegistry() throws Exception {
        JndiRegistry registry = new JndiRegistry(createJndiContext());
        registry.bind("std", new File("target/stdindexDir"));
        registry.bind("load_dir", new File("src/test/resources/sources"));
        registry.bind("stdAnalyzer", new StandardAnalyzer(Version.LUCENE_CURRENT));
        return registry;
    }
{code}
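
As a rough illustration of what a "#name" value in the URI amounts to, the 
sketch below splits the query string by hand and looks up each "#" reference 
in a plain Map standing in for the registry. It is not the component's code: 
RegistryRefSketch, parseQuery and resolve are hypothetical names, and the Map 
is an assumption made to keep the example free of Camel dependencies. The 
query is split manually because java.net.URI would treat everything after the 
first "#" as a fragment.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch: illustrates "#name" lookups only.
final class RegistryRefSketch {

    /** Splits the query part of an endpoint URI into name/value pairs. */
    static Map<String, String> parseQuery(String endpointUri) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = endpointUri.indexOf('?');
        if (q < 0) {
            return params;
        }
        for (String pair : endpointUri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** Resolves a "#name" value against the registry; other values pass through. */
    static Object resolve(String value, Map<String, Object> registry) {
        if (!value.startsWith("#")) {
            return value;
        }
        Object bean = registry.get(value.substring(1));
        if (bean == null) {
            throw new IllegalArgumentException("No registry entry for " + value);
        }
        return bean;
    }

    public static void main(String[] args) {
        // Stand-in for the JNDI registry: same names as in the example above.
        Map<String, Object> registry = new LinkedHashMap<>();
        registry.put("std", new File("target/stdindexDir"));
        registry.put("load_dir", new File("src/test/resources/sources"));

        String uri = "lucene://stdQuotesIndex?indexDir=#std&srcDir=#load_dir";
        for (Map.Entry<String, String> e : parseQuery(uri).entrySet()) {
            System.out.println(e.getKey() + " -> " + resolve(e.getValue(), registry));
        }
    }
}
```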

I have also added a Query Processor that can run any query (including 
wildcards etc.) against a Lucene document index and present the results in a 
schema-driven XML format (example provided below).

Example:  Query Processor for Lucene called LuceneSearcher
-------------------------------------------------------------------------------------
{code}
       context.addRoutes(new RouteBuilder() {
            public void configure() {
                
                from("direct:start").
                    setHeader("QUERY", constant("Rodney Dangerfield")).
                    process(new LuceneSearcher("target/stdindexDir", analyzer, null, 20)).
                    to("mock:searchResult");
            }
        });  
{code}

Example: Search Results presentation Format
----------------------------------------------------------------
{code:xml}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<hits xmlns="http://camel.apache.org/lucene/SearchData">
      <numberOfHits>2</numberOfHits>
      <hit>
             <number>1</number>
             <hitLocation>15</hitLocation>
             <score>0.9453935</score>
             <data>I worked in a pet store and people kept asking how big I'd get. - Rodney Dangerfield</data>
      </hit>
      <hit>
             <number>2</number>
             <hitLocation>13</hitLocation>
             <score>0.8272193</score>
             <data>I tell ya when I was a kid, all I knew was rejection. My yo-yo, it never came back. - Rodney Dangerfield</data>
      </hit>
</hits>
{code}
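
For anyone consuming this output downstream, here is a minimal sketch of 
reading the result format with the JDK's DOM parser. It assumes only the 
element names and namespace shown above; SearchHitsReader and readHits are 
hypothetical names and are not part of the patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical consumer of the search result XML; uses only JDK classes.
final class SearchHitsReader {
    static final String NS = "http://camel.apache.org/lucene/SearchData";

    /** Returns one "score: data" string per <hit>, in document order. */
    static List<String> readHits(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the elements live in the default namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = doc.getElementsByTagNameNS(NS, "hit");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element hit = (Element) hits.item(i);
                out.add(text(hit, "score") + ": " + text(hit, "data"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse search results", e);
        }
    }

    /** Text of the first child element with the given local name. */
    private static String text(Element parent, String name) {
        return parent.getElementsByTagNameNS(NS, name).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<hits xmlns=\"" + NS + "\"><numberOfHits>1</numberOfHits>"
                + "<hit><number>1</number><hitLocation>13</hitLocation>"
                + "<score>0.8272193</score><data>sample text</data></hit></hits>";
        System.out.println(readHits(xml));
    }
}
```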

I used the latest version of Lucene, version 3.0, for the implementation, but 
this can be moved up easily over time since I have no hard restrictions on 
Lucene versions. The API could differ when moving to older versions, though; I 
have not verified this. Lucene seems to have changed a lot with each 
version :). The good news is that, for the most part, it offers backward 
compatibility for its APIs.

Please find attached the patch as well as a zip file containing the code.

Could you please review it and let me know what you think? I would be happy 
to update the documentation once I get your feedback, and to make any needed 
changes.

Cheers,

Ashwin...

edit: updated to use code snippets

      was (Author: akarpe):
  
> Lucene Component
> ----------------
>
>                 Key: CAMEL-1472
>                 URL: https://issues.apache.org/activemq/browse/CAMEL-1472
>             Project: Apache Camel
>          Issue Type: New Feature
>            Reporter: Claus Ibsen
>            Assignee: Ashwin Karpe
>             Fix For: Future
>
>         Attachments: camel-lucene-20091227.patch, camel-lucene.zip
>
>
> We should add a new component for Apache Lucene integration

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
