[jira] Created: (JCR-1079) Extend the IndexingConfiguration to allow configuration of reuseable analyzers

Ard Schrijvers (JIRA) Thu, 23 Aug 2007 01:50:53 -0700

Extend the IndexingConfiguration to allow configuration of reuseable analyzers
------------------------------------------------------------------------------


                 Key: JCR-1079
                 URL: https://issues.apache.org/jira/browse/JCR-1079
             Project: Jackrabbit
          Issue Type: New Feature
    Affects Versions: 1.3.1
            Reporter: Ard Schrijvers
            Priority: Minor
             Fix For: 1.4


It should be possible to configure an xml block of analyzers in the 
indexing_configuration.xml. Within each <index-rule>, an analyzer can then be 
assigned to a property, which means that the property will be analyzed with 
that specific analyzer. First and foremost, this enables multilingual indexing.

Documentation needs to be added explaining the difference between searching in 
the node scope [jcr:contains(.,'foo')] and in a specific property 
[jcr:contains(@myprop,'foo')]. The node scope will always be indexed and 
searched with the default analyzer, which can be configured in the 
<SearchIndex> element of workspace.xml.
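
As a rough illustration of where that default analyzer is set, a <SearchIndex> 
element might look like the sketch below. The parameter names (analyzer, 
indexingConfiguration) and the file location are assumptions based on the 
existing SearchIndex parameters and should be checked against the Jackrabbit 
version in use; StandardAnalyzer is only an example value.

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${wsp.home}/index"/>
        <!-- assumed: the default analyzer, used for the node scope -->
        <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
        <!-- assumed location of the indexing configuration file -->
        <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
</SearchIndex>

With such a setup, jcr:contains(.,'foo') would be handled by the 
StandardAnalyzer, while jcr:contains(@myprop,'foo') would be handled by 
whatever analyzer is assigned to myprop, if any.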

A possible indexing_configuration.xml snippet is shown below. Also note the 
possible enhancement, the nested <filter> elements (not sure whether this 
implementation will have it, because it requires a lot of filter factories and 
is probably out of scope). Adding custom filters which do not need a factory 
might be easier.

<analyzers>
        <analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
        <analyzer name="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
        <analyzer name="compound" class="org.apache.lucene.analysis.SimpleAnalyzer">
             <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
             <filter class="jr.EdgeNGramTokenizerFactory" side="front" minGram="1" maxGram="2"/>
        </analyzer>
</analyzers>

<index-rule nodeType="nt:unstructured">
       <property analyzer="fr">bode_fr</property>
       <property analyzer="de">bode_de</property>
</index-rule>
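
To show how the two blocks might sit together in one file, here is a minimal 
sketch of a complete indexing_configuration.xml. The <configuration> root 
element and the nt namespace declaration are assumptions taken from the 
existing indexing configuration format; the <analyzers> block and the analyzer 
attribute are the proposal from this issue, not an existing feature.

<?xml version="1.0"?>
<!-- assumed root element and namespace declaration -->
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
        <!-- proposed: named, reusable analyzers -->
        <analyzers>
                <analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
        </analyzers>
        <!-- proposed: assign an analyzer to a property by name -->
        <index-rule nodeType="nt:unstructured">
                <property analyzer="fr">bode_fr</property>
        </index-rule>
</configuration>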

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
