[jira] Commented: (JCR-1079) Extend the IndexingConfiguration to allow configuration of reuseable analyzers

Ard Schrijvers (JIRA) Wed, 29 Aug 2007 01:33:53 -0700

    [ https://issues.apache.org/jira/browse/JCR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523505 ]


Ard Schrijvers commented on JCR-1079:
-------------------------------------

I have implemented the first part, without enabling configuration of an 
analyzer in an index rule (because of the implication that it ends up being 
global; if wanted, I can add it, let me know). The current configuration 
looks, for example, like this:

<analyzers> 
        <analyzer class="org.apache.lucene.analysis.fr.FrenchAnalyzer">
            <property>test:fr_mytext</property>
            <property>test:fr_body</property>
        </analyzer>
        <analyzer class="org.apache.lucene.analysis.de.GermanAnalyzer">
            <property>test:de_mytext</property>
            <property>test:de_body</property>
        </analyzer>
        <analyzer class="org.apache.lucene.analysis.nl.DutchAnalyzer">
            <property>test:nl_mytext</property>
            <property>test:nl_body</property>
        </analyzer>
</analyzers>
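
To make the mapping concrete, here is a minimal sketch of how such a block could be read into a lookup from property name to analyzer class name. It is not the code from the patch: the class name AnalyzerConfigSketch and the method parse are hypothetical, and it only uses the JDK's DOM parser, so no analyzer is actually instantiated.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Jackrabbit.
public class AnalyzerConfigSketch {

    /** Maps each configured property name to the analyzer class name listed for it. */
    static Map<String, String> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> analyzerByProperty = new LinkedHashMap<>();
        NodeList analyzers = doc.getElementsByTagName("analyzer");
        for (int i = 0; i < analyzers.getLength(); i++) {
            Element analyzer = (Element) analyzers.item(i);
            String className = analyzer.getAttribute("class");
            // Every <property> child names one property handled by this analyzer.
            NodeList properties = analyzer.getElementsByTagName("property");
            for (int j = 0; j < properties.getLength(); j++) {
                String propertyName = properties.item(j).getTextContent().trim();
                analyzerByProperty.put(propertyName, className);
            }
        }
        return analyzerByProperty;
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<analyzers>"
              + "  <analyzer class=\"org.apache.lucene.analysis.fr.FrenchAnalyzer\">"
              + "    <property>test:fr_mytext</property>"
              + "    <property>test:fr_body</property>"
              + "  </analyzer>"
              + "  <analyzer class=\"org.apache.lucene.analysis.de.GermanAnalyzer\">"
              + "    <property>test:de_mytext</property>"
              + "  </analyzer>"
              + "</analyzers>";
        // Print one "property -> analyzer class" line per configured property.
        parse(xml).forEach((property, analyzer) ->
                System.out.println(property + " -> " + analyzer));
    }
}
```

A property that is not listed in any analyzer element is simply absent from the map; in that case one would presumably fall back to the default analyzer.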

Now I want to add some tests showing the possibly confusing difference in 
search results between a property scope search and a node scope search. The 
only problem is that, until now, I do not see an indexing_configuration.xml 
in the Jackrabbit trunk. I only see an indexing-configuration-1.0.dtd.

Is it possible to add some tests that rely on an indexing_configuration.xml? 
It might be a problem that the indexing_configuration.xml then applies to all 
the tests for that workspace, or not? Does anybody have an idea how to add 
tests for the indexing_configuration.xml?



> Extend the IndexingConfiguration to allow configuration of reuseable analyzers
> ------------------------------------------------------------------------------
>
>                 Key: JCR-1079
>                 URL: https://issues.apache.org/jira/browse/JCR-1079
>             Project: Jackrabbit
>          Issue Type: New Feature
>    Affects Versions: 1.3.1
>            Reporter: Ard Schrijvers
>            Priority: Minor
>             Fix For: 1.4
>
>
> It should be possible to configure an XML block of analyzers in the 
> indexing_configuration.xml. In each <index-rule>, an analyzer can be 
> assigned to a property. This means that the property will be analyzed with 
> that specific analyzer. In the first place, this enables multilingual 
> indexing. 
> Documentation needs to be added explaining the difference between searching 
> in the node scope [jcr:contains(.,'foo')] and in some property 
> [jcr:contains(@myprop,'foo')]. The node scope will always be indexed and 
> searched with the default analyzer, which can be configured in the 
> workspace.xml in the <SearchIndex> element.
> Below, a possible indexing_configuration.xml snippet is shown. Also note the 
> possible enhancement (not sure whether this implementation will have it, 
> because it requires a lot of filter factories and is probably out of scope). 
> Adding custom filters which do not need a factory might be easier.
> <analyzers>
>       <analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
>       <analyzer name="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
>       <analyzer name="compound" class="org.apache.lucene.analysis.SimpleAnalyzer">
>              <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
>              <filter class="jr.EdgeNGramTokenizerFactory" side="front" minGram="1" maxGram="2"/>
>       </analyzer>
> </analyzers>
> <index-rule nodeType="nt:unstructured">
>        <property analyzer="fr">bode_fr</property>
>        <property analyzer="de">bode_de</property>
> </index-rule>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
