[jira] Commented: (JCR-1079) Extend the IndexingConfiguration to allow configuration of reuseable analyzers

Ard Schrijvers (JIRA) Fri, 31 Aug 2007 06:30:54 -0700

    [ 
https://issues.apache.org/jira/browse/JCR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524083
 ]


Ard Schrijvers commented on JCR-1079:
-------------------------------------

>For testing you can create a new workspace under 
>jackrabbit-core/applications/test/workspaces. We already have two workspaces, 
>which are used for the JCR tests. Just create a folder, put a workspace.xml 
>and an indexing_configuration.xml in there. You also might have to adapt the 
>build script, otherwise it will probably remove the files again when you call: 
>mvn clean.

I already did this, so testing is not a problem for me. What I was referring 
to was having it available in the unit tests in trunk, so people can play with 
it. But then this configuration XML would be applied to all the other unit 
tests as well. 

Anyway, I'll create a patch for the working configurable property analyzers 
on Monday, and add an indexing_configuration.xml that can be used for testing. 

Do you still want me to add the possibility for defining an analyzer in an 
indexing-rule (though defining an analyzer in an indexing-rule for a property 
will imply that this analyzer is also used for this property outside the 
specific indexing-rule)?

> Extend the IndexingConfiguration to allow configuration of reuseable analyzers
> ------------------------------------------------------------------------------
>
>                 Key: JCR-1079
>                 URL: https://issues.apache.org/jira/browse/JCR-1079
>             Project: Jackrabbit
>          Issue Type: New Feature
>    Affects Versions: 1.3.1
>            Reporter: Ard Schrijvers
>            Priority: Minor
>             Fix For: 1.4
>
>
> It should be possible to configure an XML block of analyzers in the 
> indexing_configuration.xml. In each <index-rule>, an analyzer can be assigned 
> to a property. This means that the property will be analyzed with that 
> specific analyzer. In the first place, it enables multilingual indexing. 
> Documentation needs to be added explaining the difference in searching in the 
> node scope [jcr:contains(.,'foo')] and in some property 
> [jcr:contains(@myprop,'foo')]. The node scope will always be searched and 
> indexed with the default analyzer, which can be configured in the 
> workspace.xml in the <SearchIndex> element.
> Below, a possible indexing_configuration.xml snippet is shown. Also note the 
> possible enhancement (not sure whether this implementation will have it, 
> because it requires a lot of filter factories and is probably out of scope). 
> Adding custom filters which do not need a factory might be easier.
> <analyzers>
>       <analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
>       <analyzer name="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
>       <analyzer name="compound" class="org.apache.lucene.analysis.SimpleAnalyzer">
>             <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
>             <filter class="jr.EdgeNGramTokenizerFactory" side="front" minGram="1" maxGram="2"/>
>       </analyzer>
> </analyzers>
> <index-rule nodeType="nt:unstructured">
>        <property analyzer="fr">bode_fr</property>
>        <property analyzer="de">bode_de</property>
> </index-rule>
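
To illustrate the node scope versus property scope distinction described 
above, here is a minimal Java sketch. It is not Jackrabbit code: the class 
PropertyAnalyzerLookup and its methods are hypothetical, and analyzers are 
represented only by their configured names ("fr", "de") rather than by Lucene 
Analyzer instances. It assumes one analyzer per property name, with the 
default analyzer used for the node scope and for any property that has no 
assignment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Jackrabbit or Lucene. */
final class PropertyAnalyzerLookup {
    // Name of the analyzer configured on <SearchIndex> in workspace.xml.
    private final String defaultAnalyzer;
    // Property name -> analyzer name, as assigned inside an <index-rule>.
    private final Map<String, String> byProperty = new HashMap<>();

    PropertyAnalyzerLookup(String defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /** Mirrors an entry such as <property analyzer="fr">bode_fr</property>. */
    void assign(String propertyName, String analyzerName) {
        byProperty.put(propertyName, analyzerName);
    }

    /**
     * Returns the analyzer name to use for a search.
     * A null propertyName stands for the node scope, jcr:contains(.,'foo'),
     * which always uses the default analyzer.
     */
    String analyzerFor(String propertyName) {
        if (propertyName == null) {
            return defaultAnalyzer;
        }
        // Properties without an assignment fall back to the default analyzer.
        return byProperty.getOrDefault(propertyName, defaultAnalyzer);
    }
}
```

Under these assumptions, assign("bode_fr", "fr") would make 
jcr:contains(@bode_fr,'foo') use the French analyzer, while 
jcr:contains(.,'foo') would still go through the default one.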

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
