[
https://issues.apache.org/jira/browse/JCR-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522998
]
Ard Schrijvers commented on JCR-1079:
-------------------------------------
The suggested configuration complicates the implementation:
Since I need to determine which analyzer to use based *only* on the
string representation (JCR-style name) of the given property, I can only set an
analyzer for a certain property for the entire workspace, and not per
index-rule.
This means that if I use
<index-rule nodeType="nt:unstructured">
  <property analyzer="fr">myNs:bode_fr</property>
  <property analyzer="de">myNs:bode_de</property>
</index-rule>
then the analyzer for property myNs:bode_fr applies to the entire workspace,
and therefore to all other index-rules as well.
Therefore, I would like to suggest listing the properties to be indexed with a
certain analyzer inside the analyzer configuration itself, like so:
<analyzer name="fr" class="org.apache.lucene.analysis.fr.FrenchAnalyzer">
  <property>myNs:bode_fr</property>
  <property>myNs:intro_fr</property>
</analyzer>
This means nothing changes in the index configuration apart from a new analyzer
XML block, where workspace-global analyzers for certain properties are defined.
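To make the idea concrete, here is a rough sketch of how such a workspace-global
lookup could work. It is not the actual Jackrabbit implementation: the class
name PropertyAnalyzerRegistry, its methods, and the <analyzers> wrapper element
are assumptions made for illustration only. Analyzers are represented by their
class names, so the example runs without Lucene on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: maps a JCR-style property name to an analyzer class name.
class PropertyAnalyzerRegistry {
    private final Map<String, String> analyzerClassByProperty = new HashMap<>();
    private final String defaultAnalyzerClass;

    PropertyAnalyzerRegistry(String defaultAnalyzerClass) {
        this.defaultAnalyzerClass = defaultAnalyzerClass;
    }

    // Reads <analyzer class="..."><property>name</property>...</analyzer> blocks.
    static PropertyAnalyzerRegistry fromXml(String xml, String defaultAnalyzerClass) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse analyzer configuration", e);
        }
        PropertyAnalyzerRegistry registry = new PropertyAnalyzerRegistry(defaultAnalyzerClass);
        NodeList analyzers = doc.getElementsByTagName("analyzer");
        for (int i = 0; i < analyzers.getLength(); i++) {
            Element analyzer = (Element) analyzers.item(i);
            String className = analyzer.getAttribute("class");
            NodeList properties = analyzer.getElementsByTagName("property");
            for (int j = 0; j < properties.getLength(); j++) {
                String name = properties.item(j).getTextContent().trim();
                String previous = registry.analyzerClassByProperty.put(name, className);
                // One property name can have only one analyzer per workspace.
                if (previous != null && !previous.equals(className)) {
                    throw new IllegalArgumentException(
                            "Property " + name + " is assigned to two analyzers");
                }
            }
        }
        return registry;
    }

    // Lookup uses only the property name; unlisted properties get the default.
    String analyzerClassFor(String propertyName) {
        return analyzerClassByProperty.getOrDefault(propertyName, defaultAnalyzerClass);
    }

    public static void main(String[] args) {
        String xml = "<analyzers>"
                + "<analyzer name=\"fr\" class=\"org.apache.lucene.analysis.fr.FrenchAnalyzer\">"
                + "<property>myNs:bode_fr</property>"
                + "<property>myNs:intro_fr</property>"
                + "</analyzer>"
                + "</analyzers>";
        // The default class name below is a placeholder for whatever is
        // configured in the <SearchIndex> element of workspace.xml.
        PropertyAnalyzerRegistry registry =
                fromXml(xml, "org.apache.lucene.analysis.standard.StandardAnalyzer");
        System.out.println(registry.analyzerClassFor("myNs:bode_fr"));
        System.out.println(registry.analyzerClassFor("myNs:title"));
    }
}
```

Because the lookup key is the property name alone, assigning the same property
to two different analyzers is rejected up front instead of silently letting the
last definition win.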
WDOT?
> Extend the IndexingConfiguration to allow configuration of reuseable analyzers
> ------------------------------------------------------------------------------
>
> Key: JCR-1079
> URL: https://issues.apache.org/jira/browse/JCR-1079
> Project: Jackrabbit
> Issue Type: New Feature
> Affects Versions: 1.3.1
> Reporter: Ard Schrijvers
> Priority: Minor
> Fix For: 1.4
>
>
> An XML block of analyzers should be configurable in the
> indexing_configuration.xml. In each <index-rule>, an analyzer can be assigned
> to a property. This means that the property will be analyzed with that
> specific analyzer. First and foremost, this enables multilingual indexing.
> Documentation needs to be added explaining the difference in searching in the
> node scope [jcr:contains(.,'foo')] and in some property
> [jcr:contains(@myprop,'foo')]. The node scope will always be searched and
> indexed with the default analyzer, which can be configured in the
> workspace.xml in the <SearchIndex> element.
> A possible indexing_configuration.xml snippet is shown below. Also note the
> possible enhancement (not sure whether this implementation will have it,
> because it requires a lot of filter factories and is probably out of scope).
> Adding custom filters which do not need a factory might be easier.
> <analyzers>
>   <analyzer name="fr"
>             class="org.apache.lucene.analysis.fr.FrenchAnalyzer"/>
>   <analyzer name="de"
>             class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
>   <analyzer name="compound"
>             class="org.apache.lucene.analysis.SimpleAnalyzer">
>     <filter class="jr.StopFilterFactory" words="stopwords.txt"/>
>     <filter class="jr.EdgeNGramTokenizerFactory" side="front"
>             minGram="1" maxGram="2"/>
>   </analyzer>
> </analyzers>
> <index-rule nodeType="nt:unstructured">
>   <property analyzer="fr">bode_fr</property>
>   <property analyzer="de">bode_de</property>
> </index-rule>