[
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904584#comment-16904584
]
Tomoko Uchida commented on SOLR-13593:
--------------------------------------
The ICU factories' "name" argument was renamed to "form" on the master branch,
so these factories can now be looked up by name, with the "form" attribute
specifying the normalization form, like this:
{code:xml}
<fieldType name="text_ws_icucf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter name="icuNormalizer2" form="nfkc"/>
    <tokenizer name="whitespace"/>
  </analyzer>
</fieldType>
<fieldType name="text_ws_icutf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer name="whitespace"/>
    <filter name="icuNormalizer2" form="nfkc"/>
  </analyzer>
</fieldType>
{code}
The corresponding field types using "class" are:
{code:xml}
<fieldType name="text_ws_icucf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.ICUNormalizer2CharFilterFactory" form="nfkc"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_ws_icutf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ICUNormalizer2FilterFactory" form="nfkc" mode="compose"/>
  </analyzer>
</fieldType>
{code}
This works for me, and the branch passes the entire test suite. I will merge
all the changes into the master branch soon.
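As a further illustration (a sketch only, not one of the configurations tested above), the two styles could presumably be mixed within one analyzer, since each component is declared on its own element. The field type name "text_ws_mixed" is hypothetical, and the assumption that name-based and class-based components can sit side by side should be verified against the merged code. Each element here carries either "name" or "class", never both.
{code:xml}
<!-- Hypothetical field type; all component names are taken from this thread. -->
<fieldType name="text_ws_mixed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Looked up by SPI name; "form" selects the normalization form. -->
    <charFilter name="icuNormalizer2" form="nfkc"/>
    <!-- Looked up by class name, as in existing schemas. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Looked up by SPI name. -->
    <filter name="porterStem"/>
  </analyzer>
</fieldType>
{code}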
> Allow to specify analyzer components by their SPI names in schema definition
> ----------------------------------------------------------------------------
>
> Key: SOLR-13593
> URL: https://issues.apache.org/jira/browse/SOLR-13593
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Reporter: Tomoko Uchida
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Each analysis factory now has an explicitly documented SPI name, which is
> stored in the static "NAME" field (LUCENE-8778).
> Solr uses the factories' simple class names in schema definitions (like
> class="solr.WhitespaceTokenizerFactory"), but we should also be able to use
> the more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldtype>
> {code}
> would be
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer name="whitespace"/>
>     <filter name="keywordMarker" protected="protwords.txt"/>
>     <filter name="porterStem"/>
>   </analyzer>
> </fieldtype>
> {code}