[jira] [Updated] (SOLR-14434) Add documentation for adding multiterm analyzers in Schema API

Trey Grainger (Jira) Thu, 23 Apr 2020 23:28:43 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-14434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Trey Grainger updated SOLR-14434:
---------------------------------
    Affects Version/s:     (was: 8.5.1)
                           (was: 8.4.1)
                           (was: 8.5)
                           (was: 8.3.1)
                           (was: 8.4)
                           (was: 8.3)
                           (was: 8.1.1)
                           (was: 8.2)
                           (was: 8.1)
                           (was: 8.0)
                       8.6

> Add documentation for adding multiterm analyzers in Schema API
> --------------------------------------------------------------
>
>                 Key: SOLR-14434
>                 URL: https://issues.apache.org/jira/browse/SOLR-14434
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Schema and Analysis
>    Affects Versions: 8.6
>            Reporter: Trey Grainger
>            Priority: Major
>
> Originally this was filed as a bug report, but upon further inspection I 
> realized the usage was just undocumented and just a result of inconsistent 
> property name (casing) between the XML and JSON. Changing this to a Jira to 
> add documentation so others don't run into this issue in the future.
> Also need to document that the "analysis/field" API ignores {{multiterm}} 
> analysis and thus doesn't reflect the full nature of incoming queries. This 
> has been an annoying quirk for years and I think would be worth fixing, but 
> for now we should at least document it.
> --------------
> In addition to "{{index}}" and "{{query}}" analyzers, Solr supports adding an 
> explicit "{{multiterm}}" analyzer to schema {{fieldType}} definitions. This 
> allows for specific control over analysis for things like wildcard terms, 
> prefix queries, range queries, etc. For example, the following would cause 
> the wildcard query for "{{hats*}}" to get stemmed to "{{hat*}}" instead of 
> "{{hats*}}", and thus match on the indexed version of "{{hat}}".
> {code:java}
>   <fieldType class="solr.TextField" multiValued="true" name="multiterm_test" 
> positionIncrementGap="100" termOffsets="true" termVectors="true">
>     <analyzer type="index">
>       <tokenizer class="solr.ClassicTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.ClassicTokenizerFactory"/>
>       <filter class="solr.SynonymGraphFilterFactory" expand="true" 
> ignoreCase="true" synonyms="synonyms.txt"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     </analyzer>
>     <analyzer type="multiterm">
>       <tokenizer class="solr.ClassicTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.EnglishMinimalStemFilterFactory"/>
>     </analyzer>
>   </fieldType>{code}
> In the xml version this analyzer is called "{{multiterm}}", whereas it's 
> "{{multiTerm}}" in the JsonAPI. This isn't in the documentation anywhere and 
> just cost me a bunch of time debugging through the code until I finally found 
> what was going on. Using this ticket to add better documentation around usage 
> and gotchas around this feature.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Updated] (SOLR-14434) Add documentation for adding multiterm analyzers in Schema API

Reply via email to