[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

Erick Erickson (JIRA) Fri, 15 Dec 2017 08:43:41 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292780#comment-16292780
 ]


Erick Erickson commented on SOLR-6492:
--------------------------------------

Another application of this that just crossed my mind is the old "exact match 
when stemming" process: use 
KeywordRepeatFilterFactory >> stemmer >> RemoveDuplicatesTokenFilterFactory at 
index time, and then two analysis chains at query time, one with the stemmer 
and one without.

It's still not perfect: if I index "running" and then search for "run", I'd 
get a match on the stemmed version. It would, however, handle the case of 
indexing "run" and searching (exact match) on "running", along with some of 
the other more surprising effects of stemming.
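
To make the two cases above concrete, here is a rough Python sketch of the 
chain. It is a toy model, not Lucene or Solr code: toy_stem is a hypothetical 
stand-in for a real stemmer, tokens are plain (term, position) tuples, and the 
names index_terms, query_terms and matches are made up for illustration. The 
only behaviour it assumes from the real filters is that KeywordRepeat emits 
each token twice (one copy marked as a keyword so the stemmer skips it) and 
that RemoveDuplicates drops identical terms at the same position.

```python
def toy_stem(term):
    # Hypothetical stemmer: strips a few suffixes, nothing more.
    for suffix in ("ning", "ing", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def keyword_repeat(tokens):
    # Emit every token twice at the same position: keyword copy, then plain copy.
    for term, pos in tokens:
        yield term, pos, True
        yield term, pos, False


def stem_filter(tokens):
    # Stem only the copies that are not marked as keywords.
    for term, pos, is_keyword in tokens:
        yield (term if is_keyword else toy_stem(term)), pos


def remove_duplicates(tokens):
    # Drop a token when the same term was already seen at the same position.
    seen = set()
    for term, pos in tokens:
        if (term, pos) not in seen:
            seen.add((term, pos))
            yield term, pos


def index_terms(text):
    # Index-time chain: KeywordRepeat >> stemmer >> RemoveDuplicates.
    tokens = [(w, i) for i, w in enumerate(text.lower().split())]
    chain = remove_duplicates(stem_filter(keyword_repeat(tokens)))
    return {term for term, _ in chain}


def query_terms(text, exact):
    # Two query-time chains: exact=True skips the stemmer, exact=False applies it.
    return {w if exact else toy_stem(w) for w in text.lower().split()}


def matches(doc, query, exact):
    return bool(index_terms(doc) & query_terms(query, exact))


if __name__ == "__main__":
    print(sorted(index_terms("running")))        # original and stemmed forms
    print(matches("run", "running", exact=True))  # the case that is handled
    print(matches("running", "run", exact=True))  # the case that still matches
```

In this model, indexing "run" and doing an exact search for "running" does 
not match, while indexing "running" and searching for "run" still does, 
because the stemmed form "run" sits in the index next to the original.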

> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
>                 Key: SOLR-6492
>                 URL: https://issues.apache.org/jira/browse/SOLR-6492
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Trey Grainger
>             Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to 
> support one or more dynamically-selected analyzers for a field. For example, 
> someone may have a "content" field and pass in a document in Greek (using an 
> Analyzer with Tokenizer/Filters for Greek), a separate document in English 
> (using an English Analyzer), and possibly even a field with mixed-language 
> content in Greek and English. This latter case could pass the content 
> separately through both an analyzer defined for Greek and another Analyzer 
> defined for English, stacking or concatenating the token streams based upon 
> the use-case.
> There are some distinct advantages in terms of index size and query 
> performance which can be obtained by stacking terms from multiple analyzers 
> in the same field instead of duplicating content in separate fields and 
> searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a 
> different analyzer for the same field to remove a feature (e.g. turning 
> on/off query-time synonyms against the same field on a per-query basis).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

