[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346292#comment-14346292
 ] 

Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi Kranti,

The design is almost exactly as you described when you said "have analysis 
chains defined in schema.xml and these chains could be reused between multiple 
fields and on each field there should be a way to conditionally choose the 
analysis chain". Specifically, each analysis chain is just defined as a 
FieldType, like you would define any analysis chain you were going to assign 
to a field.

What I hadn't considered yet, however, was having the update processor choose 
the analyzers based upon a value in another field.  I had previously 
only been considering the case where a user would either:
1) Use an automatic language identifier update processor, or
2) Pass the language in directly in the content of the field. (i.e. <field 
name="my_field">en,es|document content here</field>). 
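
To make option 2 concrete, here is a minimal sketch of how such a prefixed value could be split into analyzer keys and content. It is written in Python purely for illustration and is not the Solr implementation; the function name split_language_prefix and the fallback for values without a "|" are assumptions, and only the "en,es|content" shape comes from the example above.

```python
from typing import List, Tuple


def split_language_prefix(value: str, delimiter: str = "|") -> Tuple[List[str], str]:
    """Split a value such as 'en,es|text' into (['en', 'es'], 'text')."""
    # partition() splits on the first delimiter only, so any later '|'
    # characters stay part of the document content.
    prefix, sep, content = value.partition(delimiter)
    if not sep:
        # Assumption: a value with no delimiter carries no analyzer keys.
        return [], value
    keys = [key.strip() for key in prefix.split(",") if key.strip()]
    return keys, content


keys, content = split_language_prefix("en,es|document content here")
```

Each returned key would then be looked up against the FieldTypes that define the analysis chains.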

Having the ability to specify the key for the analyzers in a different field 
would probably be more user friendly, and this would be trivial to implement, 
so I can look to add it. Something like this:
<field name="my_field">document content here</field>
<field name="language">en</field>
<field name="language">es</field>
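
As a rough sketch of that lookup (Python for illustration only, not Solr code), the keys could be read from a sibling field of the same document. The helper name analyzer_keys_for and the dict representation of a document are assumptions; the field names "my_field" and "language" are taken from the example above.

```python
from typing import Dict, List, Union

FieldValue = Union[str, List[str]]


def analyzer_keys_for(doc: Dict[str, FieldValue], key_field: str = "language") -> List[str]:
    """Return the analyzer keys stored in a sibling field of the document."""
    raw = doc.get(key_field, [])
    # The key field may be single-valued (a string) or multi-valued (a list).
    values = [raw] if isinstance(raw, str) else list(raw)
    return [value.strip() for value in values if value.strip()]


doc = {"my_field": "document content here", "language": ["en", "es"]}
selected = analyzer_keys_for(doc)
```

With this shape the content field stays free of any prefix syntax, which is the main usability gain over option 2.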

Is that what you were hoping for?

> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
>                 Key: SOLR-6492
>                 URL: https://issues.apache.org/jira/browse/SOLR-6492
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Trey Grainger
>             Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to 
> support one or more dynamically-selected analyzers for a field. For example, 
> someone may have a "content" field and pass in a document in Greek (using an 
> Analyzer with Tokenizer/Filters for Greek), a separate document in English 
> (using an English Analyzer), and possibly even a field with mixed-language 
> content in Greek and English. This latter case could pass the content 
> separately through both an analyzer defined for Greek and another Analyzer 
> defined for English, stacking or concatenating the token streams based upon 
> the use-case.
> There are some distinct advantages in terms of index size and query 
> performance which can be obtained by stacking terms from multiple analyzers 
> in the same field instead of duplicating content in separate fields and 
> searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a 
> different analyzer for the same field to remove a feature (e.g. turning 
> on/off query-time synonyms against the same field on a per-query basis).
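
The difference between stacking and concatenating token streams, as mentioned in the issue description above, can be sketched roughly as follows. This is a toy Python model, not Lucene code: tokens are plain (term, position) pairs, the "analyzers" are stand-in normalizers (str.lower and str.upper), and the names analyze, stack and concatenate are hypothetical. It assumes stacking means placing terms from every analyzer at the same positions, and concatenating means appending one stream after another.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, int]  # (term, position)


def analyze(text: str, normalize: Callable[[str], str]) -> List[Token]:
    """Toy analyzer: whitespace tokenizer plus one normalizing filter."""
    return [(normalize(word), pos) for pos, word in enumerate(text.split())]


def stack(streams: List[List[Token]]) -> List[Token]:
    """Overlay the streams so terms from each analyzer share positions."""
    merged = [token for stream in streams for token in stream]
    # sorted() is stable, so tokens at the same position keep stream order.
    return sorted(merged, key=lambda token: token[1])


def concatenate(streams: List[List[Token]]) -> List[Token]:
    """Append each stream after the previous one by shifting positions."""
    result: List[Token] = []
    offset = 0
    for stream in streams:
        result.extend((term, pos + offset) for term, pos in stream)
        if stream:
            offset += max(pos for _, pos in stream) + 1
    return result


lower = analyze("Running Fast", str.lower)
upper = analyze("Running Fast", str.upper)
stacked = stack([lower, upper])
concatenated = concatenate([lower, upper])
```

In the stacked result both variants of a word occupy the same position, while the concatenated result is twice as long in positions.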


