[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

Trey Grainger (JIRA) Thu, 30 Oct 2014 13:00:07 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190715#comment-14190715
 ]


Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi Sharon,

You asked which code will parse "df=someMultiTextField|en,de" and decide
which analysis chain to use. In short, since FieldTypes have access to the
schema but Analyzers and Tokenizers don't, I'm creating a new FieldType
which passes the schema into a new Analyzer, which can then pass the schema
into the new Tokenizer. When the Tokenizer is used, the fieldname (string)
and value (reader) are passed in, so it is possible to pull the metadata
("|en,de") off of either of these and dynamically choose the analysis
chain(s) from the schema at that time.
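
To make the parsing step concrete, here is a minimal sketch of splitting a
fieldname such as "someMultiTextField|en,de" into the real field name and
the requested analyzer keys. This is not the code in the repository linked
below: FieldSpec and its members are hypothetical names used only for
illustration, and looking up each key in the schema is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of Solr or Lucene): splits a spec such as
 * "someMultiTextField|en,de" into a field name and a list of analyzer keys.
 */
final class FieldSpec {
    final String fieldName;
    final List<String> keys;

    private FieldSpec(String fieldName, List<String> keys) {
        this.fieldName = fieldName;
        this.keys = keys;
    }

    static FieldSpec parse(String raw) {
        int bar = raw.indexOf('|');
        if (bar < 0) {
            // No metadata present: the caller would fall back to a default chain.
            return new FieldSpec(raw, Collections.<String>emptyList());
        }
        String name = raw.substring(0, bar);
        List<String> keys = new ArrayList<>();
        for (String key : raw.substring(bar + 1).split(",")) {
            String trimmed = key.trim();
            if (!trimmed.isEmpty()) {
                keys.add(trimmed); // keep request order, skip empty entries
            }
        }
        return new FieldSpec(name, Collections.unmodifiableList(keys));
    }
}
```

Each key would then be resolved against the schema to find the analysis
chain to run; a fieldname with no "|" yields an empty key list.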

I've done this work already for pulling data out of the field content (so I
know that works), but pulling the metadata from the fieldname is still
pending (I'm hoping to work on it this weekend). If you want to see what
I've done thus far, you can look on GitHub at MultiTextField,
MultiTextFieldAnalyzer, and MultiTextFieldTokenizer:
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextField.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldAnalyzer.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldTokenizer.java

I have some questions / feedback on your proposed solution... I'm hopping
on a plane now but will post them later tonight.

Thanks,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @CareerBuilder


On Thu, Oct 30, 2014 at 7:32 AM, Sharon Krisher (JIRA) <j...@apache.org>



> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
>                 Key: SOLR-6492
>                 URL: https://issues.apache.org/jira/browse/SOLR-6492
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Trey Grainger
>             Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to 
> support one or more dynamically-selected analyzers for a field. For example, 
> someone may have a "content" field and pass in a document in Greek (using an 
> Analyzer with Tokenizer/Filters for Greek), a separate document in English 
> (using an English Analyzer), and possibly even a field with mixed-language 
> content in Greek and English. This latter case could pass the content 
> separately through both an analyzer defined for Greek and another Analyzer 
> defined for English, stacking or concatenating the token streams based upon 
> the use-case.
> There are some distinct advantages in terms of index size and query 
> performance which can be obtained by stacking terms from multiple analyzers 
> in the same field instead of duplicating content in separate fields and 
> searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a 
> different analyzer for the same field to remove a feature (e.g. turning 
> on/off query-time synonyms against the same field on a per-query basis).



