[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190715#comment-14190715 ]
Trey Grainger commented on SOLR-6492:
-------------------------------------

Hi Sharon,

Your question was which code will parse "df=someMultiTextField|en,de" and decide which analysis chain to use.

In short, since FieldTypes have access to the schema but Analyzers and Tokenizers don't, I'm creating a new FieldType that passes the schema into a new Analyzer, which in turn passes the schema into a new Tokenizer. When the Tokenizer is used, it receives the field name (a string) and the value (a reader), so it can pull the metadata ("|en,de") off either of these and dynamically choose an analyzer from the schema at that time.

I've already done this work for pulling the metadata out of the field content (so I know that approach works), but pulling it from the field name is still pending (I'm hoping to work on it this weekend). If you want to see what I've done thus far, you can look on GitHub at MultiTextField, MultiTextFieldAnalyzer, and MultiTextFieldTokenizer:

https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextField.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldAnalyzer.java
https://github.com/treygrainger/solr-in-action/blob/master/src/main/java/sia/ch14/MultiTextFieldTokenizer.java

I have some questions and feedback on your proposed solution. I'm hopping on a plane now but will post them later tonight.
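As a rough illustration of the field-name step (this is not code from the MultiTextField classes linked above), here is a stdlib-only Java sketch that splits "someMultiTextField|en,de" into a base field name and a list of analyzer keys, then runs the text through each selected chain. The class names, the `en`/`de` keys, the default-key fallback, and the toy chains are all hypothetical; plain functions stand in for the Lucene Analyzers a real FieldType would look up in the schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: splits "someMultiTextField|en,de" into a base field name
// and the comma-separated analyzer keys that follow the '|'.
final class FieldNameMetadata {
    final String baseField;
    final List<String> analyzerKeys;

    private FieldNameMetadata(String baseField, List<String> analyzerKeys) {
        this.baseField = baseField;
        this.analyzerKeys = analyzerKeys;
    }

    static FieldNameMetadata parse(String fieldRef) {
        int bar = fieldRef.indexOf('|');
        if (bar < 0) {
            // No metadata suffix: the whole string is the field name.
            return new FieldNameMetadata(fieldRef, List.of());
        }
        List<String> keys = new ArrayList<>();
        for (String key : fieldRef.substring(bar + 1).split(",")) {
            String trimmed = key.trim();
            if (!trimmed.isEmpty()) {
                keys.add(trimmed);
            }
        }
        return new FieldNameMetadata(fieldRef.substring(0, bar), keys);
    }
}

// Hypothetical dispatcher: plain functions stand in for schema-defined analysis
// chains (they are NOT Lucene Analyzers).
final class AnalyzerDispatch {
    static final String DEFAULT_KEY = "en"; // assumed fallback when no keys are given
    static final Map<String, Function<String, List<String>>> REGISTRY = new LinkedHashMap<>();

    static {
        REGISTRY.put("en", text -> words(text.toLowerCase(Locale.ROOT)));
        // Toy "German" chain: lowercase, then fold the sharp s to "ss".
        REGISTRY.put("de", text -> words(text.toLowerCase(Locale.ROOT).replace("\u00df", "ss")));
    }

    private static List<String> words(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Runs the text through every selected chain and appends the results in key order.
    static List<String> analyze(String fieldRef, String text) {
        List<String> keys = FieldNameMetadata.parse(fieldRef).analyzerKeys;
        if (keys.isEmpty()) {
            keys = List.of(DEFAULT_KEY);
        }
        List<String> tokens = new ArrayList<>();
        for (String key : keys) {
            Function<String, List<String>> chain = REGISTRY.get(key);
            if (chain == null) {
                throw new IllegalArgumentException("No analysis chain registered for key: " + key);
            }
            tokens.addAll(chain.apply(text));
        }
        return tokens;
    }
}
```

In the design described above this logic would live in the Tokenizer, which sees the field name plus a Reader over the value; taking the value as a String here is a simplification.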
Thanks,
Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @CareerBuilder

On Thu, Oct 30, 2014 at 7:32 AM, Sharon Krisher (JIRA) <j...@apache.org> wrote:

> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
> Key: SOLR-6492
> URL: https://issues.apache.org/jira/browse/SOLR-6492
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Trey Grainger
> Fix For: 5.0
>
> A common request, particularly for multilingual search, is to be able to
> support one or more dynamically selected analyzers for a field. For example,
> someone may have a "content" field and pass in a document in Greek (using an
> Analyzer with Tokenizer/Filters for Greek), a separate document in English
> (using an English Analyzer), and possibly even a field with mixed-language
> content in Greek and English. This latter case could pass the content
> separately through both an Analyzer defined for Greek and another Analyzer
> defined for English, stacking or concatenating the token streams based upon
> the use case.
>
> There are some distinct advantages in terms of index size and query
> performance which can be obtained by stacking terms from multiple analyzers
> in the same field instead of duplicating content in separate fields and
> searching across multiple fields.
>
> Other non-multilingual use cases may include switching to a different
> analyzer for the same field to remove a feature (e.g., turning query-time
> synonyms on or off against the same field on a per-query basis).
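The quoted issue mentions stacking or concatenating the token streams from several analyzers. A minimal, stdlib-only Java sketch of the difference follows, assuming two toy chains (hypothetical `lowercase` and `crudeStem`, not Lucene classes) that emit exactly one token per whitespace-separated word, so positions from both chains line up. The `Tok` class is a hypothetical stand-in for a term plus its position, loosely mirroring Lucene's position increments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy token: a term plus its position in the field.
final class Tok {
    final String term;
    final int position;

    Tok(String term, int position) {
        this.term = term;
        this.position = position;
    }

    @Override
    public String toString() {
        return term + "@" + position;
    }
}

final class TokenStreamCombiner {
    private static List<String> words(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // Stand-in chain A: lowercase whitespace tokens.
    static List<String> lowercase(String text) {
        return words(text);
    }

    // Stand-in chain B: lowercase, then strip a trailing "s" (a crude toy stemmer).
    static List<String> crudeStem(String text) {
        List<String> out = new ArrayList<>();
        for (String w : words(text)) {
            out.add(w.length() > 3 && w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
        }
        return out;
    }

    // Stacking: terms from every stream share a position; identical terms collapse.
    static List<Tok> stack(List<List<String>> streams) {
        int max = 0;
        for (List<String> s : streams) {
            max = Math.max(max, s.size());
        }
        List<Tok> out = new ArrayList<>();
        for (int pos = 0; pos < max; pos++) {
            Set<String> atPos = new LinkedHashSet<>();
            for (List<String> s : streams) {
                if (pos < s.size()) {
                    atPos.add(s.get(pos));
                }
            }
            for (String term : atPos) {
                out.add(new Tok(term, pos));
            }
        }
        return out;
    }

    // Concatenation: each stream follows the previous one at increasing positions.
    static List<Tok> concatenate(List<List<String>> streams) {
        List<Tok> out = new ArrayList<>();
        int offset = 0;
        for (List<String> s : streams) {
            for (int i = 0; i < s.size(); i++) {
                out.add(new Tok(s.get(i), offset + i));
            }
            offset += s.size();
        }
        return out;
    }
}
```

For "Running dogs", stacking yields `running@0, dogs@1, dog@1`: `running` is written once and both forms of the second word share position 1, which is one way to get the single-field index-size saving the issue describes. Concatenation yields `running@0, dogs@1, running@2, dog@3`, keeping each chain's output as a separate run of positions.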