[
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189951#comment-14189951
]
Sharon Krisher commented on SOLR-6492:
--------------------------------------
Hi Trey,
In section (1) of "improvements to be made before the patch is finalized": when
you pass the parameter df=someMultiTextField|en,de, which code will then parse
it and decide which analysis chain to use?
I have a similar issue in my project, where I need to pass the user's language
to my custom analyzer so it will be able to invoke language-specific analysis
code.
I thought of the following solution:
- Pass the query language as a local parameter of the query param (or as part
of the df param as in your suggestion)
- Create a custom parser (that extends the extended dismax parser). In this
parser, read the query language from the local params of the request (parsers
receive the local params and the request params in their constructor) and store
the language in a ThreadLocal variable.
- In my analyzer code, access the ThreadLocal variable and get the language
from there.
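
To make the hand-off concrete, here is a minimal sketch in plain Java. It is
not Solr code: the class and method names (LanguageContextSketch,
QueryLanguage, handleQuery, selectChain) and the "lang" local-param key are
all hypothetical, and the parser and analyzer are replaced by simple
stand-ins. It assumes that query parsing and query-time analysis run on the
same thread.

```java
import java.util.Map;

// Hypothetical sketch: none of these names come from Solr or from the patch.
final class LanguageContextSketch {

    /** Per-thread holder for the language of the query currently being parsed. */
    static final class QueryLanguage {
        private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

        static void set(String lang) { CURRENT.set(lang); }
        static String get() { return CURRENT.get(); }
        static void clear() { CURRENT.remove(); }
    }

    /** Stand-in for the analyzer: picks a chain name from the thread-local language. */
    static String selectChain(String field) {
        String lang = QueryLanguage.get();
        // Fall back to a default chain when no language was supplied.
        return field + ":" + (lang != null ? lang : "default");
    }

    /** Stand-in for the custom parser: reads "lang" from the local params, then analyzes. */
    static String handleQuery(Map<String, String> localParams, String field) {
        QueryLanguage.set(localParams.get("lang"));
        try {
            return selectChain(field);
        } finally {
            // Request threads are typically pooled, so a stale value must not
            // leak into the next query handled by the same thread.
            QueryLanguage.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleQuery(Map.of("lang", "de"), "someMultiTextField"));
        System.out.println(handleQuery(Map.of(), "someMultiTextField"));
    }
}
```

The try/finally is the part that matters most in this sketch: without the
clear() call, a query that carries no language could pick up the language
left behind by an earlier query on the same thread.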
Do you see an issue with this solution?
> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
> Key: SOLR-6492
> URL: https://issues.apache.org/jira/browse/SOLR-6492
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis
> Reporter: Trey Grainger
> Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to
> support one or more dynamically-selected analyzers for a field. For example,
> someone may have a "content" field and pass in a document in Greek (using an
> Analyzer with Tokenizer/Filters for Greek), a separate document in English
> (using an English Analyzer), and possibly even a field with mixed-language
> content in Greek and English. This latter case could pass the content
> separately through both an analyzer defined for Greek and another Analyzer
> defined for English, stacking or concatenating the token streams based upon
> the use-case.
> There are some distinct advantages in terms of index size and query
> performance which can be obtained by stacking terms from multiple analyzers
> in the same field instead of duplicating content in separate fields and
> searching across multiple fields.
> Other non-multilingual use cases may include things like switching to a
> different analyzer for the same field to remove a feature (e.g. turning
> on/off query-time synonyms against the same field on a per-query basis).