Hi Isaac,
While writing Solr in Action (http://solrinaction.com), I built a solution
to SOLR-5053 for the multilingual search chapter (I didn't realize this
ticket existed at the time). I called it a MultiTextField. Essentially, the
field lets you map a list of defined prefixes to field types and dynamically
substitute in one or more field types based upon the incoming content.
For example:
#schema.xml#
<fieldType name="multiText"
           class="sia.ch14.MultiTextField" sortMissingLast="true"
           defaultFieldType="text_general"
           fieldMappings="en:text_english,
                          es:text_spanish,
                          fr:text_french"/>
<fieldType name="text_english" ... />
<fieldType name="text_spanish" ... />
<fieldType name="text_french" ... />
<field name="content" type="multiText" indexed="true" ... />
#document#
<add><doc>
  <field name="id">1</field>
  <field name="content">en,es|the schools, la escuela</field>
</doc></add>
#Outputted Token Stream#:
[Position 1]  [Position 2]  [Position 3]  [Position 4]
the           school        la            escuela
              schools                     escuel
#query on two languages#
q=en,es|la OR en,es|escuela
Essentially, this MultiText field type lets you dynamically combine one or
more Analyzers (from a defined field type) and stack the tokens based upon
term positions within each independent Analyzer. The use case here was
multiple
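To make the stacking concrete, here is a minimal Python sketch of the idea. It is not the MultiTextField code itself. The toy analyze_english / analyze_spanish functions and their stemming tables are hypothetical stand-ins for the real text_english and text_spanish chains, and the sketch assumes the value always carries a prefix. Only the "run each analyzer independently, then stack by position" behavior is taken from the description above.

```python
import re

# Hypothetical stand-ins for the text_english / text_spanish analysis chains.
# Real Solr analyzers do far more; these only stem the two words in the example.
ENGLISH_STEMS = {"schools": "school"}
SPANISH_STEMS = {"escuela": "escuel"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def analyze_english(text):
    return [ENGLISH_STEMS.get(token, token) for token in tokenize(text)]


def analyze_spanish(text):
    return [SPANISH_STEMS.get(token, token) for token in tokenize(text)]


# Maps a prefix key to an analyzer, mirroring the fieldMappings attribute.
ANALYZERS = {"en": analyze_english, "es": analyze_spanish}


def stack_tokens(value):
    """Split 'en,es|text' and stack each analyzer's tokens by position."""
    prefix, _, text = value.partition("|")
    streams = [ANALYZERS[key.strip()](text) for key in prefix.split(",")]
    stacked = []
    for position in range(max(len(stream) for stream in streams)):
        terms = []
        for stream in streams:
            # Keep one copy of a term when several analyzers agree on it.
            if position < len(stream) and stream[position] not in terms:
                terms.append(stream[position])
        stacked.append(terms)
    return stacked


positions = stack_tokens("en,es|the schools, la escuela")
# positions == [["the"], ["school", "schools"], ["la"], ["escuela", "escuel"]]
```

Each inner list is one position in the stacked token stream, which matches the four positions shown in the output above.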
To answer your original question... at query time, this implementation
requires that you pass the prefix before EACH term in the query, not just
the first term (you can see this in the q= I demonstrated above). If you
have a Token Filter you have developed, you could probably accomplish
what you are trying to do the same way.
I think you could write a custom QParserPlugin that would do this for you.
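As a rough illustration of the rewrite such a parser would have to perform, here is a small Python sketch that adds the prefix to every term. It is a client-side stand-in, not a QParserPlugin (which would be written in Java inside Solr). The function name prefix_terms is hypothetical, and the sketch assumes a simple whitespace-separated query with no phrases, parentheses, or field:term syntax.

```python
# Operators that must be passed through untouched.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def prefix_terms(query, languages):
    """Prepend 'lang1,lang2|' to every term that does not already have a prefix."""
    prefix = ",".join(languages) + "|"
    rewritten = []
    for token in query.split():
        if token in BOOLEAN_OPERATORS or "|" in token:
            rewritten.append(token)
        else:
            rewritten.append(prefix + token)
    return " ".join(rewritten)


print(prefix_terms("la OR escuela", ["en", "es"]))
# prints: en,es|la OR en,es|escuela
```

The result is the same q= value shown in the two-language query example earlier in this message.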
Alternatively, it may be possible to create a similar implementation that
makes use of a dynamic field name (e.g. content|en,fr as the field
name), which would pull the prefix from the field name and apply it to all
tokens instead of requiring/allowing each token to specify its own prefix.
I haven't done this in my implementation, but I could see where it might
be more user-friendly for many Solr users.
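A sketch of how the field-name variant might pull the prefix out, again in Python and purely illustrative since I haven't implemented it. The helper name split_field_name is hypothetical, and returning an empty list when no suffix is present (so the caller can fall back to the default field type) is my assumption.

```python
def split_field_name(field_name):
    """Split 'content|en,fr' into the base field name and its language keys."""
    base, _, suffix = field_name.partition("|")
    # An empty list means "no prefix given"; the caller would then use the default type.
    languages = [key.strip() for key in suffix.split(",") if key.strip()]
    return base, languages


base, languages = split_field_name("content|en,fr")
# base == "content", languages == ["en", "fr"]
```

Every token in the field value or query would then be analyzed with those languages, with no per-term prefix required.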
I'm just finishing up the multilingual search chapter and code now and
will be happy to post it to SOLR-5053 once I finish in the next few days if
this would be helpful to you.
-Trey
On Sat, Sep 21, 2013 at 4:15 PM, Isaac Hebsh isaac.he...@gmail.com wrote:
Thought about that again,
We can do this work as a search component, manipulating the query string.
The cons are the double QParser work, and the double tokenization work.
Another approach which might solve this issue easily is Dynamic query
analyze chain: https://issues.apache.org/jira/browse/SOLR-5053
What would you do?
On Tue, Sep 17, 2013 at 10:31 PM, Isaac Hebsh isaac.he...@gmail.com
wrote:
Hi everyone,
We developed a TokenFilter.
It should act differently depending on a parameter supplied in the
query (for the query chain only, not the index one, of course).
We found no way to pass that parameter into the TokenFilter flow. I guess
the root cause is that TokenFilter is a pure Lucene object.
As a last resort, we tried to pass the parameter as the first term in the
query text (q=...), and save it as a member of the TokenFilter instance.
Although it is ugly, it might work fine.
But, the problem is that it is not guaranteed that all the terms of a
particular query will be analyzed by the same instance of a TokenFilter.
In this case, some terms will be analyzed without the required information
of that parameter. We can produce such a race very easily.
How should I overcome this issue?
Does anyone have a better resolution?