Re: Getting a query parameter in a TokenFilter

2013-09-22 Thread Trey Grainger
Hi Isaac,

In the process of writing Solr in Action (http://solrinaction.com), I have
built the solution to SOLR-5053 for the multilingual search chapter (I
didn't realize this ticket existed at the time).  The solution was
something I called a MultiTextField.  Essentially, the field lets you
map a list of defined prefixes to field types and dynamically substitute
in one or more field types based upon the incoming content.

For example:

#schema.xml#
<fieldType name="multiText"
           class="sia.ch14.MultiTextField" sortMissingLast="true"
           defaultFieldType="text_general"
           fieldMappings="en:text_english,
                          es:text_spanish,
                          fr:text_french"/>

<fieldType name="text_english" ... />
<fieldType name="text_spanish" ... />
<fieldType name="text_french" ... />

<field name="content" type="multiText" indexed="true" ... />
#document#
<add><doc>
  <field name="id">1</field>
  <field name="content">en,es|the schools, la escuela</field>
</doc></add>

#Outputted Token Stream#:
[Position 1]   [Position 2]   [Position 3]   [Position 4]
the            school         la             escuela
               schools                       escuel

#query on two languages#
q=en,es|la OR en,es|escuela
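
To make the prefix-to-field-type mapping concrete, here is a minimal sketch in plain Java. It is not the actual sia.ch14.MultiTextField code; the class name PrefixMappings and its methods are hypothetical, and falling back to the default field type when no known prefix is present is an assumption based on the defaultFieldType attribute above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the real MultiTextField.
final class PrefixMappings {
    private final Map<String, String> prefixToFieldType = new LinkedHashMap<>();
    private final String defaultFieldType;

    // fieldMappings looks like "en:text_english, es:text_spanish, fr:text_french".
    PrefixMappings(String fieldMappings, String defaultFieldType) {
        this.defaultFieldType = defaultFieldType;
        for (String pair : fieldMappings.split(",")) {
            String[] kv = pair.trim().split(":", 2);
            if (kv.length == 2) {
                prefixToFieldType.put(kv[0].trim(), kv[1].trim());
            }
        }
    }

    // Resolves the prefixes of a value such as "en,es|the schools, la escuela"
    // to field type names. Assumption: when there is no '|' or no prefix is
    // recognized, the default field type is used.
    List<String> fieldTypesFor(String value) {
        List<String> result = new ArrayList<>();
        int bar = value.indexOf('|');
        if (bar > 0) {
            for (String prefix : value.substring(0, bar).split(",")) {
                String fieldType = prefixToFieldType.get(prefix.trim());
                if (fieldType != null && !result.contains(fieldType)) {
                    result.add(fieldType);
                }
            }
        }
        if (result.isEmpty()) {
            result.add(defaultFieldType);
        }
        return result;
    }

    // Returns the text after the first '|', or the whole value if there is none.
    // Assumes a value that contains '|' always carries a prefix.
    static String stripPrefix(String value) {
        int bar = value.indexOf('|');
        return bar > 0 ? value.substring(bar + 1) : value;
    }

    public static void main(String[] args) {
        PrefixMappings mappings = new PrefixMappings(
                "en:text_english, es:text_spanish, fr:text_french", "text_general");
        String value = "en,es|the schools, la escuela";
        System.out.println(mappings.fieldTypesFor(value));
        System.out.println(stripPrefix(value));
        System.out.println(mappings.fieldTypesFor("no prefix here"));
    }
}
```

With the mappings from the schema above, the "en,es" prefix resolves to text_english and text_spanish, and a value without a prefix resolves to text_general.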

 Essentially, this MultiText field type lets you dynamically combine one or
more Analyzers (from a defined field type) and stack the tokens based upon
term positions within each independent Analyzer.  The use case here was
multiple
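
The stacking idea can be sketched without Lucene. The example below assumes each analyzer's output is already available as a list indexed by term position; the token lists are written by hand to match the token stream shown earlier and are not produced by real analyzers. The class name TokenStacker is hypothetical, and positions are 0-based here while the post counts from 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of stacking tokens from several analyzers
// by term position; a real implementation would work on TokenStreams.
final class TokenStacker {

    // Each inner list is one analyzer's output, indexed by term position.
    // An empty string marks a position where that analyzer emitted nothing.
    static List<Set<String>> stack(List<List<String>> perAnalyzerTokens) {
        List<Set<String>> positions = new ArrayList<>();
        for (List<String> tokens : perAnalyzerTokens) {
            for (int pos = 0; pos < tokens.size(); pos++) {
                while (positions.size() <= pos) {
                    positions.add(new LinkedHashSet<>());
                }
                String token = tokens.get(pos);
                if (!token.isEmpty()) {
                    // The set drops tokens that two analyzers produce identically.
                    positions.get(pos).add(token);
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hand-written stand-ins for the English and Spanish analyzer output.
        List<String> english = List.of("the", "school", "la", "escuela");
        List<String> spanish = List.of("the", "schools", "la", "escuel");
        System.out.println(stack(List.of(english, spanish)));
        // prints [[the], [school, schools], [la], [escuela, escuel]]
    }
}
```

Identical tokens at the same position collapse into one entry, which is why "the" and "la" appear only once in the stacked output.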

To answer your original question... at query time, this implementation
requires that you pass the prefix before EACH term in the query, not just
the first term (you can see this in the q= I demonstrated above).  If you
have a Token Filter you have developed, you could probably accomplish
what you are trying to do the same way.
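
Because every term carries its own prefix, no state has to survive from one term to the next. A minimal sketch of adding the prefix to each term is below. It assumes a simple query of whitespace-separated terms joined by AND, OR, or NOT, with no phrases, parentheses, or field names; the class QueryPrefixer is hypothetical and not part of Solr.

```java
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch: rewrites a simple query so that every term carries
// the prefix. Real query syntax (phrases, grouping, fields) is not handled.
final class QueryPrefixer {
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    static String prefixEachTerm(String prefix, String query) {
        StringJoiner out = new StringJoiner(" ");
        for (String part : query.trim().split("\\s+")) {
            if (part.isEmpty() || OPERATORS.contains(part) || part.contains("|")) {
                // Leave operators and already-prefixed terms untouched.
                out.add(part);
            } else {
                out.add(prefix + "|" + part);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(prefixEachTerm("en,es", "la OR escuela"));
        // prints en,es|la OR en,es|escuela
    }
}
```

The output matches the q= value shown in the example above.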

You could write a custom QParserPlugin that would do this for you, I think.
Alternatively, it may be possible to create a similar implementation that
makes use of a dynamic field name (e.g.  content|en,fr as the field
name), which would pull the prefix from the field name and apply it to all
tokens instead of requiring/allowing each token to specify its own prefix.
I haven't done this in my implementation, but I could see where it might
be more user-friendly for many Solr users.
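
Since this variant is not implemented, the following is only a sketch of how the prefix could be pulled out of such a field name. The class DynamicFieldName and the choice of '|' as separator follow the content|en,fr example above; whether Solr accepts such a field name in every context is not something this sketch checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a field name such as "content|en,fr".
final class DynamicFieldName {
    final String baseField;
    final List<String> prefixes;

    private DynamicFieldName(String baseField, List<String> prefixes) {
        this.baseField = baseField;
        this.prefixes = prefixes;
    }

    static DynamicFieldName parse(String fieldName) {
        int bar = fieldName.indexOf('|');
        if (bar < 0) {
            // No prefix part: the whole name is the base field.
            return new DynamicFieldName(fieldName, List.of());
        }
        List<String> prefixes = new ArrayList<>();
        for (String p : fieldName.substring(bar + 1).split(",")) {
            if (!p.trim().isEmpty()) {
                prefixes.add(p.trim());
            }
        }
        return new DynamicFieldName(fieldName.substring(0, bar), prefixes);
    }

    public static void main(String[] args) {
        DynamicFieldName parsed = parse("content|en,fr");
        System.out.println(parsed.baseField + " -> " + parsed.prefixes);
    }
}
```

The parsed prefixes would then apply to every token analyzed for that field, so the query text itself would not need per-term prefixes.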

I'm just finishing up the multilingual search chapter and code now and
will be happy to post it to SOLR-5053 once I finish in the next few days if
this would be helpful to you.

-Trey




Re: Getting a query parameter in a TokenFilter

2013-09-21 Thread Isaac Hebsh
I thought about this again.
We could do this work in a search component that manipulates the query string.
The cons are the duplicated QParser work and the duplicated tokenization work.

Another approach that might solve this issue easily is the "Dynamic query
analyze chain" ticket: https://issues.apache.org/jira/browse/SOLR-5053

What would you do?




Getting a query parameter in a TokenFilter

2013-09-17 Thread Isaac Hebsh
Hi everyone,

We developed a TokenFilter.
It should act differently depending on a parameter supplied in the
query (for the query chain only, not the index one, of course).
We found no way to pass that parameter into the TokenFilter flow. I guess
the root cause is that TokenFilter is a pure Lucene object.

As a last resort, we tried to pass the parameter as the first term in the
query text (q=...), and save it as a member of the TokenFilter instance.

Although it is ugly, it might work fine.
But the problem is that it is not guaranteed that all the terms of a
particular query will be analyzed by the same instance of a TokenFilter. In
that case, some terms will be analyzed without the required information from
that parameter. We can produce such a race very easily.

How should I overcome this issue?
Does anyone have a better solution?