Re: dash-words

Chris Hostetter Wed, 02 Aug 2006 13:05:34 -0700

: with a query like this +arbeiterjugend +west-berlin I get no results.
:
: org.apache.lucene.queryParser.QueryParser.parse makes this query (with
: WordDelimiterFilter) with Default QueryParser.AND_OPERATOR:
:
: +titel:arbeiterjugend +titel:"west (berlin westberlin)"
:
: with +arbeiterjugend +westberlin I get the result.
:
: It seems that the synonyms don't work with the query. How do you solve
: this in Solr? Do I have to build a TermQuery?


First off, when using WordDelimiterFilter it's generally a good idea to
use a slightly differnet configuration of the Filter in your indexng
analyzer then in your query analyzer -- this is discussed a bit inthe
wiki...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089
http://wiki.apache.org/solr/SolrRelevancyCookbook#head-353fcfa33e5c4a0a5959aa3d8d33c5a3a61f2683

...this can help avoid situations like you describe.

but in general, what you are running into is a general constraint of the
way Analyzers can produce tokens with a "zero gap" indicating that they
occupy the same spot as the previous token, but there is no way for the
analyzer to indicate that a sequence of 1 or more tokens occupies the same
space as another sequence of 1 or more tokens.  so when QUeryParser asks
the analyzer to make a token stream out of "west-berlin" the analyzer has
no way to return a token stream that can easily be recognized as [ [[west]
[berlin]] or [westberlin] ].

this does in fact prove to be a large problem when dealing with "multi
word synonyms" (also discussed in the wiki mentioned above) but can
generally be dealt with in the WordDeliminterFilter.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: dash-words

Reply via email to