Have you made any progress? Since AnalyzingQueryParser doesn't inherit
from QParserPlugin, Solr won't use it directly, but I guess we could implement
a similar parser that does inherit from QParserPlugin?
Switching parsers seems to be what is needed. Has really no one solved this
before?
- Kári
- Original Message -
From: Matti Oinas matti.oi...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, 11 January, 2011 12:47:52 PM
Subject: Re: solr wildcard queries and analyzers
This might be the solution.
http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
2011/1/11 Matti Oinas matti.oi...@gmail.com:
Sorry, the message was not meant to be sent here. We are struggling
with the same problem here.
2011/1/11 Matti Oinas matti.oi...@gmail.com:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers
On wildcard and fuzzy searches, no text analysis is performed on the
search word.
2011/1/11 Kári Hreinsson k...@gagnavarslan.is:
Hi,
I am having a problem with the fact that no text analysis is performed on
wildcard queries. I have the following field type (a bit simplified):
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
  </analyzer>
</fieldType>
My problem has to do with Icelandic characters. When I index a document
with a text field including the word sjálfsögðu, it gets indexed as
sjalfsogdu (because of the ASCIIFoldingFilterFactory which replaces the
Icelandic characters with their English equivalents). Then, when I search
(without a wildcard) for sjálfsögðu or sjalfsogdu I get that document
as a result. This is convenient since it enables people to search without
using accented characters and yet get the results they want (e.g. if they
are working on computers with English keyboards).
However, this all falls apart with wildcard searches: the search
string isn't passed through the filters, so even if I search for sjálf*
I don't get any results, because the index doesn't contain the original
words (I do get results if I search for sjalf*). I know people have been
having a similar problem with the case sensitivity of wildcard queries and
most often the solution seems to be to lowercase the string before passing
it on to solr, which is not exactly an optimal solution (yet a simple one
in that case). The Icelandic characters complicate things a bit and
applying the same solution (doing the lowercasing and character mapping) in
my application seems like unnecessary duplication of code already part of
solr, not to mention complication of my application and possible
maintenance down the road.
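To make the duplication concrete, the application-side workaround would look roughly like the sketch below. It is only an approximation: the function name fold_wildcard_term and the EXTRA_FOLDS table are made up for illustration, and the mapping is an assumption about what ASCIIFoldingFilterFactory does for Icelandic text, not a copy of its rules. It lowercases the term, strips combining accents, and maps the few letters that have no Unicode decomposition, leaving the wildcard characters untouched.

```python
import unicodedata

# Hypothetical mapping for letters that NFKD does not decompose.
# Assumption: this approximates ASCIIFoldingFilterFactory for Icelandic.
EXTRA_FOLDS = {"ð": "d", "þ": "th", "æ": "ae"}


def fold_wildcard_term(term: str) -> str:
    """Lowercase and ASCII-fold a query term before sending it to Solr."""
    lowered = term.lower()
    # Split accented letters into base letter + combining mark (á -> a + U+0301).
    decomposed = unicodedata.normalize("NFKD", lowered)
    # Drop the combining marks, keeping base letters and wildcards (* and ?).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Map the remaining special letters explicitly.
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)


if __name__ == "__main__":
    print(fold_wildcard_term("sjálf*"))
```

Any change to the filter chain in the field type would have to be mirrored in this function by hand, which is exactly the maintenance burden I would like to avoid.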
Is there any way around this? How are people solving this? Is there a way
to apply the filters to wildcard queries? I guess removing the
ASCIIFoldingFilterFactory is the simplest solution but this
normalization (of the text done by the filter) is often very useful.
I hope I'm not overlooking some obvious explanation. :/
Thanks in advance,
Kári Hreinsson