Re: wildcards, stemming and searching

Erik Hatcher Thu, 10 Feb 2005 07:55:52 -0800

How would you deal with a query like "a*z" though?

I suspect, however, that you only care about suffix queries and stemming those. If thats the case, then you could subclass getWildcardQuery and do internal stemming (remove trailing wildcard, run it through the analyzer directly there and return a modified WildcardQuery instance.

With wildcard queries though, this is risky. Prefixes won't necessarily stem to what the full word would stem to.

        Erik


On Feb 9, 2005, at 6:26 PM, aaz wrote:

Hi,
We are not using QueryParser and have some custom Query construction.
We have an index that indexes various documents. Each document is Analyzed and indexed via

StandardTokenizer() ->StandardFilter() -> LowercaseFilter() -> StopFilter() -> PorterStemFilter()

We also want to support wildcard queries, hence on an inbound query we need to deal with "*" in the value side of the comparison. We also need to "analyze" the value side of the query against the same analyzer in which the index was built with. This leads to some problems and would like your solution opinion.
User queries.
somefield = united*
After the analyzer hits "united*", we get back "unit". Hence we cannot detect that the user requested a wildcard.

Lets say we come up with some solution to "escape" the "*" char before the Analyzer hits it. For example
somefield = united*  -> unitedXXWILDCARDXX
After analysis this then becomes "unitedxxwildcardxx", which we can then turn into a WildcardQuery "united*"

The problem here is that the term "united" will never exist in the indexing due to the stemming which did not stem properly due to our escape mechanism.
How can I solve this problem?

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: wildcards, stemming and searching

Reply via email to