Thanks for the information on o.a.l.search.spans. I was thinking of parsing the phrase query string into a sequence of terms, then constructing a phrase query object using the add(Term term, int position) method of org.apache.lucene.search.PhraseQuery. As I build the final phrase query object, I can then inject the similar words suggested by SpellChecker at the position of each term they correspond to.
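To make the idea concrete, here is a rough sketch in plain Java (no Lucene calls) of the per-position structure I have in mind. The names PhraseSlots, buildSlots and expand are made up for illustration, the suggestions map stands in for whatever SpellChecker returns, and whitespace splitting stands in for the real analyzer. The expand method is only there to check that the compact form covers the same phrases as the long OR query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class PhraseSlots {

    // One slot per phrase position. Each slot holds the original term first,
    // followed by any suggested variants (duplicates removed, order kept).
    static List<List<String>> buildSlots(String phrase,
                                         Map<String, List<String>> suggestions) {
        List<List<String>> slots = new ArrayList<>();
        String trimmed = phrase.trim();
        if (trimmed.isEmpty()) {
            return slots;
        }
        // Assumption: whitespace tokenization; a real analyzer may differ.
        for (String term : trimmed.split("\\s+")) {
            Set<String> alternatives = new LinkedHashSet<>();
            alternatives.add(term);
            alternatives.addAll(
                    suggestions.getOrDefault(term, Collections.emptyList()));
            slots.add(new ArrayList<>(alternatives));
        }
        return slots;
    }

    // Enumerates every plain phrase that the slot form stands for, i.e. the
    // cross product of the alternatives at each position.
    static List<String> expand(List<List<String>> slots) {
        List<String> phrases = new ArrayList<>();
        if (slots.isEmpty()) {
            return phrases;
        }
        phrases.add("");
        for (List<String> slot : slots) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String term : slot) {
                    next.add(prefix.isEmpty() ? term : prefix + " " + term);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}
```

With fraud mapped to fruad and impostor mapped to imposter, the example phrase gives six slots, two of which hold two alternatives, and expand lists the same four phrases as the long OR query. Each slot would then go into the query at its position. As far as I understand, PhraseQuery requires every added term to match, so the alternatives in a slot may need MultiPhraseQuery.add(Term[]) rather than repeated PhraseQuery.add(Term, int) calls; I have not verified this.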
Do you agree that this should work too?

On Dec 4, 2007 1:22 AM, Doron Cohen <[EMAIL PROTECTED]> wrote:
> See below -
>
> smokey <[EMAIL PROTECTED]> wrote on 03/12/2007 05:14:23:
>
> > Suppose I have an index containing the terms impostor, imposter,
> > fraud, and fruad, then presumably regardless of whether I spell
> > impostor and fraud correctly, Lucene SpellChecker will offer the
> > improperly spelled versions as corrections. This means that the
> > phrase "The login fraud involves an impostor" would need to expand to:
> >
> > "The login fraud involves an impostor" OR "The login fruad involves an
> > impostor" OR "The login fraud involves an imposter" OR "The login fruad
> > involves an imposter" to cover all cases and thus find all possible
> > matches.
> >
> > However, that feels like an awful lot of matches to perform on the
> > index. A more efficient approach would be to expand the query to
> > "The login (fraud OR fruad) involves an (impostor OR imposter)",
> > which should be logically equivalent to the first (longer) query.
> >
> > So my question is
> > (1) if others have generated the "The login (fraud OR fruad) involves an
> > (impostor OR imposter)" types of queries when applying SpellChecker to a
> > phrase, and agreed that this indeed performs better than the first one.
> > (2) if others have observed any problems in doing so in terms of
> > performance or anything else
> >
> > Any information would be appreciated.
>
> Lucene phrase query does not support 'sub parts'. But you may
> want to look at o.a.l.search.spans. It seems that a span-near query
> made of span-term queries and span-or queries, setting (max)span as
> ~the length of your phrase and setting in-order=true would get
> pretty close.
>
> About performance I hope others can comment, cause I never compared
> this to phrase query. When you do try this, please tell us of any
> interesting performance results!
>
> Regards,
> Doron