Suppose I have an index containing the terms impostor, imposter, fraud, and
fruad. Presumably, regardless of whether I spell impostor and fraud
correctly, Lucene SpellChecker will offer the improperly spelled versions as
corrections. This means that the phrase "The login fraud involves an
impostor" would need to expand to:

"The login fraud involves an impostor" OR "The login fruad involves an
impostor" OR "The login fraud involves an imposter" OR "The login fruad
involves an imposter" to cover all cases and thus find all possible matches.

However, that feels like an awful lot of phrase matches to run against the
index. A more efficient approach would be to expand the query to "The login
(fraud OR fruad) involves an (impostor OR imposter)", which should be
logically equivalent to the first (longer) query.
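
To make the comparison concrete, here is a minimal sketch in plain Java of
the two expansions. It only builds query strings and makes no Lucene calls.
The class name, the method names, and the variants map are hypothetical and
stand in for whatever SpellChecker would actually suggest, and the sketch
does not show how the grouped form would be parsed or executed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds both query forms as plain strings.
public class QueryExpansionSketch {

    // Form 1: one full phrase per combination of spellings
    // (the Cartesian product of the alternatives at each position).
    static List<String> enumerated(String[] tokens, Map<String, List<String>> variants) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (String token : tokens) {
            // Tokens without variants contribute only themselves.
            List<String> options =
                variants.getOrDefault(token, Collections.singletonList(token));
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String option : options) {
                    next.add(prefix.isEmpty() ? option : prefix + " " + option);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    // Form 2: a single phrase with an OR group at each position
    // that has more than one spelling.
    static String grouped(String[] tokens, Map<String, List<String>> variants) {
        List<String> parts = new ArrayList<>();
        for (String token : tokens) {
            List<String> options =
                variants.getOrDefault(token, Collections.singletonList(token));
            parts.add(options.size() == 1
                ? options.get(0)
                : "(" + String.join(" OR ", options) + ")");
        }
        return String.join(" ", parts);
    }
}
```

With two spellings at each of two positions, enumerated() yields 2 x 2 = 4
phrases, while grouped() always yields one phrase. In general the first form
grows as the product of the number of spellings at each position, which is
the cost I am hoping to avoid.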

So my questions are:
(1) Have others generated queries of the form "The login (fraud OR fruad)
involves an (impostor OR imposter)" when applying SpellChecker to a phrase,
and found that this does perform better than the first form?
(2) Has anyone observed problems in doing so, in terms of performance or
anything else?

Any information would be appreciated.
