Aad Nales wrote:

Hi All,

Before I start reinventing wheels, I would like to do a quick check to
see if anybody else has already tried this. A customer has asked us to
look into the possibility of performing a spell check on queries. So far
the most promising approach seems to be to create an Analyzer based on
the OpenOffice spellchecker. My question is: "Has anybody tried this
before?"

I did a WordNet/synonym query expander; search for "WordNet" on the page below. Of interest: it stores the WordNet info in a separate Lucene index since, in essence, an index is just a database.


http://jakarta.apache.org/lucene/docs/lucene-sandbox/
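
To illustrate the idea of query expansion, here is a minimal sketch. It is not the sandbox code: the plain in-memory map is a hypothetical stand-in for the separate WordNet index, and the class name SynonymExpander is made up. Each query word is rewritten into an OR group of the word plus its synonyms.

```java
import java.util.*;

public class SynonymExpander {
    // Hypothetical stand-in for the separate WordNet/synonym Lucene index:
    // maps a lowercase word to its list of synonyms.
    private final Map<String, List<String>> synonyms;

    public SynonymExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Rewrite each whitespace-separated query term as "(term OR syn1 OR syn2 ...)".
    // Terms with no synonyms are passed through unchanged.
    public String expand(String query) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            List<String> syns = synonyms.getOrDefault(term.toLowerCase(), Collections.emptyList());
            if (out.length() > 0) out.append(' ');
            if (syns.isEmpty()) {
                out.append(term);
            } else {
                out.append('(').append(term);
                for (String s : syns) out.append(" OR ").append(s);
                out.append(')');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> wn = new HashMap<>();
        wn.put("car", Arrays.asList("auto", "automobile"));
        SynonymExpander e = new SynonymExpander(wn);
        System.out.println(e.expand("fast car")); // fast (car OR auto OR automobile)
    }
}
```

In the real expander, the map lookup would instead be a query against the synonym index, which is what lets the WordNet data live in Lucene rather than in a separate database.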

Another variation is to spell-check against the terms actually in the index rather than against what an external dictionary says. I've done this on my experimental site searchmorph.com in a dumb/inefficient way. Here's an example:

http://www.searchmorph.com/kat/search.jsp?s=recursivz

After you click the link above, it takes ~10 sec as it produces terms close to "recursivz". Oops: looking at the output, the same word is suggested multiple times. I must be considering all fields, not just the contents field. Fixing this is TBD (no wonder it's so slow :)).

I can/should send the code out. The logic is: for any term in a query that has zero matches, go through all the terms(!) in the index, calculate the Levenshtein string distance to each one, and return the best matches. A more intelligent approach is to only consider terms that also match on the first "n" (probably 3) chars.
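
A minimal sketch of that logic, assuming the index's terms for the contents field have already been collected into a plain Collection<String> (enumerating them from the index is not shown). The class name TermSuggester and the prefixLen parameter are hypothetical; prefixLen = 0 gives the brute-force scan over all terms, and prefixLen = 3 gives the cheaper first-3-chars variant. Duplicates are dropped up front, which avoids the same word being suggested repeatedly.

```java
import java.util.*;

public class TermSuggester {
    // Standard dynamic-programming Levenshtein distance
    // (insertion, deletion and substitution each cost 1), using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Return up to maxResults index terms closest to 'misspelled'.
    // Only terms sharing the first prefixLen chars are scored; prefixLen = 0 scores every term.
    static List<String> suggest(String misspelled, Collection<String> indexTerms,
                                int maxResults, int prefixLen) {
        String prefix = misspelled.substring(0, Math.min(prefixLen, misspelled.length()));
        Map<String, Integer> dist = new HashMap<>();
        List<String> candidates = new ArrayList<>();
        // TreeSet removes duplicate terms and gives alphabetical order for ties.
        for (String t : new TreeSet<>(indexTerms)) {
            if (t.startsWith(prefix) && !t.equals(misspelled)) {
                dist.put(t, levenshtein(misspelled, t)); // compute each distance once
                candidates.add(t);
            }
        }
        // List.sort is stable, so equally distant terms stay in alphabetical order.
        candidates.sort(Comparator.comparingInt((String t) -> dist.get(t)));
        return new ArrayList<>(candidates.subList(0, Math.min(maxResults, candidates.size())));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
                "recursive", "recursion", "recurse", "cursive", "apple", "recursive");
        System.out.println(suggest("recursivz", terms, 2, 3)); // [recursive, recursion]
    }
}
```

The prefix check is what makes the second variant cheaper: in a sorted term list the candidates sharing a prefix are contiguous, so a real implementation could seek to the prefix and stop scanning once terms no longer match it, instead of filtering the whole list as this sketch does.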





Cheers, Aad


--
Aad Nales
[EMAIL PROTECTED], +31-(0)6 54 207 340




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


