Hey Aad, I believe http://jakarta.apache.org/lucene/docs/contributions.html has a link to Phonetix (http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html), an LGPL-licensed library of phonetic algorithms such as Soundex, Metaphone and DoubleMetaphone. Lucene adapters are available as well.
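Classic (American) Soundex, one of the algorithms listed above, is short enough to sketch directly. The Java below is written from the standard Soundex rules, not taken from Phonetix, whose API isn't shown here. The class and method names (`Soundex`, `soundex`, `code`) are illustrative only. It keeps the first letter, maps consonants to six digit groups, merges repeats (including across h and w), and pads to four characters.

```java
import java.util.Locale;

public class Soundex {
    // Digit for each consonant group; '0' means "not coded" (vowels, y, h, w).
    static char code(char c) {
        switch (c) {
            case 'b': case 'f': case 'p': case 'v': return '1';
            case 'c': case 'g': case 'j': case 'k':
            case 'q': case 's': case 'x': case 'z': return '2';
            case 'd': case 't': return '3';
            case 'l': return '4';
            case 'm': case 'n': return '5';
            case 'r': return '6';
            default: return '0';
        }
    }

    static String soundex(String word) {
        // Keep ASCII letters only; accented letters (e.g. Dutch ë) are dropped.
        String s = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        if (s.isEmpty()) return "";
        StringBuilder out = new StringBuilder();
        out.append(Character.toUpperCase(s.charAt(0)));
        char last = code(s.charAt(0)); // the first letter's code also suppresses a repeat
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char d = code(c);
            if (d != '0' && d != last) out.append(d);
            // h and w are transparent: same-coded letters around them merge.
            // Vowels reset 'last', so same-coded letters around a vowel are both kept.
            if (c != 'h' && c != 'w') last = d;
        }
        while (out.length() < 4) out.append('0'); // pad to letter + 3 digits
        return out.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"school", "sgool", "sol"}) {
            System.out.println(w + " -> " + soundex(w));
        }
    }
}
```

Running `main` prints `S400` for all three words. Plain Soundex happens to conflate 'sgool' and 'school', but it also lumps in unrelated words like 'sol', which is one reason a language-aware variant is worth looking at.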
As to the suitability of the algorithms: I haven't looked at the Phonetix implementation, but if http://spottedtiger.tripod.com/D_Language/D_DoubleMetaPhone.html is anything to go by (search the page for "dutch"), it should meet your needs, or at least won't be difficult to customize. Is that what you're looking for?

k

On Wed, 09 Feb 2005 13:23:57 +0100, Aad Nales wrote:
> In my Clipper days I could build an index on English words using a
> technique called soundex. Searching that index returned words that
> sounded the same. From what I remember, this technique only worked
> for English. Has it ever been generalized?
>
> What I am trying to solve is this. A customer is looking for a
> solution to spelling mistakes made by children (up to age 10) when
> typing in queries. The site is Dutch. A common mistake is 'sgool'
> when searching for 'school'. The 'normal' spellcheckers and
> suggesters typically generate a list where the 'sounds like'
> candidates are too far away from the result. So what I am thinking
> about doing is this:
>
> 1. Create a parser that takes a word and creates a sound-index entry.
>
> 2. Create a list of 'correctly' spelled words, based either on the
> index of the website or on some kind of dictionary.
> 2a. Perhaps create an n-gram index based on these words.
>
> 3. Accept a query and figure out that a spelling mistake has been made.
> 3a. Find alternatives by parsing the query and searching the 'sounds
> like' index, then calculate and order the results.
>
> Steps 2 and 3 have been discussed at length in this forum and have
> even made it to the sandbox. What I am left with is step 1.
>
> My thinking is to process a series of replacement statements that
> go like:
> --
> g sounds like ch if the immediate predecessor is an s.
> o sounds like oo if the immediate predecessor is a consonant.
> --
>
> But before I take this to the next step, I am wondering if anybody
> has created or thought up alternative solutions.
>
> Cheers,
> Aad
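Step 1 of the plan above, turning a word into a sound-index key through context-sensitive replacement statements, can be sketched roughly as follows. This is a hypothetical illustration: `SoundKey`, `Rule`, `RULES` and `key` are invented names, and the rule set is just the two statements from the message plus one added assumption (an existing "oo" is kept as-is, so the o-rule cannot lengthen an already long vowel a second time). Rules are tried in order at each position, the first match wins, and "predecessor" means the preceding letter of the original word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

public class SoundKey {
    // Hypothetical rule: rewrite `match` as `replacement` when the character
    // immediately before the match (in the original word) satisfies `before`.
    static final class Rule {
        final String match;
        final Predicate<Character> before;
        final String replacement;

        Rule(String match, Predicate<Character> before, String replacement) {
            this.match = match;
            this.before = before;
            this.replacement = replacement;
        }
    }

    // Assumption: any ASCII letter other than a, e, i, o, u counts as a consonant.
    static boolean isConsonant(char c) {
        return c >= 'a' && c <= 'z' && "aeiou".indexOf(c) < 0;
    }

    // Tried in order at each position; the first matching rule wins.
    static final List<Rule> RULES = Arrays.asList(
        // Assumption: keep an existing "oo" so the o-rule below cannot lengthen it again.
        new Rule("oo", p -> true, "oo"),
        // "g sounds like ch if the immediate predecessor is an s"
        new Rule("g", p -> p == 's', "ch"),
        // "o sounds like oo if the immediate predecessor is a consonant"
        new Rule("o", p -> isConsonant(p), "oo")
    );

    static String key(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < w.length()) {
            char prev = i > 0 ? w.charAt(i - 1) : '\0'; // '\0' marks start of word
            Rule hit = null;
            for (Rule r : RULES) {
                if (w.startsWith(r.match, i) && r.before.test(prev)) {
                    hit = r;
                    break;
                }
            }
            if (hit != null) {
                out.append(hit.replacement);
                i += hit.match.length();
            } else {
                out.append(w.charAt(i)); // no rule applies: copy the letter unchanged
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"sgool", "sgol", "school"}) {
            System.out.println(w + " -> " + key(w));
        }
    }
}
```

Running `main` prints `school` as the key for all three inputs, so indexing each correctly spelled word under its key would let step 3a look up `key(query)` directly. Applied this literally, though, the o-rule also maps 'bom' and 'boom' to the same key even though the Dutch vowels differ in length, so a real rule set would probably need more context (for example, whether the syllable is open) than the immediate predecessor alone.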