Hey Aad, I believe http://jakarta.apache.org/lucene/docs/contributions.html has 
a link to Phonetix 
(http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html),
 an LGPL-licensed lib for phonetic algorithms like Soundex, Metaphone and 
DoubleMetaphone. There are Lucene adapters.

As to the suitability of the algorithms, I haven't taken a look at the Phonetix 
implementation, but if 
http://spottedtiger.tripod.com/D_Language/D_DoubleMetaPhone.html is anything to 
go by (do a search for "dutch"), then it should meet your needs, or at least 
won't be difficult to customize.

Is that what you're looking for?

k

On Wed, 09 Feb 2005 13:23:57 +0100, Aad Nales wrote:
> In my Clipper days I could build an index on English words using a
> technique that was called soundex. Searching in that index resulted
> in hits of words that sounded the same. From what i remember this
> technique only worked for English. Has it ever been generalized?
>
> What i am trying to solve is this. A customer is looking for a
> solution to spelling mistakes made by children (upto 10) when
> typing in queries. The site is Dutch. Common mistakes are 'sgool'
> when searching for 'school'. The 'normal' spellcheckers and
> suggestors typically generate a list where the 'sounds like'
> candidates' are too far away from the result. So what I am thinking
> about doing is this:
>
> 1. create a parser that takes a word and creates a soundindex entry.
>
> 2. create list of 'correctly' spelled words either based on the
> index of the website or on some kind of dictionary.
> 2a. perhaps create a n-gram index based on these words
>
> 3. accept a query, figure out that a spelling mistake has been made
> 3a find alternatives by parsing the query and searching the 'sound
> like index' and then calculate and order  the results
>
> Steps 2 and 3 have been discussed at length in this forum and have
> even made it to the sandbox. What I am left with is 1.
>
> My thinking is processing a series of replacement statements that
> go like: --
> g sounds like ch if the immediate predecessor is an s. o sounds
> like oo if the immediate predecessor is a consonant --
>
> But before I takes this to the next step I am wondering if anybody
> has created or thought up alternative solutions?
>
> Cheers,
> Aad
>
>
> --------------------------------------------------------------------
> - To unsubscribe, e-mail: lucene-user-
>[EMAIL PROTECTED] For additional commands, e-mail:
>[EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to