Re: sounds like spellcheck [auf Viren geprueft]

Jonathan O'Connor Wed, 09 Feb 2005 07:46:10 -0800

Aad,
Are you trying to check the spelling of English words by Dutch children?
Then, Phonetix or any of these other solutions may not be perfect.
>From my little knowledge of Dutch, a "g" is some sort of velar fricative
(pronounced at the back of throat). And "ch" in english is also a velar
fricative.
You have to hope that the soundex/metaphone rules are broad enough to be
used by both languages.

Interesting little problem. No J2EE libraries to call, just static String
convertToSoundex(String word) to implement. Ah if only I could do more of
that sort of coding.
Ciao,
Jonathan O'Connor
XCOM Dublin

Kelvin Tan <[EMAIL PROTECTED]>
09/02/2005 13:03
Please respond to
"Lucene Users List" <lucene-user@jakarta.apache.org>

To
Lucene Users List <lucene-user@jakarta.apache.org>
cc

Subject
Re: sounds like spellcheck [auf Viren geprueft]

Hey Aad, I believe
http://jakarta.apache.org/lucene/docs/contributions.html has a link to
Phonetix (
http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html
), an LGPL-licensed lib for phonetic algorithms like Soundex, Metaphone
and DoubleMetaphone. There are Lucene adapters.

As to the suitability of the algorithms, I haven't taken a look at the
Phonetix implementation, but if
http://spottedtiger.tripod.com/D_Language/D_DoubleMetaPhone.html is
anything to go by (do a search for "dutch"), then it should meet your
needs, or at least won't be difficult to customize.

Is that what you're looking for?

k

On Wed, 09 Feb 2005 13:23:57 +0100, Aad Nales wrote:
> In my Clipper days I could build an index on English words using a
> technique that was called soundex. Searching in that index resulted
> in hits of words that sounded the same. From what i remember this
> technique only worked for English. Has it ever been generalized?
>
> What i am trying to solve is this. A customer is looking for a
> solution to spelling mistakes made by children (upto 10) when
> typing in queries. The site is Dutch. Common mistakes are 'sgool'
> when searching for 'school'. The 'normal' spellcheckers and
> suggestors typically generate a list where the 'sounds like'
> candidates' are too far away from the result. So what I am thinking
> about doing is this:
>
> 1. create a parser that takes a word and creates a soundindex entry.
>
> 2. create list of 'correctly' spelled words either based on the
> index of the website or on some kind of dictionary.
> 2a. perhaps create a n-gram index based on these words
>
> 3. accept a query, figure out that a spelling mistake has been made
> 3a find alternatives by parsing the query and searching the 'sound
> like index' and then calculate and order  the results
>
> Steps 2 and 3 have been discussed at length in this forum and have
> even made it to the sandbox. What I am left with is 1.
>
> My thinking is processing a series of replacement statements that
> go like: --
> g sounds like ch if the immediate predecessor is an s. o sounds
> like oo if the immediate predecessor is a consonant --
>
> But before I takes this to the next step I am wondering if anybody
> has created or thought up alternative solutions?
>
> Cheers,
> Aad
>
>
> --------------------------------------------------------------------
> - To unsubscribe, e-mail: lucene-user-
> [EMAIL PROTECTED] For additional commands, e-mail:
> [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

*** Aktuelle Veranstaltungen der XCOM AG ***

XCOM laedt ein zur IBM Workplace Roadshow in Frankfurt (16.02.2005), 
Duesseldorf (23.02.2005) und Berlin (02.03.2005)
Anmeldung und Information unter http://lotus.xcom.de/events

Workshop-Reihe "Mobilisierung von Lotus Notes Applikationen"  in Frankfurt 
(17.02.2005), Duesseldorf (24.02.2005) und Berlin (05.03.2005)
Anmeldung und Information unter http://lotus.xcom.de/events

*** XCOM AG Legal Disclaimer ***

Diese E-Mail einschliesslich ihrer Anhaenge ist vertraulich und ist allein fur 
den Gebrauch durch den vorgesehenen Empfaenger bestimmt. Dritten ist das Lesen, 
Verteilen oder Weiterleiten dieser E-Mail untersagt. Wir bitten, eine 
fehlgeleitete E-Mail unverzueglich vollstaendig zu loeschen und uns eine 
Nachricht zukommen zu lassen.

This email may contain material that is confidential and for the sole use of 
the intended recipient. Any review, distribution by others or forwarding 
without express permission is strictly prohibited. If you are not the intended 
recipient, please contact the sender and delete all copies.

Re: sounds like spellcheck [auf Viren geprueft]

Reply via email to