I use the soundex algorithm to generate a 4-character fingerprint of the
phonetic sound of a string, then I create an index over that field and
search on it.


But soundex has an "issue"... the soundex fingerprint should be the same
for victor as for bictor, but since soundex keeps the first letter as
part of the fingerprint, you would get something like:

Soundex of victor = V236
Soundex of bictor = B236

So I add a "reverse index": I reverse the string and compute the phonetic
fingerprint of the result.

Soundex of rotciv = R321
Soundex of rotcib = R321

Then queries like

select * from lotoftextdata where soundex = 'V236' or xednuous = 'R321'

will work like a charm and be very fast.

This cannot be done at the db layer; it must be done at the app layer.
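
Here is a rough sketch of the two-column lookup in Python, assuming SQLite as
the database. The table and fingerprint column names come from the query above;
the "name" column, the index names, the sample row for maria and the
find_similar() helper are made up for illustration. The fingerprints are
4-character American Soundex codes, assumed to be computed at the app layer
before insertion.

```python
import sqlite3

# In-memory database for the sketch; a real setup would use its own DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lotoftextdata (name TEXT, soundex TEXT, xednuous TEXT)"
)
# One index per fingerprint column (index names are hypothetical).
conn.execute("CREATE INDEX idx_soundex ON lotoftextdata (soundex)")
conn.execute("CREATE INDEX idx_xednuous ON lotoftextdata (xednuous)")

# Fingerprints precomputed by the application:
# forward soundex of the name, and soundex of the reversed name.
rows = [
    ("victor", "V236", "R321"),
    ("bictor", "B236", "R321"),
    ("maria", "M600", "A650"),
]
conn.executemany("INSERT INTO lotoftextdata VALUES (?, ?, ?)", rows)


def find_similar(conn, forward, backward):
    """Return names matching either the forward or the reversed fingerprint."""
    cur = conn.execute(
        "SELECT name FROM lotoftextdata "
        "WHERE soundex = ? OR xednuous = ? ORDER BY name",
        (forward, backward),
    )
    return [row[0] for row in cur]


# Searching for "victor" also finds "bictor" through the reversed column.
matches = find_similar(conn, "V236", "R321")
print(matches)
```

Both conditions are plain equality tests, so each one can be answered from its
own index instead of scanning the table the way %pattern% does.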
Yes, it takes a lot of work to set up, but it is worth it.
It is also much more "flexible" and "precise" than searching for %pattern% when
you are dealing with human data such as names, street addresses, locations, etc.
There are other, much better phonetic algorithms, but soundex is very easy
to implement.
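
To give an idea of how little code it takes, here is a minimal sketch of
American Soundex in Python, plus the forward/reversed fingerprint pair
described above. The function names soundex() and fingerprints() are my own
choice, and the sketch assumes plain ASCII input; other soundex variants may
give slightly different codes.

```python
# Letter groups for American Soundex; vowels, y, h and w get no digit.
CODES = {}
for digit, letters in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1
):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character American Soundex code of an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad with zeros, cut to 4 characters


def fingerprints(name):
    """Return (forward, reversed) fingerprints for the two indexed columns."""
    return soundex(name), soundex(name[::-1])


print(fingerprints("victor"))
print(fingerprints("bictor"))
```

The two names differ in the forward code (V236 vs B236) but share the
reversed one (R321), which is what the second index relies on.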


BTW, PHP has a built-in soundex() function, and there is a soundex Java class...


http://en.wikipedia.org/wiki/Soundex
