RE: string-likeness

Rick James Mon, 03 Jun 2013 14:46:57 -0700

Soundex is the 'right' approach, but it needs improvement.  So, find an 
improvement, then do something like this...
Store the Soundex value in a column of its own, INDEX that column, and JOIN on 
that column using "=".  Thus, ...
* You have spent the effort to convert to Soundex once, not on every call.
* Multiple strings will have the same Soundex, but generally not many will have 
the same.  Hence, the JOIN won't be 1:1, but rather some small number.


Other approaches (eg, Levenshtein) need both strings in the computation.  It 
_may_ be possible to work around that by the following.
Let's say you wanted to a "match" if
* one letter was dropped or added or changed, or
* one pair of adjacent letters was swapped.
Then...  For a N-letter word, store N+1 rows:
* The word, as is,
* The N words, each shortened by one letter.
Then an equal match on that hacked column will catch single 
dropped/added/changed letter with only N+1 matches.
(Minor note:  doubled letters make the count less than N+1.)

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Monday, June 03, 2013 8:30 AM
> To: [email protected]
> Subject: string-likeness
> 
> I wish to join two tables on likeness, not equality, of character strings.
> Soundex does not work. I am using the Levenstein edit distance, written in
> SQL, a very costly test, and I am in no position to write it in C and link
> it to MySQL--and joining on equality takes a fraction of a second, and
> this takes hours. Any good ideas?
> 
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

RE: string-likeness

Reply via email to