Soundex is the 'right' approach, but it needs improvement. So, find an improvement, then do something like this... Store the Soundex value in a column of its own, INDEX that column, and JOIN on that column using "=". Thus, ... * You have spent the effort to convert to Soundex once, not on every call. * Multiple strings will have the same Soundex, but generally not many will have the same. Hence, the JOIN won't be 1:1, but rather some small number.
Other approaches (eg, Levenshtein) need both strings in the computation. It _may_ be possible to work around that by the following. Let's say you wanted to a "match" if * one letter was dropped or added or changed, or * one pair of adjacent letters was swapped. Then... For a N-letter word, store N+1 rows: * The word, as is, * The N words, each shortened by one letter. Then an equal match on that hacked column will catch single dropped/added/changed letter with only N+1 matches. (Minor note: doubled letters make the count less than N+1.) > -----Original Message----- > From: h...@tbbs.net [mailto:h...@tbbs.net] > Sent: Monday, June 03, 2013 8:30 AM > To: mysql@lists.mysql.com > Subject: string-likeness > > I wish to join two tables on likeness, not equality, of character strings. > Soundex does not work. I am using the Levenstein edit distance, written in > SQL, a very costly test, and I am in no position to write it in C and link > it to MySQL--and joining on equality takes a fraction of a second, and > this takes hours. Any good ideas? > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: http://lists.mysql.com/mysql -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/mysql