Re: string-likeness

2013-06-03 Thread Michael Dykman
I will second Rick's approach and have implemented something very
similar for a client when soundex feel short of expectation.  It
worked very well.

On Mon, Jun 3, 2013 at 5:43 PM, Rick James  wrote:
> Soundex is the 'right' approach, but it needs improvement.  So, find an 
> improvement, then do something like this...
> Store the Soundex value in a column of its own, INDEX that column, and JOIN 
> on that column using "=".  Thus, ...
> * You have spent the effort to convert to Soundex once, not on every call.
> * Multiple strings will have the same Soundex, but generally not many will 
> have the same.  Hence, the JOIN won't be 1:1, but rather some small number.
>
> Other approaches (eg, Levenshtein) need both strings in the computation.  It 
> _may_ be possible to work around that by the following.
> Let's say you wanted to a "match" if
> * one letter was dropped or added or changed, or
> * one pair of adjacent letters was swapped.
> Then...  For a N-letter word, store N+1 rows:
> * The word, as is,
> * The N words, each shortened by one letter.
> Then an equal match on that hacked column will catch single 
> dropped/added/changed letter with only N+1 matches.
> (Minor note:  doubled letters make the count less than N+1.)
>
>> -Original Message-
>> From: h...@tbbs.net [mailto:h...@tbbs.net]
>> Sent: Monday, June 03, 2013 8:30 AM
>> To: mysql@lists.mysql.com
>> Subject: string-likeness
>>
>> I wish to join two tables on likeness, not equality, of character strings.
>> Soundex does not work. I am using the Levenstein edit distance, written in
>> SQL, a very costly test, and I am in no position to write it in C and link
>> it to MySQL--and joining on equality takes a fraction of a second, and
>> this takes hours. Any good ideas?
>>
>>
>> --
>> MySQL General Mailing List
>> For list archives: http://lists.mysql.com/mysql
>> To unsubscribe:http://lists.mysql.com/mysql
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:http://lists.mysql.com/mysql
>



-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



RE: string-likeness

2013-06-03 Thread Rick James
Soundex is the 'right' approach, but it needs improvement.  So, find an 
improvement, then do something like this...
Store the Soundex value in a column of its own, INDEX that column, and JOIN on 
that column using "=".  Thus, ...
* You have spent the effort to convert to Soundex once, not on every call.
* Multiple strings will have the same Soundex, but generally not many will have 
the same.  Hence, the JOIN won't be 1:1, but rather some small number.

Other approaches (eg, Levenshtein) need both strings in the computation.  It 
_may_ be possible to work around that by the following.
Let's say you wanted to a "match" if
* one letter was dropped or added or changed, or
* one pair of adjacent letters was swapped.
Then...  For a N-letter word, store N+1 rows:
* The word, as is,
* The N words, each shortened by one letter.
Then an equal match on that hacked column will catch single 
dropped/added/changed letter with only N+1 matches.
(Minor note:  doubled letters make the count less than N+1.)

> -Original Message-
> From: h...@tbbs.net [mailto:h...@tbbs.net]
> Sent: Monday, June 03, 2013 8:30 AM
> To: mysql@lists.mysql.com
> Subject: string-likeness
> 
> I wish to join two tables on likeness, not equality, of character strings.
> Soundex does not work. I am using the Levenstein edit distance, written in
> SQL, a very costly test, and I am in no position to write it in C and link
> it to MySQL--and joining on equality takes a fraction of a second, and
> this takes hours. Any good ideas?
> 
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:http://lists.mysql.com/mysql


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: string-likeness

2013-06-03 Thread Hartmut Holzgraefe
On 03.06.2013 17:29, h...@tbbs.net wrote:
> I wish to join two tables on likeness, not equality, of character strings. 
> Soundex does not work. I am using the Levenstein edit distance, written in 
> SQL, a very costly test, and I am in no position to write it in C and link it 
> to MySQL--and joining on equality takes a fraction of a second, and this 
> takes hours. Any good ideas?


equality checks have a linear cost of O(min(len1,len2)) and can make
use of indexes, too, while Levenshtein cost is is almost quadratic
O(len1*len2) and can't make any good use of indexes ... even using
a C UDF would help only so far with this kind of complexity. It will
increase performance by a constant factor, but given long enough
input strings the len1*len2 factor will still account for the majority
of the run time increase over simple equality comparions

there are a few possible points of optimization though, first of all
you can cut off equal start and end sequences (linear complexity for
that part instead of quadratic). You can also add a few more tricks
if you are only interested in matches below a certain distance threshold:

* if string lengths differ by more than the threshold value you can
  rule out this pair of strings as being "similar" right away

* while iterating over the distance array keep track of the min.
  distance value of the current row ... if at the end of a row
  is larger than the threshold distance you can terminate right away

* only calculate operation cost, not operation type

* do not maintain a full len1*len2 array, having only the previous
  and current row in two one dimensional arrays is sufficient
  (this esp. helps in C implementation as the functions working set
  is more likely to fit into CPU caches)

...

-- 
Hartmut Holzgraefe 
Principal Support Engineer (EMEA)
SkySQL AB - http://www.skysql.com/

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: string-likeness

2013-06-03 Thread Johan De Meersman
- Original Message -
> From: h...@tbbs.net
> 
> I wish to join two tables on likeness, not equality, of character
> strings. Soundex does not work. I am using the Levenstein edit
> distance, written in SQL, a very costly test, and I am in no
> position to write it in C and link it to MySQL--and joining on
> equality takes a fraction of a second, and this takes hours. Any
> good ideas?

Looks like Sphinx might be a helpful addition to your stack. Way too much to 
explain how to do so here, but it's a proper fulltext engine, and has a 
MySQL-compatible interface.


-- 
Unhappiness is discouraged and will be corrected with kitten pictures.

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



string-likeness

2013-06-03 Thread hsv
I wish to join two tables on likeness, not equality, of character strings. 
Soundex does not work. I am using the Levenstein edit distance, written in SQL, 
a very costly test, and I am in no position to write it in C and link it to 
MySQL--and joining on equality takes a fraction of a second, and this takes 
hours. Any good ideas?


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



ANN: Hopper - Stored Routine debugger - v1.3.0 released

2013-06-03 Thread Martijn Tonies

ANN: Hopper - Stored Routine debugger - v1.3.0 released



Dear ladies and gentlemen,

Upscene Productions is proud to announce version 1.3.0 of our
product called "Hopper".

Hopper is a Windows-based Stored Routine and Trigger Debugger,
available for InterBase, Firebird and MySQL.

For more information, see http://www.upscene.com/go/?go=hopper


This release contains bugfixes and improvements, see
http://www.upscene.com/go/?go=news&id=20130603

Bugfixes include, but not limited to:
- possible "data type not supported" on specific (param) IS NULL constructor 
in Firebird or InterBase

- (INSERT INTO ...) RETURNING-clause not working in Firebird version
- assigning values to output parameters fails using Trace Into in MySQL
- possible errors on using built in function in SQL queries in MySQL


With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql