>-----Original Message-----
>From: Johan De Meersman [mailto:vegiv...@tuxera.be]
>Sent: Tuesday, May 03, 2011 5:31 AM
>To: Jerry Schwartz
>Cc: Jim McNeely; mysql mailing list; Johan De Meersman
>Subject: Re: Join based upon LIKE
>
>
>http://www.gedpage.com/soundex.html offers a simple explanation of what it
>does.
>
>One possibility would be building a referential table with only a recordID
>and soundex column, unique over both; and filling that with the soundex of
>individual nonjunk words.
>
>So, from the titles
>
>1 | Rain in Spain
>2 | Spain's Rain
>
>you'd get
>
>1 | R500
>1 | S150
>2 | S150
>2 | R500
>
>From there on, you can see that all the same words have been used - ignoring
>a lot of spelling errors like Spian. Obviously not a magic solution, but it's
>a start.
>
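
A minimal Python sketch of the table described above. It assumes standard
American Soundex (results may differ in detail from MySQL's own SOUNDEX()),
a hypothetical junk-word list JUNK, and that a possessive 's is stripped
before coding, so that "Spain's" yields S150 as in the example. All helper
names are illustrative and not from the thread.

```python
import re

# Standard American Soundex digit groups.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

# Hypothetical junk-word list; a real one would be tuned to the data.
JUNK = {"a", "an", "and", "in", "of", "the"}


def soundex(word):
    """Return the 4-character Soundex code of a single word."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of identical codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat across a vowel is kept
    return (result + "000")[:4]


def title_codes(title):
    """Set of Soundex codes for the non-junk words of a title."""
    text = re.sub(r"'s\b", "", title.lower())  # assumption: drop possessives
    words = re.findall(r"[a-z]+", text)
    return {soundex(w) for w in words if w not in JUNK}


def soundex_rows(record_id, title):
    """Rows for the (recordID, soundex) table, unique over both columns."""
    return sorted((record_id, code) for code in title_codes(title))


if __name__ == "__main__":
    titles = {1: "Rain in Spain", 2: "Spain's Rain"}
    for record_id, title in titles.items():
        for row in soundex_rows(record_id, title):
            print(row)
    # Titles that share the same set of codes are candidate matches.
    print(title_codes(titles[1]) == title_codes(titles[2]))
```

Comparing the code sets ignores word order and most small misspellings, but
it also treats different words with the same code as equal, so matches are
candidates to review rather than confirmed duplicates.
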
[JS] Thanks.

I'm not sure that I could easily build a dictionary of non-junk words, since 
some of these reports have titles like "Toluene Diisocyanate Market Outlook 
2008", "Toluene Market Outlook 2008", and "Toluene: 2009 World Market Outlook 
And Forecast (Special Crisis Edition)".

I shall ponder this when I am caught up, or (more likely) in the afterlife.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.the-infoshop.com

>----- Original Message -----
>> From: "Jerry Schwartz" <je...@gii.co.jp>
>> To: "Johan De Meersman" <vegiv...@tuxera.be>
>> Cc: "Jim McNeely" <j...@newcenturydata.com>, "mysql mailing list"
>> <mysql@lists.mysql.com>
>> Sent: Monday, 2 May, 2011 4:09:36 PM
>> Subject: RE: Join based upon LIKE
>>
>> [JS] I've thought about using soundex(), but I'm not quite sure how.
>>
>> I didn't pursue it much because there are so many odd terms such as
>> chemical names, but perhaps I should give it a try in my infinite free
>> time.
>>
>>
>> [JS] Thanks for your condolences.
>>
>> Regards,
>>
>> Jerry Schwartz
>> Global Information Incorporated
>> 195 Farmington Ave.
>> Farmington, CT 06032
>>
>> 860.674.8796 / FAX: 860.674.8341
>> E-mail: je...@gii.co.jp
>> Web site: www.the-infoshop.com
>>
>
>--
>Bier met grenadyn
>Is als mosterd by den wyn
>Sy die't drinkt, is eene kwezel
>Hy die't drinkt, is ras een ezel




