>-----Original Message-----
>From: Johan De Meersman [mailto:vegiv...@tuxera.be]
>Sent: Tuesday, May 03, 2011 5:31 AM
>To: Jerry Schwartz
>Cc: Jim McNeely; mysql mailing list; Johan De Meersman
>Subject: Re: Join based upon LIKE
>
>http://www.gedpage.com/soundex.html offers a simple explanation of what it
>does.
>
>One possibility would be building a referential table with only a recordID
>and soundex column, unique over both; and filling that with the soundex of
>individual nonjunk words.
>
>So, from the titles
>
>1 | Rain in Spain
>2 | Spain's Rain
>
>you'd get
>
>1 | R500
>1 | S150
>2 | S150
>2 | R500
>
>From thereon, you can see that all the same words have been used - ignoring a
>lot of spelling errors like Spian. Obviously not a magic solution, but it's a
>start.
>

[JS] Thanks.
I'm not sure that I could easily build a dictionary of non-junk words, since
some of these reports have titles like "Toluene Diisocyanate Market Outlook
2008", "Toluene Market Outlook 2008", and "Toluene: 2009 World Market Outlook
And Forecast (Special Crisis Edition)".

I shall ponder this when I am caught up, or (more likely) in the afterlife.

Regards,

Jerry Schwartz
Global Information Incorporated
195 Farmington Ave.
Farmington, CT 06032

860.674.8796 / FAX: 860.674.8341
E-mail: je...@gii.co.jp
Web site: www.the-infoshop.com

>----- Original Message -----
>> From: "Jerry Schwartz" <je...@gii.co.jp>
>> To: "Johan De Meersman" <vegiv...@tuxera.be>
>> Cc: "Jim McNeely" <j...@newcenturydata.com>, "mysql mailing list" <mysql@lists.mysql.com>
>> Sent: Monday, 2 May, 2011 4:09:36 PM
>> Subject: RE: Join based upon LIKE
>>
>> [JS] I've thought about using soundex(), but I'm not quite sure how.
>>
>> I didn't pursue it much because there are so many odd terms such as
>> chemical names, but perhaps I should give it a try in my infinite free
>> time.
>>
>> [JS] Thanks for your condolences.
>>
>> Regards,
>>
>> Jerry Schwartz
>>
>
>--
>Beer with grenadine
>Is like mustard with wine
>She who drinks it is a prude
>He who drinks it is soon an ass

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
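
For reference, the table Johan describes in the quoted message (one row per record ID and Soundex code, unique over both) could be sketched roughly as follows. This is a standalone Python illustration rather than MySQL. The helper names `soundex`, `title_codes` and `build_rows` and the `JUNK` stop list are hypothetical, and stripping the possessive "'s" is an assumption made so that "Spain's" comes out as S150, as in the example. MySQL's own SOUNDEX() can return strings longer than four characters, so its codes may differ from these.

```python
import re

# Standard American Soundex digit groups.
_CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)

# Hypothetical stop list; a real one would be tuned to the report titles.
JUNK = {"A", "AN", "AND", "IN", "OF", "THE"}


def soundex(word):
    """Return the four-character American Soundex code of a word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "HW":  # H and W do not separate letters with equal codes
            prev = code
    return (out + "000")[:4]


def title_codes(title):
    """Return the set of Soundex codes for the non-junk words in a title."""
    codes = set()
    # Digits are not matched, so years such as 2008 are skipped.
    for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", title):
        word = re.sub(r"'s$", "", word, flags=re.IGNORECASE)  # assumption: drop possessive
        if word.upper() not in JUNK:
            codes.add(soundex(word))
    return codes


def build_rows(titles):
    """Return sorted, de-duplicated (record_id, soundex) rows for all titles."""
    return sorted({(rid, code) for rid, title in titles.items()
                   for code in title_codes(title)})


rows = build_rows({1: "Rain in Spain", 2: "Spain's Rain"})
print(rows)  # [(1, 'R500'), (1, 'S150'), (2, 'R500'), (2, 'S150')]
```

Records whose code sets are equal, or nearly so, are then candidates for the same title. For instance, `title_codes("Rain in Spian")` equals `title_codes("Spain's Rain")`. Titles like the Toluene ones share most of their codes, so this would only narrow the candidates, not decide the match.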