RE: SQL Name Sounds Like Matching
I've done this in Perl, but not in CF. Look here:
http://www.oreilly.com/catalog/maperl/toc.html
(Chapter 9 or 10, I think.)

Check out here too:
http://www.uta.fi/~ccjapu/Handson2.html

WG

-----Original Message-----
From: Haggerty, Mike [mailto:[EMAIL PROTECTED]
Sent: 27 March 2003 19:34
To: CF-Talk
Subject: OT: SQL Name Sounds Like Matching

I have a feeling I am going to be working on this one for a while...

One of my more demanding clients is asking for a name matching solution on the cheap. What it needs to do is find where a name 'sounds like' another name, even if it is in another language (including Middle Eastern and Oriental names). In addition, I need to be able to eliminate false positives wherever possible, in order to come up with the most concise list of matches.

I really don't know where to start. SQL Server and Oracle both offer Soundex support, which I suppose could be used to generate some values for comparisons. But I am not sure how this would work when different languages come into play.

Does anyone have / know of any books / resources / advice on how to do this?

M

Archives: http://www.houseoffusion.com/cf_lists/index.cfm?forumid=4
Subscription: http://www.houseoffusion.com/cf_lists/index.cfm?method=subscribeforumid=4
FAQ: http://www.thenetprofits.co.uk/coldfusion/faq
Your ad could be here. Monies from ads go to support these lists and provide more resources for the community. http://www.fusionauthority.com/ads.cfm
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4
Re: SQL Name Sounds Like Matching
You say on the cheap. Well, that means they get whatever's standard with no customisation, right? That's what Soundex is. It may not be great, but it's certainly easy. Demanding clients with a tight budget really need to be dealt with firmly. They're business killers.

The thing is that phonetic tokenisation algorithms are language based, i.e. Soundex is designed to work for English. However, it would be by far the easiest to implement. There's a UDF here: http://cflib.org/udf.cfm?ID=39 . There are a few variations on Soundex which could be investigated. There's also Metaphone, which is a bit more complex to implement.

I would store the Soundex code for each name in the database, with an index on this column, and then use the Soundex code of the input data as the search key. That way the lookup will be just as quick as a regular search, and you can do it whether or not the DBMS supports Soundex.

Combine that with regular searching, perhaps with typo support from Verity, and that would have to be way more than a budget client could expect.

Matthew Walker
Electric Sheep Web
http://www.electricsheep.co.nz/
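The stored-code approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the cflib UDF or any DBMS's built-in function: it uses Python with an in-memory SQLite database, the classic American Soundex rules, ASCII letters only, and made-up table, column, and function names (people, name_soundex, build_index, sounds_like).

```python
import sqlite3

# Classic American Soundex digit groups (assumption: ASCII letters only).
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code for a name, or "" if it has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


def build_index(names):
    """Hypothetical helper: store each name next to its precomputed code."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, name_soundex TEXT)")
    conn.execute("CREATE INDEX idx_people_soundex ON people (name_soundex)")
    conn.executemany(
        "INSERT INTO people (name, name_soundex) VALUES (?, ?)",
        [(n, soundex(n)) for n in names],
    )
    return conn


def sounds_like(conn, query):
    """Look up names by the code of the input; the DBMS needs no Soundex support."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name_soundex = ? ORDER BY name",
        (soundex(query),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_index(["Haggerty", "Hagerty", "Walker", "Robert", "Rupert"])
    print(sounds_like(conn, "Hagarty"))
```

Because the code is computed in the application and the query is a plain equality test on an indexed column, the same pattern should carry over to SQL Server, Oracle, or anything else. Swapping in a different phonetic algorithm only means changing the function that fills the column.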
RE: SQL Name Sounds Like Matching
Matthew,

Yeah, I'd like to give them the standard with no customization, period. Something tells me that's not going to happen.

Your message is very helpful. I hadn't thought about using Verity for this, but typo support would be great.

My only issue with Soundex is with foreign names, and I can expect over a million of them from over 50 countries. Is there a way (meaning, has anyone already worked this out somehow) of compensating for international variations in pronunciation? Do you know of a Cyrillic version of Soundex? Something that does Greek, Russian, Hebrew?

Thanks,
M
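On trimming false positives: one cheap option is to take the candidates a phonetic match returns and keep only those whose spelling is also close to the input. The sketch below is a stand-in for that idea and is not Verity's typo support. It assumes Python's standard difflib module, and the function name rank_candidates and the 0.8 threshold are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher


def rank_candidates(query, candidates, threshold=0.8):
    """Keep candidates whose spelling similarity to the query meets the
    threshold, best match first. The threshold is an assumption to tune."""
    scored = []
    for name in candidates:
        # ratio() is 2 * matched characters / total characters, in [0, 1].
        score = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        if score >= threshold:
            scored.append((score, name))
    # Sort by descending score, then alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    print(rank_candidates("Haggerty", ["Hagerty", "Hogarth", "Higgins"]))
```

This compares spellings, not sounds, so it will not help with names transliterated from other scripts. It only narrows a list that a phonetic key has already produced.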
Re: SQL Name Sounds Like Matching
Well there's Daitch-Mokotoff Soundex which adds better support for Hebrew.
http://www.avotaynu.com/soundex.html
RE: SQL Name Sounds Like Matching
WOW. That is a pretty cool system, and I could see how that could be used in a variety of situations. Thanks for the link.

M