RE: SQL Name Sounds Like Matching

2003-03-28 Thread webguy
I've done this in Perl, but not in CF.
Look here: http://www.oreilly.com/catalog/maperl/toc.html

Chapter 9 or 10, I think.

Check here too:
http://www.uta.fi/~ccjapu/Handson2.html

WG

Archives: http://www.houseoffusion.com/cf_lists/index.cfm?forumid=4
Subscription: 
http://www.houseoffusion.com/cf_lists/index.cfm?method=subscribe&forumid=4
FAQ: http://www.thenetprofits.co.uk/coldfusion/faq
Your ad could be here. Monies from ads go to support these lists and provide more 
resources for the community. http://www.fusionauthority.com/ads.cfm

Unsubscribe: 
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4



OT: SQL Name Sounds Like Matching

2003-03-27 Thread Haggerty, Mike
I have a feeling I am going to be working on this one for a while...

One of my more demanding clients is asking for a name matching solution on
the cheap. What it needs to do is find where a name 'sounds like' another
name, even if it is in another language (including Middle Eastern and
Oriental names). In addition, I need to be able to eliminate false positives
wherever possible, in order to come up with the most concise list of
matches.

I really don't know where to start. SQL Server and Oracle both offer soundex
support, which I suppose could be used to generate some values for
comparisons. But I am not sure how this would work when different languages
come into play.

Does anyone have / know of any books / resources / advice on how to do this?

M



Re: OT: SQL Name Sounds Like Matching

2003-03-27 Thread Tony Schreiber
Search Google for SOUNDEX.




Re: SQL Name Sounds Like Matching

2003-03-27 Thread Matthew Walker
You say on the cheap. Well, that means they get whatever's standard with no
customisation, right? That's what Soundex is. It may not be great, but it's
certainly easy. Demanding clients with a tight budget really need to be
dealt with firmly. They're business killers.

The thing is that phonetic tokenisation algorithms are language based, i.e.
Soundex is designed to work for English. However, it would be by far the
easiest to implement. There's a UDF here: http://cflib.org/udf.cfm?ID=39 .
There are a few variations on Soundex which could be investigated. There's
also Metaphone, which is a bit more complex to implement. I would store the
Soundex code for each name in the database, with an index on this column,
and then use the Soundex code of the input data as the search key. That way
the lookup will be just as quick as a regular search, and you can do it
whether or not the DBMS supports Soundex. Combine that with regular
searching, perhaps with typo support from Verity, and it would have to be
way more than a budget client could expect.
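
For what it's worth, the store-the-code-and-index-it idea could look something
like the Python sketch below. It is only an illustration under some
assumptions: SQLite stands in for SQL Server or Oracle, and the table, column
and function names (people, name_soundex, soundex, build_index, sounds_like)
are made up for the example. The soundex() function follows the commonly
described American Soundex rules, so a DBMS's built-in SOUNDEX may return
slightly different codes for some names.

```python
import sqlite3

# American Soundex digit groups; vowels, H, W and Y have no digit.
CODES = {
    ch: str(digit)
    for digit, letters in enumerate(
        ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1
    )
    for ch in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break up a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return ("".join(out) + "000")[:4]


def build_index(names):
    """Store each name next to its precomputed code, with an index on the code."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (name TEXT NOT NULL, name_soundex TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_people_soundex ON people (name_soundex)")
    conn.executemany(
        "INSERT INTO people (name, name_soundex) VALUES (?, ?)",
        [(n, soundex(n)) for n in names],
    )
    conn.commit()
    return conn


def sounds_like(conn, query):
    """Return stored names whose code equals the code of the search input."""
    rows = conn.execute(
        "SELECT name FROM people WHERE name_soundex = ? ORDER BY name",
        (soundex(query),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_index(["Robert", "Rupert", "Rubin", "Smith", "Smyth", "Schmidt"])
    print(sounds_like(conn, "Smithe"))   # ['Schmidt', 'Smith', 'Smyth']
    print(sounds_like(conn, "Robbert"))  # ['Robert', 'Rupert']
```

Because the code is computed in application code at insert time, the search is
a plain equality match on an indexed column and does not depend on the
database having its own Soundex function.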

Matthew Walker
Electric Sheep Web
http://www.electricsheep.co.nz/




RE: SQL Name Sounds Like Matching

2003-03-27 Thread Haggerty, Mike
Matthew;

Yeah, I'd like to give them the standard with no customization, period.
Something tells me that's not going to happen.

Your message is very helpful. I hadn't thought about using Verity for this,
but typo support would be great.

My only issue with Soundex is with foreign names, and I can expect over a
million of them from over 50 countries. Is there a way (meaning, has anyone
already worked this out somehow) of compensating for international
variations in pronunciation? Do you know of a Cyrillic version of Soundex?
Something that does Greek, Russian, Hebrew?
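
To make the worry about foreign names concrete, here is a small Python sketch
that groups a few transliterated names by their standard (American) Soundex
code. The sample names and the group_by_code helper are purely illustrative,
and soundex() is a plain textbook-style implementation written for this
example, not any particular DBMS's function.

```python
from collections import defaultdict

# American Soundex digit groups; vowels, H, W and Y have no digit.
CODES = {
    ch: str(digit)
    for digit, letters in enumerate(
        ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1
    )
    for ch in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break up a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return ("".join(out) + "000")[:4]


def group_by_code(names):
    """Map each Soundex code to the names that produce it, in input order."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "Tchaikovsky", "Chaikovsky",  # two romanisations of one surname
        "Yusuf", "Joseph",            # related names, different first letter
        "Lee", "Li", "Liu", "Lo",     # short names with no coded consonants
    ]
    for code, names in sorted(group_by_code(sample).items()):
        print(code, names)
```

Tchaikovsky and Chaikovsky come out as T221 and C212, so an exact-code lookup
on one misses the other, because Soundex always keeps the first letter. Going
the other way, Lee, Li, Liu and Lo all share L000, which is the kind of false
positive that gets worse with short names.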


Thanks,
M




Re: SQL Name Sounds Like Matching

2003-03-27 Thread Matthew Walker
Well, there's Daitch-Mokotoff Soundex, which adds better support for Slavic
and Yiddish (Eastern European Jewish) surnames.
http://www.avotaynu.com/soundex.html




RE: SQL Name Sounds Like Matching

2003-03-27 Thread Haggerty, Mike
WOW. That is a pretty cool system, and I could see how that could be used in
a variety of situations.

Thanks for the link.

M
