Re: String matching based on sound?

2018-01-29 Thread Steven D'Aprano
On Mon, 29 Jan 2018 13:28:32 -0900, Israel Brewster wrote:

> In initial searching, I did find the "fuzzy" library, which at first
> glance appeared to be what I was looking for, but it, apparently,
> ignores numbers, with the result that "all 4 one" gave the same output
> as "all in", but NOT the same output as "all 4 1" - even though "all 4
> 1" sounds EXACTLY the same, while "all in" is only similar if you ignore
> the 4.

Before passing the string to the fuzzy matcher, do a simple text 
replacement of numbers with their spelled-out versions: "4" -> "four".
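
A minimal sketch of that replacement step, in case it helps. The names 
DIGIT_WORDS and spell_out_digits are made up for illustration, and it 
spells numbers digit by digit, so "182" becomes "one eight two" rather 
than "one hundred eighty-two". Adjust if your library needs the latter.

```python
import re

# Hypothetical lookup table (an assumption, not part of any library).
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_out_digits(text):
    """Replace each ASCII digit with its spelled-out word.

    Each word is padded with spaces so that "4u" becomes "four u",
    then runs of whitespace are collapsed to a single space.
    """
    spaced = re.sub(
        r"[0-9]",
        lambda match: " " + DIGIT_WORDS[match.group()] + " ",
        text,
    )
    return " ".join(spaced.split())


# Both spellings now look the same to whatever matcher comes next.
print(spell_out_digits("all 4 1"))
print(spell_out_digits("all 4 one"))
```

The result is what you would then hand to the fuzzy matcher instead of 
the raw input.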

You may want to do other text replacements too, based on sound or visual 
design, for example to deal with Kei$ha a.k.a. Keisha, etc.
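
Something along these lines would cover the look-alike characters. The 
mapping below is only a guess at what you might need (SYMBOL_SOUNDS and 
replace_symbols are hypothetical names), and treating "-" as a space is 
an assumption based on the "All-4-One" example:

```python
# Hypothetical table of characters that stand in for a letter or a gap.
SYMBOL_SOUNDS = {
    "$": "s",   # Kei$ha -> keisha
    "@": "a",
    "-": " ",   # all-4-one -> all 4 one
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TRANSLATION = str.maketrans(SYMBOL_SOUNDS)


def replace_symbols(text):
    """Lowercase the text, swap look-alike symbols, collapse whitespace."""
    return " ".join(text.lower().translate(_TRANSLATION).split())


print(replace_symbols("Kei$ha"))
print(replace_symbols("All-4-One"))
```

Run this before the number replacement so that "All-4-One" is already 
split into separate words by the time the digits are spelled out.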


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


String matching based on sound?

2018-01-29 Thread Israel Brewster
I am working on a Python program that, at one step, takes an input (string) 
and matches it to songs/artists in a user's library. I'm having some difficulty, 
however, figuring out how to match when the input/library contains 
numbers/special characters. For example, take the group "All-4-One". In my 
library it might be listed exactly like that. I need to match this to ANY of 
the following inputs:

• all-4-one (of course)
• all 4 one (no dashes)
• all 4 1 (all numbers)
• all four one (all spelled out)
• all for one

Or, really, any other combination that sounds the same. The reasoning for this 
is that the input comes from a speech recognition system, so the user speaking, 
for example, "4", could be recognized as "for", "four" or "4". I'd imagine that 
Alexa/Siri/Google all do things like this (since you can ask them to play 
songs/artists), but I want to implement this in Python.

In initial searching, I did find the "fuzzy" library, which at first glance 
appeared to be what I was looking for, but it, apparently, ignores numbers, 
with the result that "all 4 one" gave the same output as "all in", but NOT the 
same output as "all 4 1" - even though "all 4 1" sounds EXACTLY the same, while 
"all in" is only similar if you ignore the 4.

So is there something similar that works with strings containing numbers? And 
that would only give me a match if the two strings sound identical? That is, 
even ignoring the numbers, I should NOT get a match between "all one" and "all 
in" - they are similar, but not identical, while "all one" and "all 1" would be 
identical.




