Re: [Mixxx-devel] Search Enhancement - Phonetically improve search results for playlist filter

Lukas Smith Sun, 28 Mar 2010 05:31:13 -0700

On 28.03.2010, at 08:12, Dhiraj Lohiya wrote:

> Please find replies inline.
> 
> On Sun, Mar 28, 2010 at 1:23 AM, Lukas Smith <[email protected]> wrote:
> 
> On 27.03.2010, at 20:51, Owen Williams wrote:
> 
> >
> >>  As for actually implementing it yourself from scratch it probably
> >> will have to be done using sqlite3_create_function:
> >> http://www.sqlite.org/c3ref/create_function.html.
> >>
> >
> > I have experience with sqlite and full text search, and I didn't get good
> > results with fts.  I had to use an external library called Xapian that
> > worked much, much better. So I would consider that library even if it
> > means adding a dependency.
> 
> 
> well for something like soundex or double metaphone you don't really need 
> full text search .. however both algorithms really only work well for single 
> words, which makes things a bit tricky, since you would have to maintain a 
> separate table with the hashes for each word.
> 
> 
> Can we do it in the following way:
> 
> When loading new songs into the database, a custom hash column would hold 
> the hash of the full text, built in the following manner:
> 
> An equivalence class groups the substrings that sound the same 
> phonetically. Each class is represented by a letter rather than a digit, 
> since counting upper and lower case there are 52 English letters (against 
> only 10 digits) and we generally need just 20-23 of them. Substrings are 
> formed from continuous sequences of vowels or of consonants.
> 
> Some equivalence classes for Hindi, based on Hindi phonology (Devanagari 
> script), with respect to the example song name encoded below:
> 
>       • p -> equivalence class P
>       • y | yy -> equivalence class Y
>       • a | aa -> equivalence class A
>       • r | rr -> equivalence class R
>       • k | c | q | ck -> equivalence class C
>       • m -> equivalence class M
>       • i | e | ee -> equivalence class E
>       • n | kn -> equivalence class N
> 
> So, if the name of the song entered is "pyaar kameena", it is stored in the 
> form of its equivalence classes as -> "PYAR CAMENA". 
> 
> Now when we take the search query from the user, we encode it according to 
> this logic and then search. So whether the search query is any of the 
> following
> 
> pyar kamina | pyaar kamine | pyar kameena| pyar kaminaa etc. -> "PYAR CAMENA"
> 
> The encoding for all of these will be the same, so it can now be easily 
> searched using FTS. (If Xapian works better than the default, we could 
> use that).
> 
> Now this seems to be scalable even if the number of songs in the list is more 
> than 1 lakh (100,000).[Citation needed]
> 
> Please feel free to tear down any of the above points. Thanks Mad Jester, 
> Lukas and Owen for the continuous feedback and support.

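for reference, the encoding proposed above could be sketched roughly like this. it is only a sketch: the rule table holds nothing but the eight classes listed in the quoted mail, the names RULES and encode are made up for illustration, and upper-casing any letter without a rule is an assumption of mine.

```python
import re

# Equivalence classes exactly as listed in the quoted proposal.
RULES = [
    ("P", ["p"]),
    ("Y", ["y", "yy"]),
    ("A", ["a", "aa"]),
    ("R", ["r", "rr"]),
    ("C", ["k", "c", "q", "ck"]),
    ("M", ["m"]),
    ("E", ["i", "e", "ee"]),
    ("N", ["n", "kn"]),
]

_LOOKUP = {sub: cls for cls, subs in RULES for sub in subs}

# Longest substrings first, so "ck" wins over "c" and "kn" over "k".
# The trailing "." catches any character that has no rule.
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, _LOOKUP), key=len, reverse=True)) + "|."
)


def encode(text):
    """Encode each word of text into its equivalence-class letters."""
    def encode_word(word):
        return _PATTERN.sub(
            # Assumption: letters without a rule are simply upper-cased.
            lambda m: _LOOKUP.get(m.group(0), m.group(0).upper()),
            word.lower(),
        )
    return " ".join(encode_word(w) for w in text.split())


print(encode("pyaar kameena"))
print(encode("pyar kaminaa"))
```

note that with only these eight rules a trailing "e" stays in class E, so "kamine" would come out as "CAMENE"; matching it against "CAMENA" would need a further rule that is not spelled out in the thread.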

ok .. in this case you still need to do %foo% searches, which is fine, but 
means that you will not really see a performance improvement, more likely a 
performance reduction. also, as was noted in the thread, some people have 
already figured out how to deal with spelling mistakes by only searching for 
the pieces they know how to spell correctly or that are the most distinctive. 
reducing that entered text to a hash might then not really work all that well. 
so again this should be a fallback and not the default.
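
a rough sketch of what "fallback and not the default" could look like, using Python's sqlite3 module, whose create_function() wraps sqlite3_create_function. the table layout, the function names and the phonetic_hash stand-in are all assumptions made for illustration, not anything that exists in Mixxx.

```python
import re
import sqlite3


def phonetic_hash(text):
    # Hypothetical stand-in for a real phonetic encoder: collapse runs of a
    # repeated character and upper-case, so "pyaar" and "pyar" both give "PYAR".
    return re.sub(r"(.)\1+", r"\1", text.lower()).upper()


def open_library(titles):
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL statements can call it.
    conn.create_function("phonetic_hash", 1, phonetic_hash)
    conn.execute("CREATE TABLE tracks (title TEXT, title_hash TEXT)")
    conn.executemany(
        "INSERT INTO tracks (title, title_hash) VALUES (?, phonetic_hash(?))",
        [(t, t) for t in titles],
    )
    return conn


def search(conn, query):
    # Default: plain substring match on exactly what the user typed.
    rows = conn.execute(
        "SELECT title FROM tracks WHERE title LIKE '%' || ? || '%'", (query,)
    ).fetchall()
    if not rows:
        # Fallback only: substring match against the hashed column.
        rows = conn.execute(
            "SELECT title FROM tracks "
            "WHERE title_hash LIKE '%' || phonetic_hash(?) || '%'",
            (query,),
        ).fetchall()
    return [row[0] for row in rows]


conn = open_library(["pyaar kameena", "another track"])
print(search(conn, "kameena"))
print(search(conn, "pyar kamena"))
```

both queries are still leading-wildcard LIKE searches, so neither can use an ordinary index; the hashed column changes what matches, not how fast it matches.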

now the above approach has one issue .. often it's not possible to 
accurately identify only one possible hash. this is a fundamental realization 
that went into the double metaphone algorithm, which can produce 2 hashes in 
some cases. this in turn means you cannot simply transform multiple words into 
a single sequence of hashes.
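
one way around that is the separate per-word table mentioned earlier in the thread, with one row per (track, code) pair so a word can carry two codes. the sketch below assumes a toy_codes function that is NOT double metaphone, just a made-up stand-in that sometimes returns two codes; the table and function names are hypothetical too.

```python
import re
import sqlite3


def toy_codes(word):
    """Hypothetical stand-in for double metaphone: one or two codes per word."""
    base = re.sub(r"(.)\1+", r"\1", word.lower())
    primary = base.replace("c", "k").upper()    # read "c" as a hard k
    secondary = base.replace("c", "s").upper()  # read "c" as a soft s
    return {primary, secondary}                 # one element if they coincide


def build_index(titles):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word_codes (track_id INTEGER, code TEXT)")
    conn.execute("CREATE INDEX idx_word_codes ON word_codes (code)")
    for track_id, title in enumerate(titles):
        for word in title.split():
            conn.executemany(
                "INSERT INTO word_codes (track_id, code) VALUES (?, ?)",
                [(track_id, code) for code in toy_codes(word)],
            )
    return conn


def search(conn, query):
    result = None
    for word in query.split():
        codes = sorted(toy_codes(word))
        marks = ",".join("?" * len(codes))
        # A track matches a word if it shares any code with it.
        ids = {
            row[0]
            for row in conn.execute(
                f"SELECT track_id FROM word_codes WHERE code IN ({marks})",
                codes,
            )
        }
        # Every query word has to match somewhere in the track.
        result = ids if result is None else result & ids
    return sorted(result or [])


conn = build_index(["cecilia", "kameena pyaar", "sesilia blues"])
print(search(conn, "sesilia"))
print(search(conn, "pyar kamena"))
```

since the lookup is an equality match on code, it can use the index, unlike the %foo% search on a single hashed column; the price is that word order and partial words are no longer matched.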

anyways .. we do not need to solve all issues just this very second. i do not 
know how well double metaphone deals with hindi, but it might be an interesting 
approach to simply extend this algorithm with hindi rules if they are missing 
atm.

regards,
Lukas

DJ Suicide Dive
[email protected]



