Please find replies inline.
On Sun, Mar 28, 2010 at 1:23 AM, Lukas Smith <[email protected]> wrote:
>
> On 27.03.2010, at 20:51, Owen Williams wrote:
>
> >
> >> As for actually implementing it yourself from scratch it probably
> >> will have to be done using sqlite_create_function:
> >> http://www.sqlite.org/c3ref/create_function.html.
> >>
> >
> > I have experience with sqlite and full text search, and I didn't get good
> > results with fts. I had to use an external library called Xapian that
> > worked much, much better. So I would consider that library even if it
> > means adding a dependency.
>
>
> well, for something like soundex or double metaphone you don't really need
> full text search .. however both algorithms really only work well for single
> words, which makes things a bit tricky, since you would have to maintain a
> separate table with the hashes for each word.
>
>
Can we do it in the following way?
When loading new songs into the database, a custom hash column would hold
the full-text hash, built in the following manner:
The equivalence classes for similar sounds are classes that each hold a list
of same-sounding phonetic substrings. They are represented by letters,
since we have 52 English letters (rather than 10 digits) counting
upper and lower case, and generally we only require 20-23 of them.
Substrings are formed from continuous sequences of vowels or of consonants.
Some equivalence classes for Hindi, based on Hindi phonology (Devanagari
script) (wrt the example song name encoding below):
1. p -> equivalence class P
2. y | yy -> equivalence class Y
3. a | aa -> equivalence class A
4. r | rr -> equivalence class R
5. k | c | q | ck -> equivalence class C
6. m -> equivalence class M
7. i | e | ee -> equivalence class E
8. n | kn -> equivalence class N
So, if the song name entered is "pyaar kameena", it is stored in the
form of its equivalence classes as -> "PYAR CAMENA".
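The encoding step could be sketched roughly like this in Python. It is only
an illustration under my own assumptions: the table holds just the eight
classes listed above, the scan prefers the longest matching substring, and
any character without a class is passed through upper-cased. The names
CLASSES, RULES and encode are made up for this sketch.

```python
# Equivalence classes copied from the list above; the table is partial.
CLASSES = {
    "P": ["p"],
    "Y": ["y", "yy"],
    "A": ["a", "aa"],
    "R": ["r", "rr"],
    "C": ["k", "c", "q", "ck"],
    "M": ["m"],
    "E": ["i", "e", "ee"],
    "N": ["n", "kn"],
}

# Reverse lookup: phonetic substring -> class letter.
RULES = {sub: cls for cls, subs in CLASSES.items() for sub in subs}
MAX_LEN = max(len(sub) for sub in RULES)


def encode(text):
    """Encode text into class letters, trying the longest substring first."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        for length in range(MAX_LEN, 0, -1):
            chunk = text[i:i + length]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # Assumption: characters with no class (spaces, other letters)
            # are kept as they are, upper-cased.
            out.append(text[i].upper())
            i += 1
    return "".join(out)


print(encode("pyaar kameena"))  # PYAR CAMENA
```

Longest-match-first matters here: without it "aa" would be read as two
separate "a" substrings and come out as "AA" instead of "A".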
Now, when we take the search query from the user, we encode it with the
same logic and then search. So whether the search query is any of the
following:
pyar kamina | pyaar kamine | pyar kameena | pyar kaminaa etc. -> "PYAR
CAMENA"
The encoding for all of these will be the same, so the song can be easily
found using the FTS. (If Xapian works better than the default, we could
use that.)
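A rough sketch of the store-and-search flow, using Python's sqlite3 module
and its create_function hook (the Python-side counterpart of
sqlite_create_function mentioned earlier in the thread). Everything named
here is an assumption for illustration: the songs table, the hash column,
the phonetic function, and the placeholder titles. It uses a plain indexed
equality lookup as a stand-in for FTS or Xapian, and a compact regex form of
the encoding with only the eight classes listed above.

```python
import re
import sqlite3

# Substring -> class letter, from the eight classes listed above (partial).
RULES = {
    "p": "P", "y": "Y", "yy": "Y", "a": "A", "aa": "A", "r": "R", "rr": "R",
    "k": "C", "c": "C", "q": "C", "ck": "C", "m": "M",
    "i": "E", "e": "E", "ee": "E", "n": "N", "kn": "N",
}
# Longer alternatives first, so "aa" is matched before "a".
PATTERN = re.compile("|".join(sorted(RULES, key=len, reverse=True)))


def phonetic(text):
    """Replace each known substring by its class; upper-case the rest."""
    return PATTERN.sub(lambda m: RULES[m.group()], text.lower()).upper()


conn = sqlite3.connect(":memory:")
# Make the encoder callable from SQL as phonetic(x).
conn.create_function("phonetic", 1, phonetic)
conn.execute("CREATE TABLE songs (title TEXT, hash TEXT)")
conn.execute("CREATE INDEX songs_hash ON songs (hash)")

# The hash is computed once, when the song is loaded.
for title in ("pyaar kameena", "another song"):  # placeholder titles
    conn.execute("INSERT INTO songs VALUES (?, phonetic(?))", (title, title))


def search(query):
    """Encode the query the same way and look it up by hash."""
    rows = conn.execute(
        "SELECT title FROM songs WHERE hash = phonetic(?)", (query,)
    )
    return [row[0] for row in rows]


print(search("pyar kaminaa"))
```

One thing this turned up: with only the eight classes listed, "pyaar
kamine" encodes to "PYAR CAMENE" and is not found, so a further rule (not
part of this sketch) would be needed to cover that spelling.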
This should scale even if the number of songs in the list is more than
1 lakh (100,000), since each title is encoded only once when it is loaded
and each query is encoded only once before the lookup.
Please feel free to tear down any of the above points. Thanks Mad Jester,
Lukas and Owen for the continuous feedback and support.
--
Regards
Dhiraj Lohiya
IRC nick: Dj
_______________________________________________
Mixxx-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mixxx-devel