On 28.03.2010, at 08:12, Dhiraj Lohiya wrote:
> Please find replies inline.
>
> On Sun, Mar 28, 2010 at 1:23 AM, Lukas Smith <[email protected]> wrote:
>
> On 27.03.2010, at 20:51, Owen Williams wrote:
> >
> >> As for actually implementing it yourself from scratch it probably
> >> will have to be done using sqlite_create_function:
> >> http://www.sqlite.org/c3ref/create_function.html.
> >>
> >
> > I have experience with sqlite and full text search, and I didn't get good
> > results with fts. I had to use an external library called Xapian that
> > worked much, much better. So I would consider that library even if it
> > means adding a dependency.
>
> well for something like soundex or double metaphone you don't really need
> full text search .. however both algorithms really only work well for single
> words, which makes things a bit tricky, since you would have to maintain a
> separate table with the hashes for each word.
>
> Can we do it in the following way:
>
> When loading new songs into the database, a custom hash column would hold
> the full-text hash, built in the following manner:
>
> The equivalence classes for similar sounds are classes that each group a list
> of same-sounding phonetic substrings. Each class is represented by a letter
> rather than a digit: counting upper and lower case there are 52 English
> letters (versus only 10 digits), and generally we only need 20-23 of them.
> Substrings are formed from contiguous runs of vowels or consonants.
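Owen's pointer to sqlite_create_function (the C entry point is `sqlite3_create_function`) and Lukas's separate per-word hash table can be combined roughly as follows. This is a minimal sketch, assuming Python's standard `sqlite3` module, whose `Connection.create_function()` wraps the same C API. Classic American Soundex stands in for double metaphone only because it fits in a few lines. The table names (`songs`, `word_hashes`), the SQL function name `phonetic_key`, the helpers and the sample titles are all hypothetical. SQLite's own `soundex()` exists only in builds compiled with `SQLITE_SOUNDEX`, which is one reason to register a custom function.

```python
import sqlite3

# American Soundex digit groups; vowels, y, h and w have no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex key of an ASCII word ('' if none)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w are skipped without separating equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
            if len(key) == 4:
                break
        prev = code  # a vowel resets prev, so equal codes around it both count
    return key.ljust(4, "0")


def build_index(songs):
    """songs: iterable of (id, title). Stores one Soundex key per title word."""
    conn = sqlite3.connect(":memory:")
    # Python counterpart of sqlite3_create_function(): SQL can now call phonetic_key().
    conn.create_function("phonetic_key", 1, soundex)
    conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE word_hashes (song_id INTEGER, hash TEXT)")
    conn.execute("CREATE INDEX idx_word_hashes_hash ON word_hashes (hash)")
    for song_id, title in songs:
        conn.execute("INSERT INTO songs (id, title) VALUES (?, ?)", (song_id, title))
        for word in title.split():
            conn.execute(
                "INSERT INTO word_hashes (song_id, hash) VALUES (?, phonetic_key(?))",
                (song_id, word),
            )
    return conn


def search(conn, query):
    """Return titles whose words cover the Soundex key of every query word."""
    keys = sorted({k for k in map(soundex, query.split()) if k})
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))
    rows = conn.execute(
        "SELECT s.title FROM songs s JOIN word_hashes h ON h.song_id = s.id "
        f"WHERE h.hash IN ({placeholders}) "
        "GROUP BY s.id HAVING COUNT(DISTINCT h.hash) = ? ORDER BY s.id",
        (*keys, len(keys)),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_index([(1, "Pyaar Kameena"), (2, "Dil Se")])  # sample titles
print(soundex("pyaar"), soundex("kamina"))  # P600 K550
print(search(conn, "pyar kamina"))          # ['Pyaar Kameena']
```

A song matches only if every query word's key appears among that song's word keys. This is why "pyar kamina" finds "Pyaar Kameena": both reduce to P600 and K550. Soundex was designed for indexing American surnames, so this illustrates only the plumbing, not Hindi-aware matching.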
>
> Some equivalence classes for Hindi based on Hindi phonology (Devanagari
> script), with respect to the example song-name encoding below:
>
> • p -> equivalence class P
> • y | yy -> equivalence class Y
> • a | aa -> equivalence class A
> • r | rr -> equivalence class R
> • k | c | q | ck -> equivalence class C
> • m -> equivalence class M
> • i | e | ee -> equivalence class E
> • n | kn -> equivalence class N
>
> So, if the song name entered is "pyaar kameena", it is stored in the form
> of its equivalence classes -> "PYAR CAMENA".
>
> Now when we take the search query from the user, we encode it according to
> the same logic and then search. So whether the search query is any of the
> following
>
> pyar kamina | pyaar kamine | pyar kameena | pyar kaminaa etc. -> "PYAR CAMENA"
>
> the encoding for all of these will be the same, and it can easily be
> searched using FTS. (If Xapian works better than the default, we could
> use that.)
>
> This seems to be scalable even if the number of songs in the list is more
> than 1 lakh (100,000).[Citation needed]
>
> Please feel free to tear down any of the above points. Thanks Mad Jester,
> Lukas and Owen for the continuous feedback and support.
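The encoding step above can be sketched as a greedy longest-match over the class table. This is a minimal sketch under stated assumptions:

- The rule table covers only the example classes.
- Letters without a rule pass through upper-cased, and other characters are dropped.
- Instead of FTS or Xapian, the search is a plain `LIKE` substring match on a hypothetical `title_key` column.

All helper names are illustrative. Under these exact rules a trailing "e" falls in class E, so "kamine" encodes to "CAMENE" rather than "CAMENA". That variant would only collide if a final a/e were merged into one class.

```python
import sqlite3

# Equivalence classes transcribed from the list above (hypothetical, incomplete).
CLASSES = {
    "p": "P",
    "y": "Y", "yy": "Y",
    "a": "A", "aa": "A",
    "r": "R", "rr": "R",
    "k": "C", "c": "C", "q": "C", "ck": "C",
    "m": "M",
    "i": "E", "e": "E", "ee": "E",
    "n": "N", "kn": "N",
}
MAX_LEN = max(len(pattern) for pattern in CLASSES)


def encode_word(word):
    """Encode one word by greedily matching the longest known substring."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        for size in range(MAX_LEN, 0, -1):
            piece = word[i:i + size]
            if len(piece) == size and piece in CLASSES:
                out.append(CLASSES[piece])
                i += size
                break
        else:
            # No rule matched: keep letters upper-cased (assumption), drop the rest.
            if word[i].isalpha():
                out.append(word[i].upper())
            i += 1
    return "".join(out)


def encode(text):
    """Encode each whitespace-separated word and rejoin with single spaces."""
    keys = (encode_word(w) for w in text.split())
    return " ".join(k for k in keys if k)


def make_db(titles):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT, title_key TEXT)")
    conn.executemany(
        "INSERT INTO songs (title, title_key) VALUES (?, ?)",
        [(t, encode(t)) for t in titles],
    )
    return conn


def search(conn, query):
    """Encode the query the same way and substring-match it against title_key."""
    key = encode(query)
    if not key:
        return []
    rows = conn.execute(
        "SELECT title FROM songs WHERE title_key LIKE '%' || ? || '%' ORDER BY id",
        (key,),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_db(["Pyaar Kameena", "Dil Se"])  # sample titles
print(encode("pyar kaminaa"))        # PYAR CAMENA
print(search(conn, "pyar kamina"))   # ['Pyaar Kameena']
```

Longest-match order matters. Without it, "ee" would encode as "EE" and "ck" as "CC", and the spelling variants would stop colliding.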

ok .. in this case you still need to do %foo% searches, which is fine, but it
means that you will not really see a performance improvement, probably a
performance reduction. also, as was noted in the thread, some people have
already figured out how to deal with spelling mistakes by only searching for
the pieces they know how to spell correctly or that are "most unique".
reducing such entered text to hashes might then not work all that well. so
again this should be a fallback and not the default.

now the above approach has one issue .. often it's not possible to accurately
identify only one possible hash. this is a fundamental realization that went
into the double metaphone algorithm, which can produce 2 hashes in some cases.
this in turn would make it impossible to simply transform multiple words into
a single sequence of hashes.

anyways .. we do not need to solve all issues this very second. i do not know
how well double metaphone deals with hindi, but it might be an interesting
approach to simply extend this algorithm with hindi rules if they are missing
atm.

regards,
Lukas

DJ Suicide Dive
[email protected]

_______________________________________________
Mixxx-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mixxx-devel
