Hi Dhiraj,

This is a very interesting proposal and it seems you've got a very good understanding of the issues. My main concern is more on our side: this does not seem like such a high-priority issue. We haven't seen any user requests for such functionality, although admittedly I think we hadn't given much thought to how non-Western languages interact with the search, and non-English-speaking users are probably less likely to use our English-language mailing list or forum.
I also think a lot of users are used to full text search in applications and know how to work with it to get the results they want. For example, with "Christina" and "Cristina" people type "istina", knowing that's unlikely to appear in other words. Or, more simply, they just use the album title or track name if they aren't sure how the artist's name is spelled. Of course this isn't an ideal solution, but I suspect it works well for a lot of people.

That said, I'm impressed with the level of thought and detail you put into this and I would definitely encourage you to apply for GSoC 2010 with Mixxx. But you should be prepared to justify strongly why this is important for Mixxx. I would also suggest that before you apply, you think about whether there are any higher-priority projects you would be equally interested in working on. Either way, we'll be looking forward to your application.

Thanks,

Adam Davison

On 28 March 2010 07:12, Dhiraj Lohiya <[email protected]> wrote:
> Please find replies inline.
>
> On Sun, Mar 28, 2010 at 1:23 AM, Lukas Smith <[email protected]> wrote:
>>
>> On 27.03.2010, at 20:51, Owen Williams wrote:
>>
>> >
>> >> As for actually implementing it yourself from scratch it probably
>> >> will have to be done using sqlite_create_function:
>> >> http://www.sqlite.org/c3ref/create_function.html.
>> >>
>> >
>> > I have experience with sqlite and full text search, and I didn't get
>> > good results with fts. I had to use an external library called Xapian
>> > that worked much, much better. So I would consider that library even
>> > if it means adding a dependency.
>>
>>
>> well for something like soundex or double metaphone you don't really need
>> full text search .. however both algorithms really only work well for single
>> words, which makes things a bit tricky, since you would have to maintain a
>> separate table with the hashes for each word.
>>
>
> Can we do it in the following way:
> When loading new songs into the database, a custom hash column would hold
> the full text hash, built in the following manner:
> The equivalence classes for similar sounds are classes that each list
> phonetic substrings that sound the same. They are represented as letters
> rather than digits, since English gives us 52 letters (counting upper and
> lower case) against only 10 digits, and we generally need only 20-23 of them.
> Substrings are formed as continuous vowel and consonant sequences.
> Some equivalence classes for Hindi, based on Hindi phonology (Devanagari
> script), with respect to the example song name encoded below:
>
> p -> equivalence class P
> y | yy -> equivalence class Y
> a | aa -> equivalence class A
> r | rr -> equivalence class R
> k | c | q | ck -> equivalence class C
> m -> equivalence class M
> i | e | ee -> equivalence class E
> n | kn -> equivalence class N
>
> So, if the name of the song entered is "pyaar kameena", it is stored in the
> form of its equivalence classes as "PYAR CAMENA".
> Now when we take the search query from the user, we encode it according to
> this logic and then search. So whether the search query is any of the
> following
> pyar kamina | pyaar kamine | pyar kameena | pyar kaminaa etc. -> "PYAR
> CAMENA"
> the encoding for all of these will be the same, and this can then be easily
> searched using FTS. (If Xapian works better than the default, we could
> use that.)
> This seems to be scalable even if the number of songs in the list is
> more than 1 lakh (100,000).
> Please feel free to tear down any of the above points. Thanks Mad Jester,
> Lukas and Owen for the continuous feedback and support.
> --
> Regards
> Dhiraj Lohiya
> IRC nick: Dj
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself.
> Speed compiling, find bugs proactively, and fine-tune applications for
> parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Mixxx-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/mixxx-devel
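
For reference, the encoding described in the quoted proposal could be sketched roughly as follows. This is a minimal Python sketch, not Mixxx code. It uses only the eight equivalence classes listed in the proposal, and it makes several assumptions that the thread does not settle: longer substrings are matched first, letters without a listed class are passed through in upper case, and the lookup is a plain equality match on a stored key rather than FTS or Xapian. The names `tracks`, `title_key`, `phonetic_key`, `open_library` and `find_titles` are hypothetical.

```python
import sqlite3

# Equivalence classes copied from the quoted proposal (substring -> class).
CLASSES = {
    "p": "P",
    "y": "Y", "yy": "Y",
    "a": "A", "aa": "A",
    "r": "R", "rr": "R",
    "k": "C", "c": "C", "q": "C", "ck": "C",
    "m": "M",
    "i": "E", "e": "E", "ee": "E",
    "n": "N", "kn": "N",
}
MAX_LEN = max(len(key) for key in CLASSES)


def encode_word(word):
    """Encode one word, trying the longest listed substring at each position."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        for size in range(MAX_LEN, 0, -1):
            chunk = word[i:i + size]
            if chunk in CLASSES:
                out.append(CLASSES[chunk])
                i += len(chunk)
                break
        else:
            # Assumption: a letter with no listed class is kept, upper-cased.
            out.append(word[i].upper())
            i += 1
    return "".join(out)


def encode(text):
    """Encode a whole title or query, word by word."""
    return " ".join(encode_word(word) for word in text.split())


def open_library(titles):
    """Build an in-memory table holding each title next to its encoded key."""
    conn = sqlite3.connect(":memory:")
    # Expose the encoder to SQL under a hypothetical name.
    conn.create_function("phonetic_key", 1, encode)
    conn.execute("CREATE TABLE tracks (title TEXT, title_key TEXT)")
    conn.executemany(
        "INSERT INTO tracks (title, title_key) VALUES (?, phonetic_key(?))",
        [(title, title) for title in titles],
    )
    return conn


def find_titles(conn, query):
    """Return titles whose stored key equals the encoded query."""
    rows = conn.execute(
        "SELECT title FROM tracks WHERE title_key = phonetic_key(?) ORDER BY title",
        (query,),
    )
    return [row[0] for row in rows]
```

With this sketch, "pyaar kameena", "pyar kamina", "pyar kameena" and "pyar kaminaa" all encode to "PYAR CAMENA". With only the classes listed, "pyaar kamine" encodes to "PYAR CAMENE", so matching that spelling would need an additional rule that the proposal does not spell out.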
