Please find my replies inline.

Thanks for the suggestions and feedback. I would be glad to have more input.

On Sat, Mar 27, 2010 at 10:17 PM, Lukas Smith <[email protected]> wrote:

>
> On 27.03.2010, at 16:40, Dhiraj Lohiya wrote:
>
> > Hi
> >
> > I wanted to propose an idea to improve the playlist search filter of
> Mixxx by making its search results phonetically aware, and I am looking to
> take it forward as a GSoC project. Please see whether you find this
> interesting enough:
> >
> > Why is this feature important for Mixxx?
> > The present search feature does a strict string match to provide results.
> But the metadata we are searching might be spelled differently from the
> user's query, because the query reflects how the user pronounces the word
> and spells it in the English character set, not its standard spelling.
> Moreover, languages other than English face spelling standardization
> issues, and different people spell the same (phonetically identical) words
> in different ways. Also, many words in song metadata and elsewhere are
> named entities, which can't be matched even after a dictionary lookup.
> >
> > So it is necessary to improve the user experience here by implementing a
> phonetic search feature that finds phonetically matching words on the fly
> (without a dictionary or similar structure, which would also be memory- and
> resource-expensive).
> >
> >
> > How do I plan to proceed?
> > I plan to customize the Soundex algorithm for all languages, so that each
> language can have its own set of phonetic equivalence-class rules
> (generally around 20 rules for most languages). I would keep the approach
> layered so that rules for multiple languages could be easily contributed
> and more languages could be added by others.
>
> I think coming up with another algorithm might be a bit too much. For
> western languages Double Metaphone seems like a very good choice. I do not
> know a good algorithm for African or Asian names though.
>

True, coming up with another algorithm would be a tough task. Actually, I
have already worked out a customized version of the Soundex algorithm as
part of my ongoing project and implemented it in Java. Right now, the rule
sets cover only Hindi, Marathi and English. The results are narrowed down
well, with far fewer false positives, and this works well for Marathi and
Hindi. Since the algorithm itself stays the same (nearly equivalent to
Soundex) and only each language's rule set has to be contributed for the
algorithm to process, I think this approach should work.

Moreover, I plan for our algorithm not to handle silent letters, since they
would be rare in song names, and the other metadata, such as artist names,
are named entities that generally don't contain silent letters.
Silent letters are one reason the original algorithm produces irrelevant
false positives, so we could capitalize on this.


> One thing is that you do not want to default to fuzzy searching, but you
> might use it as a fallback in case there are few or no results. However, in
> that case you will want to make sure that you separate those results.
>
>
Agreed and noted down!
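A minimal Java sketch of that fallback: the normal substring search runs
first, phonetic matching is used only when it returns fewer than minResults
hits, and phonetic hits come back in a separate list so the UI can show
them apart. The FallbackSearch and SearchResult names, the single-word
query, and the word-by-word comparison are assumptions for illustration;
the phonetic key function is passed in, so any encoder (Soundex-style or
Double Metaphone) could be plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Exact hits and phonetic-fallback hits, kept in separate lists. */
final class SearchResult {
    final List<String> exact;
    final List<String> phonetic;

    SearchResult(List<String> exact, List<String> phonetic) {
        this.exact = exact;
        this.phonetic = phonetic;
    }
}

final class FallbackSearch {
    /** Substring search first; phonetic comparison only if it finds fewer than minResults. */
    static SearchResult search(List<String> titles, String query, int minResults,
                               Function<String, String> phoneticKey) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> exact = new ArrayList<>();
        for (String t : titles) {
            if (t.toLowerCase(Locale.ROOT).contains(q)) {
                exact.add(t);
            }
        }
        List<String> phonetic = new ArrayList<>();
        if (exact.size() < minResults) {
            String queryKey = phoneticKey.apply(query); // assumes a single-word query
            for (String t : titles) {
                if (exact.contains(t)) {
                    continue; // never list a track in both groups
                }
                for (String word : t.split("\\s+")) {
                    if (phoneticKey.apply(word).equals(queryKey)) {
                        phonetic.add(t);
                        break;
                    }
                }
            }
        }
        return new SearchResult(exact, phonetic);
    }
}
```

For a quick check without a real encoder, a toy key such as
w -> w.toLowerCase(Locale.ROOT).replaceAll("[aeiou]", "") makes a search
for "Deel" over "Dil Se", "Chaiyya Chaiyya" and "Tum Hi Ho" return no exact
hits and "Dil Se" as a phonetic hit.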


> > Moreover, it is important that once someone defines a base set of rules,
> the rules themselves can be extended and evolve based on user input and
> usage.
> > For instance, if many users (above a threshold set by us) enter a search
> string that does not retrieve the wanted result, we could track what they
> finally select and then append to or modify our set of phonetic rules based
> on the phonetic mismatch, under our current rules, between the query
> entered and the result wanted. In this way, the rule sets could evolve as
> we collect usage statistics from users' experience. This feature would add
> a new dimension to the search functionality and would surely stand out.
> >
> > Initially I plan to code this for a few Indian languages, such as Hindi
> and Marathi, and to define a simple way (probably a GUI based on the
> GoogleImageLabeler concept) in which rules for different languages can be
> added directly, so that people who know those languages can contribute.
>
> I guess such training could indeed make the solution better for any kind of
> language.
>
>
Exactly. Over time, users could afford to be lazy while searching :) and we
could elevate the user experience.
In fact, from the outset, we could develop an optimized rule set by
crawling the data available on the web for most languages. Some Google APIs
exist that could help here.
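The threshold-based rule evolution could start from something as simple as
the Java sketch below. Everything in it is an assumption for illustration:
the RuleLearner class, comparing the query with a word from the track the
user finally selected, and the restriction to equal-length words that
differ in exactly one letter (a real version would need proper alignment,
e.g. edit-distance backtracking). Candidate letter pairs are meant to be
reviewed by a person before being merged into a language's rule set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical learner: counts single-letter query/selection mismatches across users. */
final class RuleLearner {
    private final int threshold;
    private final Map<String, Integer> counts = new HashMap<>();

    RuleLearner(int threshold) {
        this.threshold = threshold;
    }

    /** Logs a pair only if both words have equal length and differ in exactly one letter. */
    void observe(String query, String selectedWord) {
        String a = query.toLowerCase(Locale.ROOT);
        String b = selectedWord.toLowerCase(Locale.ROOT);
        if (a.length() != b.length()) {
            return; // a real version would align words of different lengths
        }
        int diffAt = -1;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                if (diffAt >= 0) {
                    return; // more than one difference: too noisy to learn from
                }
                diffAt = i;
            }
        }
        if (diffAt < 0) {
            return; // identical words, nothing to learn
        }
        char x = a.charAt(diffAt);
        char y = b.charAt(diffAt);
        // Canonical order so that (v, w) and (w, v) are counted together.
        String pair = x < y ? "" + x + y : "" + y + x;
        counts.merge(pair, 1, Integer::sum);
    }

    /** Letter pairs seen at least `threshold` times, sorted, as candidate equivalences. */
    List<String> candidateRules() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= threshold) {
                out.add(e.getKey());
            }
        }
        out.sort(null); // natural (alphabetical) order
        return out;
    }
}
```

For example, with a threshold of 2, observing "Wishal"/"Vishal" and
"Wikram"/"Vikram" yields the candidate pair "vw", while a single
"Shreya"/"Shriya" observation stays below the threshold.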



>  regards,
> Lukas
>
> DJ Suicide Dive
> [email protected]
>
>
>
>
More suggestions/opinions/feedback or drawbacks of the approach are most
welcome. Thanks again!

-- 
Regards
Dhiraj Lohiya
IRC nick: Dj
_______________________________________________
Mixxx-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mixxx-devel