[marf-dev] [Help] RE: Pronounciation detection

SourceForge.net Sun, 01 Aug 2010 08:54:20 -0700

The following forum message was posted by mokhov at 
http://sourceforge.net/projects/marf/forums/forum/213052/topic/3782565:


[quote][quote]you just have to change categories from speakers to words[/quote]
Here do you mean a change in just "speakers.txt", or is some change in the
code required to change the category?
[/quote]

No code changes are required, just speakers.txt. You list one word per line, 
give it a numeric ID, and list all the recorded files associated with it. The 
format is described in the manual, but it is essentially a CSV file, and the 
filenames for training and testing are separated by the vertical bar "|".
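
To make the file layout concrete, here is a minimal sketch of reading one
such line. It is written in Python purely for illustration and is not MARF
code. The field order (ID, name, training files, testing files), the use of
"|" between filenames within each list, and the sample filenames are all
assumptions on my part; the manual remains the authority on the exact format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Category:
    ident: int
    name: str
    training_files: List[str]
    testing_files: List[str]


def split_files(field: str) -> List[str]:
    """Split a '|'-separated list of filenames, dropping empty entries."""
    return [f.strip() for f in field.split("|") if f.strip()]


def parse_line(line: str) -> Category:
    """Parse one dictionary line.

    ASSUMED layout (check the MARF manual for the real one):
        ID,Name,train1.wav|train2.wav,test1.wav|test2.wav
    """
    ident, name, training, testing = [f.strip() for f in line.split(",", 3)]
    return Category(int(ident), name, split_files(training), split_files(testing))


# Hypothetical entry for the word "Hello"; the filenames are made up.
entry = parse_line("1,Hello,hello-01.wav|hello-02.wav,hello-03.wav")
print(entry)
```

The point is only that a word takes the place of a speaker name in the second
field; nothing else about the line has to change.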

Now, if you also want the output to say "Word" instead of "Speaker", etc., or 
if the filename "speakers.txt" is too confusing, you'd need to change the 
code to fix the output messages, but this is a cosmetic change.

[quote]Also, in SpeakerIdentApp matches were found based on sound features of 
the speaker and were independent of what was being said and how. Whereas in the 
pronunciation system, if we substitute "words" in place of "speakers" in the 
dictionary, wouldn't it become specific to the speaker of the words?[/quote]

No, the features extracted would be per the categories that you pick. They are 
clustered (grouped) as such, so the training sets (your problem models) would 
be different from those for the speakers themselves. Basically, that means you 
cannot re-use my .gzbin files, which correspond to the speaker clusters; you 
will have to train your own based on your own dictionary, independent of the 
speaker. The best algorithm combination(s) you'll find (those that give you 
the highest accuracy) will likely also be different from those for speaker 
identification.

[quote]Yes, I agree some experimentation is required to find the right 
combination of algorithms, though for comparison I think distance-based algos 
would be better for the required accuracy. Here the problem is that if we are 
too accurate it becomes speaker or word identification software, and if we are 
too fuzzy it wouldn't just check for the right pronunciation but anything 
remotely near it. I guess some optimization would be required.[/quote]

In general, in my recent work I have applied MARF's pipeline not only to 
audio, but also to forensic file type analysis, natural language tasks (text 
analysis), writer recognition (handwriting attribution of scanned documents in 
images), and others. There are corresponding publications for all of those. I 
guess it'd make sense to finally maintain the web page and list them there... 
Anyway, the approach is the same for all of them, but the selection of 
algorithms differs depending on the task at hand, and so does the dictionary 
of categories, etc. I was sometimes caught by surprise as to which 
combinations were the best for the task, so experiments are definitely 
required.

In your particular case my feeling (supported by the gender classification task 
in that article) is that you'd probably get better accuracy if you group 
word pronunciations by different genders/age groups at the low level, in the 
.txt file, and then match them up to one word category later in the 
application. This way, I hypothesize, it'd be more robust and gender/age 
group independent. For example, in the .txt file you have

...
1,Hello_adult_female,...
2,Hello_adult_male,...
3,Hello_child,...
...

and then in the application you match up IDs 1, 2, and 3 all into "Hello". It 
may turn out that this step is not necessary, but I can't tell for sure 
offhand. It requires experiments both with and without such separation.
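
As a rough sketch of that matching step (in Python for illustration only, not
MARF code): the helper names below are hypothetical, and deriving the word by
cutting the name at the first underscore is an assumption that only holds if
the words themselves contain no underscores.

```python
from typing import Dict, Iterable, Tuple


def build_word_map(entries: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Map each sub-category ID to the word it belongs to.

    ASSUMPTION: names look like '<Word>_<group>', e.g. 'Hello_adult_female',
    so the word is everything before the first underscore.
    """
    return {ident: name.split("_", 1)[0] for ident, name in entries}


def matches_word(identified_id: int, expected_word: str,
                 word_map: Dict[int, str]) -> bool:
    """True if the ID returned by classification maps to the expected word."""
    return word_map.get(identified_id) == expected_word


# The three sub-categories from the example above.
word_map = build_word_map([
    (1, "Hello_adult_female"),
    (2, "Hello_adult_male"),
    (3, "Hello_child"),
])

# Whichever of the three IDs the classifier returns, the application
# reports the same word category.
print(matches_word(2, "Hello", word_map))
```

Running the experiment without the separation then amounts to training on a
single "Hello" entry and skipping this lookup.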

It may also turn out that a better algorithm needs to be implemented to do a 
better job than the existing ones.

If I ever get time, I'll make something like that up myself as a demo app 
for MARF, but don't hold your breath.

On the other hand, if you require any adjustments to MARF to make your life 
easier, let me know. If you come up with something new and would like to 
contribute it to the project, you are most welcome to. :)

Let me know how it goes.

-s


