> Jeremy Bowers wrote:
>> No matter how you slice it, this is not a Python problem, this is an
>> intense voice recognition algorithm problem that would make a good
>> PhD thesis.
Qiangning Hong wrote:
> No, my goal is nothing relative to voice recognition. Sorry that I
> haven't described my question clearly. We are not teaching English, so
> the voice recognition isn't helpful here.

To repeat what Jeremy wrote - what you are asking *is* related to voice
recognition. You want to recognize that two different voices, with
different pitches, pauses, etc., said the same thing.

There is a lot of data in speech. That's why sound files are bigger
than text files. Some of it gets interpreted as emotional nuance or as
an accent, while the rest is simply ignored.

> I just want to compare two sound WAVE files, not what the students or
> the teacher are really saying. For example, if the teacher recorded
> his "standard" pronunciation of "god", then the student saying "good"
> will get a higher score than the student saying "evil" -- because
> "good" sounds more like "god".

Try this: record the word twice and overlay the two recordings. They
will be different, and that's with the same speaker. Now compare your
voice with someone else's. You can hear just how different they are.
One will be longer, another deeper, or with the "o" sound originating
in a different part of the mouth.

At the level you are working at, the computer doesn't know which parts
of the data can be ignored. It doesn't know how to find the start of
the word (as when a student says "ummm, good"). It doesn't know how to
stretch the timings, nor how to adjust for pitch between, say, a man's
voice and a woman's.

My ex-girlfriend gave me a computer program for learning Swedish. It
included a tool that did a simpler version of what you are asking: it
only compared phonemes, so I could practice the vowels. Even then, its
comparison seemed more like a random value than anything meaningful.

Again, as Jeremy said, you want something harder than what speech
recognition programs do. They at least are trained on a given speaker,
which helps improve the quality of the recognition.
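If you want to see the overlay problem for yourself in Python, a naive
sample-by-sample comparison takes only a few lines. This is a sketch,
not a scoring method - the file names are made up, and it assumes
16-bit mono WAV files. Even two takes of the same word by the same
speaker will score "far apart" here, which is exactly the point:

```python
import struct
import wave


def read_samples(path):
    """Return the 16-bit samples of a mono WAV file as a list of ints."""
    with wave.open(path, "rb") as w:
        # Assumes 16-bit mono; real recordings would need resampling etc.
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        frames = w.readframes(w.getnframes())
    return list(struct.unpack("<%dh" % (len(frames) // 2), frames))


def naive_distance(a, b):
    """Mean absolute difference over the overlapping region.

    No alignment, no pitch or level normalization -- so it punishes
    timing and speaker differences as heavily as wrong words.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a[:n], b[:n])) / n
```

Running this on, say, `teacher_god.wav` against two takes of the same
student saying the same word will already produce large distances,
which is why a workable comparison needs the alignment and
normalization steps described above.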
You don't want that -- that's the opposite of what you're trying to do.
Speaker-independent voice recognition is harder than speaker-dependent
recognition. You can implement a solution along the lines you were
thinking of, but as you found, it doesn't work. A workable solution
requires good speech recognition capability and is still very much in
the research stage (as far as I know; it's not my field).

If your target language is a major one, there may be commercial speech
recognition software you can use. You could have your reference speaker
train the software on the vocabulary list and have your students try to
get the software to recognize the correct word. If your word list is
too short, or the recognizer is not tuned well enough, then saying
something like "thud" will also be recognized as close enough to
"good".

Why not just have the student hear the teacher's voice and the
student's just-recorded voice, one right after the other? That gives
feedback. Why does the computer need to judge the correctness?

Andrew
[EMAIL PROTECTED]
--
http://mail.python.org/mailman/listinfo/python-list
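P.S. The play-them-back-to-back idea is trivial to wire up with the
standard library's wave module. A minimal sketch, assuming hypothetical
file names and that both recordings share the same sample rate, sample
width, and channel count:

```python
import wave


def splice(teacher_path, student_path, out_path):
    """Concatenate two compatible WAV files into one for A/B playback."""
    with wave.open(teacher_path, "rb") as t:
        params = t.getparams()  # reuse the teacher's format settings
        teacher_frames = t.readframes(t.getnframes())
    with wave.open(student_path, "rb") as s:
        student_frames = s.readframes(s.getnframes())
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # frame count is fixed up on close
        out.writeframes(teacher_frames + student_frames)
```

Then play the resulting file with whatever audio player the platform
provides; the student does the judging by ear.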