This is how you record sound:
http://www.getmicrophone.com/?p=69

If you're asking how to convert sound waves into speech, dude, what? Do you realize how challenging speech recognition is? Wait, why am I asking you this? If you did, you wouldn't be asking people on a Flash list how to do it, as if it's some piece of code somebody can copy and paste or a few links that will tell you the secret formula.

Most speech-to-text programs are based on hidden Markov models. In speech recognition, the hidden Markov model outputs a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), one every 10 milliseconds. The vectors consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech, decorrelating the spectrum using a cosine transform, and then keeping the first (most significant) coefficients. Each state of the hidden Markov model tends to have a statistical distribution that is a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. Each word (or, for more general speech recognition systems, each phoneme) has a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individually trained hidden Markov models for the separate words and phonemes.
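To make that paragraph a little less abstract, here is a rough Python sketch of the two pieces of math it mentions: turning one short window of samples into cepstral coefficients, and scoring a vector against a mixture of diagonal-covariance Gaussians. This is a toy under my own assumptions, not how any real recognizer is written. The function names, the Hamming window, the log step between the Fourier and cosine transforms, and the naive DFT are all my choices for illustration. Real systems typically add a mel filterbank and use an FFT, and none of this touches the HMM training or decoding.

```python
import cmath
import math


def cepstral_coefficients(frame, n_coeffs=10):
    """Return the first n_coeffs cepstral coefficients of one frame of samples."""
    n = len(frame)
    # Taper the frame with a Hamming window to reduce spectral leakage (assumed choice).
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # Naive O(n^2) discrete Fourier transform; keep the non-redundant half.
    half = n // 2 + 1
    magnitudes = [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(half)
    ]
    # Log compression (assumed step); the small floor avoids log(0).
    log_spec = [math.log(m + 1e-10) for m in magnitudes]
    # DCT-II decorrelates the log spectrum; keep only the lowest-order terms.
    return [
        sum(log_spec[k] * math.cos(math.pi * q * (k + 0.5) / half) for k in range(half))
        for q in range(n_coeffs)
    ]


def diag_gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of vector x under a mixture of diagonal-covariance Gaussians."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        # Diagonal covariance: each dimension contributes an independent 1-D Gaussian.
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


# One 10 ms frame of a 440 Hz tone sampled at 8 kHz (80 samples).
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(80)]
features = cepstral_coefficients(frame)
```

In a recognizer, each HMM state would own its own weights, means, and variances, and a vector like `features` would be scored against every state for every 10 ms frame. That is the easy part.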

There you have it. That's a high-level overview of speech-to-text. Do you understand anything in that paragraph? Probably not.

Unless you're willing to study and put in the time, you're not going to figure this out. Nobody is going to point you in the right direction because this is a very niche knowledge area, and the people who work in it aren't on Flashcoders. They're at universities working on their doctorates, or working for the military, the government, or some private company, and they're not sharing this information. This is the stuff patents are made of.

So either give up now (because what you want is some easy solution and there isn't one) or start doing real research, learn some serious calculus, become an expert on sound, speech, and waveforms, and then figure out how to port all of this into Flash, which, in all likelihood, lacks the performance to actually achieve this.

You'll probably have to do it on the server: pass the sound to the server as an mp3 file, run the recognition there, and pass the text back. That's the only approach I can think of that could possibly work.
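For what it's worth, the round trip itself is the trivial part. Here is a sketch in Python of what the client side could look like. Everything specific in it is made up for illustration: the `/transcribe` URL, the JSON reply with a `text` field, and both function names are hypothetical, and the server that actually does the recognition is exactly the hard part you'd still have to build or buy.

```python
import json
import urllib.request


def build_transcription_request(mp3_bytes, url="http://localhost:8000/transcribe"):
    """Build a POST request carrying raw mp3 bytes (the URL is a hypothetical endpoint)."""
    return urllib.request.Request(
        url,
        data=mp3_bytes,
        headers={"Content-Type": "audio/mpeg"},
        method="POST",
    )


def parse_transcription_response(body):
    """Extract the recognized text, assuming a JSON reply like {"text": "..."}."""
    return json.loads(body.decode("utf-8"))["text"]


# Placeholder bytes stand in for a real mp3 recording.
request = build_transcription_request(b"placeholder mp3 bytes")
# Sending it would be: urllib.request.urlopen(request).read()
```

A Flash client would do the equivalent upload over HTTP and display whatever text comes back.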

Prove me wrong. If you pull this off, you could probably build an entire company around your technology.
_______________________________________________
Flashcoders mailing list
Flashcoders@chattyfig.figleaf.com
http://chattyfig.figleaf.com/mailman/listinfo/flashcoders
