-------- Original Message --------
Subject: Re: [svlug] Linux speech -> text
Date: Mon, 28 Apr 2008 00:02:36 -0700
From: Rick Moen <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
References: <[EMAIL PROTECTED]>
Quoting Ajit Natarajan ([EMAIL PROTECTED]):

> Is there any software out there (free or commercial) that converts
> speech to text?

This subject came up on another mailing list in February, so I'll quote my post to that thread:

[snip some]

http://tldp.org/HOWTO/Speech-Recognition-HOWTO/index.html
Speech Recognition HOWTO, dated 2002 (slightly moldy)

Talks about various aspects of the problem, including hardware. Software mentioned:

A. Open source:

1. XVoice
   Dictation/continuous speech recognizer that can be used with a variety of X applications. Requires IBM ViaVoice for Linux and the Motif/Lesstif graphics libraries.
   http://www.compapp.dcu.ie/~tdoris/Xvoice/
   http://www.zachary.com/creemer/xvoice.html
   http://xvoice.sourceforge.net/
   http://www.onelist.com/community/xvoice/

2. CVoiceControl (Console Voice Control)
   A basic speech recognition system that lets a user execute Linux commands by speaking them. Includes a microphone-level configuration utility, a vocabulary "model editor" for adding new commands and utterances, and the speech recognition system itself. (Replaces KVoiceControl.)
   http://www.kiecza.de/daniel/linux/
   http://www.kiecza.de/daniel/linux/cvoicecontrol/

3. Open Mind Speech
   Not end-user oriented, and still under development at the time of the HOWTO update. Previously called FreeSpeech, before that SpeechInput, and before that VoiceControl.
   http://freespeech.sourceforge.net/
   2008 update: "mostly complete". Last update was 2002.
   http://sourceforge.net/projects/freespeech/
   They've added a nice C++ rapid-development environment called FlowDesigner and are using that.
   http://flowdesigner.sourceforge.net/wiki/index.php/Main_Page
   Looks like the "Open Mind Speech environment aka Piper PL" has been given the name "Overflow". (I hope this is meaningful to some people, because it isn't to me.)

4. GVoice
   A library (i.e., a core module to be used by other software) for using IBM's ViaVoice to control Gtk/GNOME apps, including libraries for initialization, recognition engine, vocabulary manipulation, and panel control. Development had stalled at the time of the HOWTO update.
   http://www.cse.ogi.edu/~omega/gnome/gvoice/

5. ISIP
   Speech recognition engine (toolkit) from the Mississippi State University Institute for Signal and Information Processing, aimed at developers, including a front end, a decoder, and a training module.
   http://www.isip.msstate.edu/project/speech/

6. CMU Sphinx
   Large package, aimed at developers, including trainers, recognizers, acoustic models, language models, and some limited documentation.
   http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
   http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz

7. Ears
   Another in-progress kit for developers.
   ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/

8. NICO ANN Toolkit
   The NICO Artificial Neural Network toolkit, aimed at developers, is a flexible back-propagation neural network toolkit optimized for speech recognition applications.
   http://www.speech.kth.se/NICO/

9. Myers's Hidden Markov Model Software
   Developers' toolkit implementing, in C++, the Hidden Markov Model algorithms detailed in L. Rabiner's book "Fundamentals of Speech Recognition".
   http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html

10. Jialong He's Speech Recognition Research Tool
    Research tool for developers implementing three different types of recognisers: DTW (Dynamic Time Warping), Dynamic Hidden Markov Model, and Continuous Density Hidden Markov Model.
    http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html

B. Proprietary software:

1. IBM ViaVoice
   Proprietary; partly gratis, partly for pay as of the HOWTO update. Had hefty resource requirements for the day. Includes documentation (PDF), a trainer, a dictation system, and installation scripts. Some other components were also available (apparently Java stuff).
   http://www-4.ibm.com/software/speech/dev/sdk_linux.html (Gone.) (See footnote [1], below: IBM killed it.)

2. Vocalis Speechware
   http://www.vocalisspeechware.com/
   http://www.vocalis.com/

3. Babel Technologies' Babear SDK for Linux
   Speaker-independent system based on Hybrid Markov Models and Artificial Neural Networks technology. They also have a variety of products for text-to-speech, speaker verification, and phoneme analysis.
   http://www.babeltech.com/

4. SpeechWorks
   http://www.speechworks.com/

5. Nuance
   Speech recognition/natural language product; can handle very large vocabularies and uses a unique distributed architecture for scalability and fault tolerance.
   http://www.nuance.com/

6. Abbot/AbbotDemo
   Very large vocabulary, speaker-independent system, originally developed at Cambridge University, then spun off.
   http://www.softsound.com/

7. Entropic
   Offered software for Linux, but then were bought by Microsoft. Their old site, http://www.entropic.com/ , showed what they had (but by now you'll probably have to use an Internet Archive snapshot). An older copy of their Hidden Markov Model Toolkit is available gratis (but proprietary) from http://htk.eng.cam.ac.uk/ .

A bunch more options (a second catalogue of projects):
http://linux-sound.org/speech.html

A good page on the subject, last updated _June 2005_ and hence much less moldy than the HOWTO:
http://volker.top.geek.nz/linux/speechrec.html

[1] Article from 2004 about IBM's plans to finally open-source ViaVoice:
http://www.theinquirer.net/en/inquirer/news/2004/09/14/ibm-to-open-source-speech-recognition
(As of 2008, I see no sign that they ever did.)
Article from 2002 about IBM making yet more bizarre moves, including discontinuing the Linux SDK for ViaVoice without comment:
http://www.linuxjournal.com/article/6383

Article from 2004 reporting that IBM had open-sourced _some_ voice-recognition software, donating it to the Apache Software Foundation and the Eclipse Foundation, but had omitted ViaVoice:
http://www.hackinthebox.org/modules.php?op=modload&name=News&file=article&sid=14188&mode=thread&order=0&thold=0

Further detail:
http://www.theinquirer.net/en/inquirer/news/2004/09/22/open-sourced-ibm-speech-code-doesnt-include-viavoice

Sounds like ViaVoice for Linux (both the SDK and the runtime) has been bureaucratised to death and buried somewhere within IBM. Too bad, but that's what happens all too often when you rely on proprietary software.

http://xvoice.sourceforge.net/faq.html includes:

   What is xvoice?
   Xvoice enables continuous speech dictation and speech control of most X applications. To convert users' speech into text it uses the IBM ViaVoice speech recognition engine, which is no longer made available from IBM.

   Where can I get the ViaVoice Runtime RPM, the ViaVoice SDK RPM, or the ViaVoice Dictation (GUI) RPM?
   They are no longer available from IBM. Used versions may be available; ask on the mailing list for more help locating people who are willing to relinquish their license(s) to you. Check in at the xvoice mailing list to stay up to date on developments.

_______________________________________________
svlug mailing list
[EMAIL PROTECTED]
http://lists.svlug.org/lists/listinfo/svlug

--
raj shekhar
facts: http://rajshekhar.net | opinions: http://rajshekhar.net/blog
Yoda of Borg are we: Futile is resistance. Assimilate you, we will

'Borg? Sounds Swedish.'
- Lily, Star Trek First Contact
_______________________________________________
ilugd mailinglist -- ilugd@lists.linux-delhi.org
http://frodo.hserus.net/mailman/listinfo/ilugd
Next Event: http://freed.in - February 22-24, 2008
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi
http://www.mail-archive.com/ilugd@lists.linux-delhi.org/