-------- Original Message --------
Subject: Re: [svlug] Linux speech -> text
Date: Mon, 28 Apr 2008 00:02:36 -0700
From: Rick Moen <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
References: <[EMAIL PROTECTED]>

Quoting Ajit Natarajan ([EMAIL PROTECTED]):

> Is there any software out there (free or commercial) that converts 
> speech to text?  

This subject came up on another mailing list in February, so I'll quote
my post to that thread:

[snip some]
http://tldp.org/HOWTO/Speech-Recognition-HOWTO/index.html
   Speech Recognition HOWTO, dated 2002 (slightly moldy)
   Talks about various aspects of the problem including hardware.
   Software mentioned:

   A. Open source:
   1.  XVoice
   Dictation/continuous speech recognizer that can be used with a variety
   of X applications.
   Requires IBM ViaVoice for Linux and Motif/Lesstif graphics libs.
   http://www.compapp.dcu.ie/~tdoris/Xvoice/
   http://www.zachary.com/creemer/xvoice.html
   http://xvoice.sourceforge.net/
   http://www.onelist.com/community/xvoice/

   2.  CVoiceControl (Console Voice Control)
   A basic speech recognition system that lets a user execute Linux
   commands by speaking them.  Includes a microphone-level configuration
   utility, a vocabulary "model editor" for adding new commands and
   utterances, and the speech recognition system itself.
   (Replaces KVoiceControl.)
   http://www.kiecza.de/daniel/linux/
   http://www.kiecza.de/daniel/linux/cvoicecontrol/

   3.  Open Mind Speech
   Not end-user oriented, and still under development at the time of the
   HOWTO update.  Previously called FreeSpeech, before that SpeechInput,
   before that VoiceControl.
   http://freespeech.sourceforge.net/

   2008 update:  "mostly complete".  Last update was 2002.
   http://sourceforge.net/projects/freespeech/  They've added a nice
   C++ rapid-development environment called FlowDesigner and are using that.
   http://flowdesigner.sourceforge.net/wiki/index.php/Main_Page
   Looks like the "Open Mind Speech environment aka Piper PL" has been
   given the name "Overflow".  (I hope this is meaningful to some people,
   because it isn't to me.)

   4.  GVoice
   A library (i.e. core module to be used by other software) to use
   IBM's ViaVoice to control Gtk/GNOME apps, including libraries for
   initialization, recognition engine, vocabulary manipulation, and panel
   control.  Development was stalled at the time of the HOWTO update.
   http://www.cse.ogi.edu/~omega/gnome/gvoice/

   5.  ISIP
   Speech recognition engine (toolkit) from the Mississippi State U.
   Institute for Signal and Information Processing, aimed at developers,
   including a front-end, a decoder, and a training module.
   http://www.isip.msstate.edu/project/speech/

   6.  CMU Sphinx
   Large package, aimed at developers, including trainers, recognizers,
   acoustic models, language models, and some limited documentation.
   http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
   http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz

   7.  Ears
   Another in-progress kit for developers.
   ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/

   8.  NICO ANN Toolkit
   NICO Artificial Neural Network toolkit, aimed at developers, is a
   flexible back propagation neural network toolkit optimized for
   speech recognition applications.
   http://www.speech.kth.se/NICO/

   9.  Myers's Hidden Markov Model Software
   Developers' toolkit implementing in C++ Hidden Markov Model algorithms
   detailed in L. Rabiner's book "Fundamentals of Speech Recognition".
   http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html

   10.  Jialong He's Speech Recognition Research Tool
   Research tool for developers implementing three different types of
   recognisers:  DTW (dynamic time warping), Dynamic Hidden Markov Model,
   and a Continuous Density Hidden Markov Model.
   http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html

   B. Proprietary software:
   1.  IBM ViaVoice
   Proprietary, partly gratis, partly for pay as of the HOWTO update.
   Had hefty resource requirements for the day.  Includes documentation
   (PDF), trainer, dictation system, and installation scripts.  Some other
   components available.  Apparently, Java stuff.
   http://www-4.ibm.com/software/speech/dev/sdk_linux.html  (Gone.)
   (See footnote [1], below:  IBM killed it.)

   2.  Vocalis Speechware
   http://www.vocalisspeechware.com/
   http://www.vocalis.com/

   3.  Babel Technologies's Babear SDK for Linux
   Speaker-independent system based on Hybrid Markov Models and
   Artificial Neural Networks technology.  They also have a variety of
   products for text-to-speech, speaker verification, and phoneme analysis.
   http://www.babeltech.com/

   4.  SpeechWorks
   http://www.speechworks.com/

   5.  Nuance
   Speech recognition/natural language product; can handle very large
   vocabularies and uses a unique distributed architecture for scalability
   and fault tolerance.
   http://www.nuance.com/

   6.  Abbot/AbbotDemo
   Very large vocabulary, speaker-independent system, originally
   developed at Cambridge Univ., then spun off.
   http://www.softsound.com/

   7.  Entropic
   Offered software for Linux, but then were bought by Microsoft.
   Old site http://www.entropic.com/ showed what they had (but you'll
   probably have to use an Internet Archive snapshot, by now).
   Older copy of their Hidden Markov Model Toolkit is available gratis
   (but proprietary) from http://htk.eng.cam.ac.uk/ .


A bunch more options (a second catalogue of projects):
http://linux-sound.org/speech.html

A good page on the subject, last updated _June 2005_ and hence much less
moldy than the HOWTO:
http://volker.top.geek.nz/linux/speechrec.html


[1] Article from 2004 about IBM plans to finally open-source ViaVoice:
http://www.theinquirer.net/en/inquirer/news/2004/09/14/ibm-to-open-source-speech-recognition
(As of 2008, I see no sign that they ever did.)
Article from 2002 about IBM making yet more bizarre moves, including
discontinuing without comment the Linux SDK for ViaVoice:
http://www.linuxjournal.com/article/6383
Article from 2004 reporting that IBM had open-sourced _some_
voice-recognition software, donating it to the Apache Software Foundation
and Eclipse Foundation, but had omitted ViaVoice:
http://www.hackinthebox.org/modules.php?op=modload&name=News&file=article&sid=14188&mode=thread&order=0&thold=0
Further detail:
http://www.theinquirer.net/en/inquirer/news/2004/09/22/open-sourced-ibm-speech-code-doesnt-include-viavoice

Sounds like ViaVoice for Linux -- both the SDK and runtime -- has been
bureaucratised to death and buried somewhere within IBM.  Too bad, but
that's what happens all too often when you rely on proprietary software.
http://xvoice.sourceforge.net/faq.html includes:

   What is xvoice?
     Xvoice enables continuous speech dictation and speech control of
   most X applications. To convert users' speech into text it uses the IBM
   ViaVoice speech recognition engine, which is no longer made available
   from IBM.

   Where can I get the ViaVoice Runtime RPM, the ViaVoice SDK RPM, or the
   ViaVoice Dictation (GUI) RPM?
     They are no longer available from IBM. Used versions may be
   available; ask on the mailing list for more help locating people who are
   willing to relinquish their license(s) to you. Check in at the xvoice
   mailing list to stay up to date on developments.


_______________________________________________
svlug mailing list
[EMAIL PROTECTED]
http://lists.svlug.org/lists/listinfo/svlug


-- 
raj shekhar
facts: http://rajshekhar.net | opinions: http://rajshekhar.net/blog
Yoda of Borg are we: Futile is resistance. Assimilate you, we will
'Borg? Sounds Swedish.' - Lily, Star Trek First Contact

_______________________________________________
ilugd mailinglist -- ilugd@lists.linux-delhi.org
http://frodo.hserus.net/mailman/listinfo/ilugd
Next Event: http://freed.in - February 22-24, 2008
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi 
http://www.mail-archive.com/ilugd@lists.linux-delhi.org/
