[ilugd] [Fwd: Re: [svlug] Linux speech -> text]

Raj Shekhar Mon, 28 Apr 2008 01:23:49 -0700


-------- Original Message --------
Subject: Re: [svlug] Linux speech -> text
Date: Mon, 28 Apr 2008 00:02:36 -0700
From: Rick Moen <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
References: <[EMAIL PROTECTED]>

Quoting Ajit Natarajan ([EMAIL PROTECTED]):

> Is there any software out there (free or commercial) that converts
> speech to text?

This subject came up on another mailing list in February, so I'll quote
my post to that thread:

[snip some]
http://tldp.org/HOWTO/Speech-Recognition-HOWTO/index.html
Speech Recognition HOWTO, dtd 2002 (slightly moldy)
Talks about various aspects of the problem including hardware.
Software mentioned:

A. Open source:
1. XVoice
Dictation/continuous speech recognizer that can be used with a variety
of X applications.
Requires IBM ViaVoice for Linux and Motif/Lesstif graphics libs.
http://www.compapp.dcu.ie/~tdoris/Xvoice/
http://www.zachary.com/creemer/xvoice.html
http://xvoice.sourceforge.net/
http://www.onelist.com/community/xvoice/

2. CVoiceControl (Console Voice Control)
A basic speech recognition system that allows a user to execute Linux
commands by using spoken commands, and includes a microphone-level
configuration utility, a vocabulary "model editor" for adding new
commands and utterances, and the speech recognition system.
(Replaces KVoiceControl.)
http://www.kiecza.de/daniel/linux/
http://www.kiecza.de/daniel/linux/cvoicecontrol/

3. Open Mind Speech
Not end-user oriented, and still under development at the time of the
HOWTO update. Previously called FreeSpeech, before that SpeechInput,
before that VoiceControl.
http://freespeech.sourceforge.net/

2008 update: "mostly complete". Last update was 2002.
http://sourceforge.net/projects/freespeech/
They've added a nice C++ rapid-development environment called
FlowDesigner and are using that.
http://flowdesigner.sourceforge.net/wiki/index.php/Main_Page
Looks like the "Open Mind Speech environment aka Piper PL" has been
given the name "Overflow". (I hope this is meaningful to some people,
because it isn't to me.)

4. GVoice
A library (i.e. core module to be used by other software) to use
IBM's ViaVoice to control Gtk/GNOME apps, including libraries for
initialization, recognition engine, vocabulary manipulation, and panel
control. Development was stalled at the time of the HOWTO update.
http://www.cse.ogi.edu/~omega/gnome/gvoice/

5. ISIP
Speech recognition engine (toolkit) from the Mississippi State U.
Institute for Signal and Information Processing, aimed at developers,
including a front-end, a decoder, and a training module.
http://www.isip.msstate.edu/project/speech/

6. CMU Sphinx
Large package, aimed at developers, including trainers, recognizers,
acoustic models, language models, and some limited documentation.
http://www.speech.cs.cmu.edu/sphinx/Sphinx.html
http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz

7. Ears
Another in-progress kit for developers.
ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/

8. NICO ANN Toolkit
NICO Artificial Neural Network toolkit, aimed at developers, is a
flexible back propagation neural network toolkit optimized for
speech recognition applications.
http://www.speech.kth.se/NICO/

9. Myers's Hidden Markov Model Software
Developers' toolkit implementing in C++ Hidden Markov Model algorithms
detailed in L. Rabiner's book "Fundamentals of Speech Recognition".
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html

10. Jialong He's Speech Recognition Research Tool
Research tool for developers implementing three different types of
recognisers: DTW (dynamic time warping), Dynamic Hidden Markov Model,
and a Continuous Density Hidden Markov Model.
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html
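
For anyone unfamiliar with DTW (dynamic time warping), the first of the
recogniser types mentioned in item 10: it scores how well two sequences
match when one may be stretched or compressed in time relative to the
other, which is why it suits comparing a spoken word against a stored
template. What follows is only a minimal sketch of the textbook
recurrence, not code from Jialong He's tool or any other package listed
here. The function name dtw_distance is hypothetical, and real
recognisers compare frames of acoustic features rather than the plain
numbers assumed below.

```python
def dtw_distance(a, b):
    """Return the dynamic time warping cost between two numeric sequences.

    Illustrative only: assumes plain numbers and absolute difference as
    the local distance; a real recogniser would use feature vectors.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


if __name__ == "__main__":
    # The second sequence is the first with one element held longer.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

A template-matching recogniser built on this would compute the cost
against each stored word and pick the cheapest, which is workable for
small command vocabularies but scales poorly to dictation.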

B. Proprietary Software.
1. IBM ViaVoice
Proprietary, partly gratis, partly for pay as of the HOWTO update.
Had hefty resource requirements for the day. Includes documentation
(PDF), trainer, dictation system, and installation scripts. Some other
components available. Apparently, Java stuff.
http://www-4.ibm.com/software/speech/dev/sdk_linux.html (Gone.)
(See footnote [1], below: IBM killed it.)

2. Vocalis Speechware
http://www.vocalisspeechware.com/
http://www.vocalis.com/

3. Babel Technologies's Babear SDK for Linux
Speaker-independent system based on Hybrid Markov Models and
Artificial Neural Networks technology. They also have a variety of
products for Text-to-speech, speaker verification, and phoneme analysis.
http://www.babeltech.com/

4. SpeechWorks
http://www.speechworks.com/

5. Nuance
Speech recognition/natural language product; can handle very large
vocabularies and uses a unique distributed architecture for scalability
and fault tolerance.
http://www.nuance.com/

6. Abbot/AbbotDemo
Very large vocabulary, speaker-independent system, originally
developed at Cambridge Univ., then spun off.
http://www.softsound.com/

7. Entropic
Offered software for Linux, but then were bought by Microsoft.
Old site http://www.entropic.com/ showed what they had (but you'll
probably have to use an Internet Archive snapshot, by now).
Older copy of their Hidden Markov Model Toolkit is available gratis
(but proprietary) from http://htk.eng.cam.ac.uk/ .

A bunch more options (a second catalogue of projects):
http://linux-sound.org/speech.html

A good page on the subject, last updated _June 2005_ and hence much less
moldy than the HOWTO:
http://volker.top.geek.nz/linux/speechrec.html

[1] Article from 2004 about IBM plans to finally open-source ViaVoice:
http://www.theinquirer.net/en/inquirer/news/2004/09/14/ibm-to-open-source-speech-recognition
(As of 2008, I see no sign that they ever did.)
Article from 2002 about IBM making yet more bizarre moves, including
discontinuing without comment the Linux SDK for ViaVoice:
http://www.linuxjournal.com/article/6383
Article from 2004 reporting that IBM had open-sourced _some_
voice-recognition software, donating it to the Apache Software
Foundation and Eclipse Foundation, but had omitted ViaVoice:
http://www.hackinthebox.org/modules.php?op=modload&name=News&file=article&sid=14188&mode=thread&order=0&thold=0
Further detail:
http://www.theinquirer.net/en/inquirer/news/2004/09/22/open-sourced-ibm-speech-code-doesnt-include-viavoice

Sounds like ViaVoice for Linux -- both the SDK and runtime -- has been
bureaucratised to death and buried somewhere within IBM. Too bad, but
that's what happens all too often when you rely on proprietary software.
http://xvoice.sourceforge.net/faq.html includes:

What is xvoice?
Xvoice enables continuous speech dictation and speech control of
most X applications. To convert users' speech into text it uses the IBM
ViaVoice speech recognition engine, which is no longer made available
from IBM.

Where can I get the ViaVoice Runtime RPM, the ViaVoice SDK RPM, or the
ViaVoice Dictation (GUI) RPM?
They are no longer available from IBM. Used versions may be
available; ask on the mailing list for more help locating people who are
willing to relinquish their license(s) to you. Check in at the xvoice
mailing list to stay up to date on developments.

_______________________________________________
svlug mailing list
[EMAIL PROTECTED]
http://lists.svlug.org/lists/listinfo/svlug

--
raj shekhar
facts: http://rajshekhar.net | opinions: http://rajshekhar.net/blog
Yoda of Borg are we: Futile is resistance. Assimilate you, we will
'Borg? Sounds Swedish.' - Lily, Star Trek First Contact

_______________________________________________
ilugd mailinglist -- ilugd@lists.linux-delhi.org
http://frodo.hserus.net/mailman/listinfo/ilugd
Next Event: http://freed.in - February 22-24, 2008
Archives at: http://news.gmane.org/gmane.user-groups.linux.delhi
http://www.mail-archive.com/ilugd@lists.linux-delhi.org/
