John Wojnaroski wrote:
Hi,

The last month or so I've been working with adding synthetic speech and
voice recognition to my 747 project.

What type of project is that? (FlightGear related?)

The results have been quite good;
unfortunately it's kind of hard to demonstrate or display the results.

lol, right - except of course if you want to shoot a movie :-)

But for those amongst us who have no local Festival engine, it might
be illustrative to hear some simple ATC phrases generated by Festival.
Somewhere at http://www.cstr.ed.ac.uk/projects/festival/
you can find a link to a web-based interface to a Festival engine, whose
output will then be sent to your browser.

Jim Brennan is preparing a corpus of messages and ATC phrases which will be
used to create a LM (Language Model) for speech recognition and the
synthetic speech voices come from a variety of sources -- most notably, the
FestVox folks at CMU, MBROLA, and the OGI-Festival project at CSLU.

I'm not that sure, though, about the speech-recognition part - I simply think there are too many variables and limiting factors to really make it feasible in the near future. Maybe I'm simply being too pessimistic ;-)

But of course speech synthesis just itself would already be advantageous
to have.

Both the ASR program and TTS program can run as applications (foreground or
background) on a single machine interfacing with FG via the loopback IP
address 127.0.0.1 or on additional machines connected via a LAN.

...or on tts.flightgear.org as an on-demand "stream" server, offering centralized speech synthesis even for users with slower machines ;-)

Just wondering if there is any interest in adding this capability to FG.

I think definitely yes - TTS/speech-synthesis ideas have been brought up various times here; searching for "TTS" or directly for "festival" within the mail archive returns threads like:

http://www.mail-archive.com/flightgear-devel%40flightgear.org/msg20744.html

[OT:]
(BTW: even without a locally installed search engine for FlightGear's
mailing-list archives on flightgear.org (several were suggested),
it would be nice if the addresses at mail-archive.com could be added to
http://www.flightgear.org/mail.html, where one still keeps reading:
"There is currently no search capability [...]")


But getting back to the old discussions: there seems to be a great
interest in it - most of FlightGear's counterparts are meanwhile
equipped with basic TTS functionality, so why not FG, too?

About all that is required is a socket-type (IPC) interface to send the text
string to the TTS application in the specified wrapper, and the TTS program
(Festival) running in server mode to create an audio signal.

I have recently looked into the IO handling stuff, as well as the ability to use XML-definable protocols; it looks to me as if these two things could come in handy when it really comes to establishing a simple IPC mechanism for FlightGear <-> Festival interaction.
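To make the IPC idea concrete, here is a minimal sketch, assuming a Festival process started in server mode (e.g. `festival --server`) listening on its default port 1314; the function names, host, and port choices here are my own illustration, not existing FG code:

```python
import socket

def festival_command(text: str) -> bytes:
    """Wrap a text string in the Scheme (SayText ...) wrapper Festival expects."""
    # Escape backslashes and double quotes so the Scheme string stays well-formed.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return ('(SayText "%s")\n' % escaped).encode()

def say(text: str, host: str = "127.0.0.1", port: int = 1314) -> None:
    """Send the command to a running Festival server over a plain TCP socket."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(festival_command(text))
```

FlightGear's generic IO layer could presumably fill the same role by pointing an output channel at that socket, with the `(SayText ...)` wrapper defined in the protocol file.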

Maybe one could use a new node within the property tree that does not
only hold the string to be processed, but also the respective
rules that apply, because one would need to define some kind of
aviation-specific "dialect" in order to have Festival speak
special parts of a transmission using a separate rule.
A callsign, for example, shouldn't be spelled
character by character, but rather converted to its
"ALPHA-ZULU" equivalents.
(Just meant as a simple example, though ...)
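Such a conversion rule is easy to sketch; the table and helper below are hypothetical (not existing FG code), using the ICAO spelling alphabet with its traditional digit pronunciations:

```python
# ICAO/NATO spelling alphabet, including the conventional ATC digit forms.
NATO = {
    "A": "ALPHA", "B": "BRAVO", "C": "CHARLIE", "D": "DELTA", "E": "ECHO",
    "F": "FOXTROT", "G": "GOLF", "H": "HOTEL", "I": "INDIA", "J": "JULIETT",
    "K": "KILO", "L": "LIMA", "M": "MIKE", "N": "NOVEMBER", "O": "OSCAR",
    "P": "PAPA", "Q": "QUEBEC", "R": "ROMEO", "S": "SIERRA", "T": "TANGO",
    "U": "UNIFORM", "V": "VICTOR", "W": "WHISKEY", "X": "XRAY", "Y": "YANKEE",
    "Z": "ZULU",
    "0": "ZERO", "1": "ONE", "2": "TWO", "3": "TREE", "4": "FOWER",
    "5": "FIFE", "6": "SIX", "7": "SEVEN", "8": "EIGHT", "9": "NINER",
}

def spell_callsign(callsign: str) -> str:
    """Expand a callsign into its spoken phonetic equivalents.

    Characters without a phonetic form (e.g. the '-' in "D-ABCD") are dropped.
    """
    return " ".join(NATO[c] for c in callsign.upper() if c in NATO)
```

The resulting string could then be handed to the TTS wrapper as-is, instead of the raw callsign.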

Of course it would later on also be interesting to have
Festival bindings available within XML-configurable sound files,
so that not only audio files can be played, but speech can also be
dynamically synthesized on demand. Thinking of the logical implementation
of more advanced airliner mechanisms like the GPWS, it would certainly
come in very handy to suddenly be able to make FlightGear 'talk' to the
user, like "Terrain Terrain Terrain" ;-)


In addition to interactive inputs, the TTS program will receive comm traffic
from other AI controllers that produce communications with other model
entities active in the simulation.

The most frequently requested words/phrases could even be buffered, either locally within FlightGear's base package directory (using a pre-defined buffer size) or indeed remotely, as I already suggested above. That way one could actually create a rather comprehensive repository of common ATC chatter and use a mechanism similar to terrasync's, rsync'ing such snippets on demand. In order to deal with different bandwidths, the ATC submodel might have to "pre-request" some files, though, so that they are available when needed.
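A minimal sketch of such a local buffer, assuming a hypothetical "ATC-cache" subfolder; hashing the normalized phrase gives a stable file name, so the local cache and a remote rsync'd repository could share the same naming scheme:

```python
import hashlib
import os

CACHE_DIR = "ATC-cache"  # hypothetical subfolder of the base package

def snippet_path(phrase: str) -> str:
    """Map an ATC phrase to a stable audio file name for caching."""
    # Normalize case and whitespace so equivalent phrases share one file.
    normalized = " ".join(phrase.lower().split())
    key = hashlib.md5(normalized.encode()).hexdigest()
    return os.path.join(CACHE_DIR, key + ".wav")

def have_cached(phrase: str) -> bool:
    """True if the synthesized snippet is already on disk."""
    return os.path.exists(snippet_path(phrase))
```

Only on a cache miss would the phrase be sent to the TTS server (or pre-requested from the remote repository) and the result written to that path.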


Installing, compiling, and configuring the TTS and ASR packages requires a
little work, but it's not brain surgery.

While the compiling & packaging part is a no-brainer for the Windows folks, one might ponder offering statically linked versions of Festival for the most common other platforms and putting one to three of these packages (including the proper configs and plenty of sounds) directly on the FlightGear CD-ROM: another good reason to purchase the FG CD-ROMs ;-)

And of course, having a subfolder with hundreds of ATC snippets by
default, becomes as much an advantage as having all the scenery
available, at least when it comes to really having to download
all those files ...

Both packages are open-source and
available (see http://linux-sound.org/speech.html for some sources).  The
real body of work is in the code and logic to create the AI controller(s)
that can respond to a real, live, unstructured input.

I think particularly the latter is extremely challenging; this sounds like work for a whole new project to me :-/

Even *IF* the recognition part is handled well enough, there are still
many factors ...

For instance, people would definitely have to follow standard procedures
- as in real life - which is not going to appeal to novices
who don't know anything about the phraseology used.

So all this would need to be dictionary-based, dealing with
loops that expect a certain transmission depending on the
previous transmissions. That is certainly not clever to
hard-code; rather, a scriptable approach would be
preferable, possibly using a mechanism like a state machine
implemented via XML files for each transmission, so that
these are recursively parsed depending on the current
context ... now that I'm thinking about it, it does sound
manageable, but admittedly also pretty involved ... :-/
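A first cut at such an XML-driven transmission state machine could look like this; the element names, state names, and phrases are all made up for illustration, not an existing FG protocol:

```python
import xml.etree.ElementTree as ET

# A hypothetical dialog definition, the way one might keep it in an XML file:
# each state lists the transmissions it expects and where they lead.
DIALOG_XML = """
<dialog start="contact">
  <state name="contact">
    <expect phrase="request taxi" next="taxi"/>
  </state>
  <state name="taxi">
    <expect phrase="ready for departure" next="takeoff"/>
  </state>
  <state name="takeoff"/>
</dialog>
"""

def load_dialog(xml_text):
    """Parse the dialog XML into (start_state, {state: {phrase: next_state}})."""
    root = ET.fromstring(xml_text)
    states = {}
    for st in root.findall("state"):
        states[st.get("name")] = {
            e.get("phrase"): e.get("next") for e in st.findall("expect")
        }
    return root.get("start"), states

def step(state, phrase, states):
    """Advance the dialog if the phrase is expected; stay put otherwise."""
    return states[state].get(phrase, state)

start, states = load_dialog(DIALOG_XML)
state = step(start, "request taxi", states)  # advances "contact" -> "taxi"
```

An unexpected transmission simply leaves the dialog where it is, which is also the natural hook for a "say again" response from the AI controller.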


But as soon as it all works, there would be an entirely new dilemma: the speech-synthesis output would need to be artificially distorted just to make it "as real as it gets" ;-)


-------- Boris


_______________________________________________ Flightgear-devel mailing list [EMAIL PROTECTED] http://mail.flightgear.org/mailman/listinfo/flightgear-devel
