Hi People,
I sent a copy of this to [email protected], as well as other people
from my university account. The version to the mail list was bounced, possibly
due to the university system mucking up my authorisation. I have set up a
second subscription from this email address, outside the University.
I attach the text of the original post (Elena, in particular, will be
interested).
--
David Hill
Subject: Re: Real-time articulatory speech-synthesis-by-rules
Hi Jonathan,
Yes, indeed there has been work done since 1995.
If you visit:
http://savannah.gnu.org/projects/gnuspeech
you should find various kinds of information, including access to the CVS
repository where the Mac/GnuSpeech software is available under "current". Greg
Casamento has been working on the additions to the Mac G5/xcode source to allow
it to compile under GnuStep but I think there are still a few things to
complete. The same source should compile and run on a Mac G5 running Panther
or Tiger. The project home page has a diagram showing an overview of the
system.
Using Google, you should have been able to find the links to this work fairly
easily ("tube model articulatory speech synthesis distinctive region formant
sensitivity analysis" -- some of these keywords usually bring up the necessary
links within the first two or three Google references).
The tube model itself exists as a C program, and, given the correct parameters
will run and produce a .snd, .wav or .au file on any machine.
Of course, the trick is producing the correct parameters, which is what the
Monet system is all about. The current Monet system (which is what the
available source effectively compiles to) is designed as an interactive tool
for producing the databases needed to establish a language, and was ported from
the original implementation on a NeXT. Both
Mac xcode and GnuStep have a lot in common with the original NextStep/OpenStep
that ran on the NeXT -- particularly the very powerful development environment
including all kinds of libraries for graphics and interaction, and the use of
Objective-C. Recreating all the graphical stuff was done for the Mac, and is
almost complete within GnuStep, so the port has been reasonably possible even
within the limited programming effort resources available to us. Porting the
whole graphical/interactive Monet to plain vanilla Linux would be more
problematical (consider how long it has taken to get GnuStep up).
However, if you simply want speech output, you don't actually need all the
graphical/interactive stuff. On the NeXT there was a stripped down version of
Monet called "real-time Monet" that used the databases and rules to process
text into the parameters needed by the tube model (on the NeXT the tube ran on
the built-in DSP [Digital Signal Processor] -- these days, the host processors
are fast enough, and have additional instructions, so that a DSP is not
necessary). Real-time Monet was the main part of what was effectively a daemon
on the NeXT called the "Speech Server" that provided speech services to any
program that needed them, and simply showed up as an item in the "Services"
menu for any application. An applet called BigMouth (not that other speaking
program of the same name!) used the Speech Server to allow users to play around
with text-to-speech synthesis. There were also facilities to create User and
Application Dictionaries that were in a hierarchy with the main dictionary, so
that the Main dictionary could be modified for application or personal reasons.
A tool called "PrEditor" allowed users to adjust pronunciations of individual
words and put them in these dictionaries. There were also tools available for
Developer and/or Experimenter use, the most important of which were the Main
dictionary development tools; a Server Test Plus program that gave full access
to the Speech Server facilities, including the ability to obtain the phonetic
translation of arbitrary input text; and an applet called "Synthesizer" that
gave experimenters full interactive graphical access to the tube model (along
with the full interactive Monet -- also originally restricted as part of the
"Experimenters Kit") so they could investigate the tube model and the speech
postures needed for various static sounds and "loci" as part of developing
databases for different languages. Monet allowed the development of posture data,
dynamic composition data, plus some adjustment of intonation and rhythm.
However, intonation and rhythm were basically determined by models based on
M. A. K. Halliday's model of British intonation, plus rhythm data from the
original authors' research at the U of Calgary. The intonation and rhythm were
considered to be among the great strengths of the system. With the exception of
Monet (which does allow text to be
converted to speech, and all the posture databases and rules to be manipulated)
none of the other tools and facilities have yet been ported to the new
platforms.
One of the most urgent tasks is to re-create "real-time Monet" (RTM) in a form
that can be run under plain vanilla Linux, as well as on the Mac by stripping
the ported version of Monet -- maybe converting it to plain C. The databases
already exist (though could be improved) so if RTM were created, speech
services and general speech output could be provided for both Mac and Linux.
It would be valuable also to port the other tools and facilities to the Mac
(xcode) and GnuStep so that research and development on all aspects of the
speech output, including the creation of databases for other languages than
English, could proceed on modern platforms.
It is for speech experimentation (for language development, phonetic
experimentation, and psychophysical work) that the articulatory speech
synthesis software we created is most important, however important and useful
high-quality speech output for computers may be.
Future work will not only include the kinds of developments I've hinted at
above, but also better models of sibilant sounds and larynx excitation by
modelling the airflow characteristics of these sounds instead of arbitrarily
injecting waveforms approximating what is observed in speech, plus getting a
much tighter connection between the underlying speech gestures that create
speech and the control of the tube model. We'd also like to generalise the
frameworks used for the Halliday-based intonation and rhythm so that a basis
for trying other approaches to intonation might be made easily available to the
world.
All the existing software is (as noted) available under the GPL (see:
http://www.gnu.org for details).
I hope this may be helpful. I would value your comments and would be very
happy to answer any further questions you may have. We are *very* interested
in obtaining help in further development of the gnuspeech project and wonder if
you have the interest and skills to become involved.
All good wishes.
david
-------
David Hill, Prof. Emeritus, Computer Science | Imagination is more |
U. Calgary, Calgary, AB, Canada T2N 1N4 | important than knowledge |
[EMAIL PROTECTED] OR [EMAIL PROTECTED] | (Albert Einstein) |
http://www.cpsc.ucalgary.ca/~hill | Kill your television |
On Fri, 20 May 2005, Jonathan Schreiter wrote:
> Hi,
> I read your paper from 1995 titled "Real-time
> articulatory speech-synthesis-by-rules". I am
> interested in this area of research. I noticed that
> the document stated the software would be available
> via the GNU software website, but I was unable to find
> it. Is it the software / ruleset database publicly
> available? Has any updated work been done since 1995?
>
> Any help would be greatly appreciated.
>
> Many thanks,
> Jonathan
>
_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact