Hi Sasivimon,
I apologise for the delay in responding to your email. I have been
*very* busy with far too many incoming emails and other activities.
On Jan 13, 2009, at 10:32 AM, sasivimon wrote:
Hello,
I'm trying to experiment with human voice pronunciation using the TRM
and a spreadsheet application.
But I can't figure out the relation between variations in the
interpolation function for the vocal tract shape
and the resulting variation in the formants (F1 and F2). (I think
Monet uses Bezier curves to calculate the interpolation of the vocal
tract shape.) For example, if I changed the interpolation function
from a Bezier curve to a sine curve, how would the F1 and F2 values
change?
My questions are:
1. How does the interpolation function of the vocal tract shape
affect the formants?
The TRM is simply a waveguide model of the vocal tube and contains no
information about how to produce speech in any language. In fact,
the TRM could equally well simulate a trumpet.
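To make "a waveguide model of the vocal tube" concrete, here is a minimal Kelly-Lochbaum-style sketch in Python. This is emphatically not the TRM code: the section count, losses, nasal branch, and radiation model are all simplified away, and the area values and end-reflection coefficients below are made up purely for illustration.

```python
# Minimal Kelly-Lochbaum-style waveguide sketch (illustrative only;
# the actual TRM implementation differs in many details).

def reflection_coeffs(areas):
    """Pressure-wave reflection coefficient at each junction between
    adjacent tube sections (larger area = lower acoustic impedance)."""
    return [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
            for i in range(len(areas) - 1)]

def simulate(areas, excitation, glottal_reflect=0.7, lip_reflect=-0.9):
    """Propagate forward/backward travelling waves through the tube,
    scattering at each junction; returns the pressure radiated at the
    lips, one sample per excitation sample."""
    ks = reflection_coeffs(areas)
    n = len(areas)
    fwd = [0.0] * n  # right-going wave in each section
    bwd = [0.0] * n  # left-going wave in each section
    output = []
    for x in excitation:
        nf = [0.0] * n
        nb = [0.0] * n
        # glottis end: inject excitation plus a partial reflection
        nf[0] = x + glottal_reflect * bwd[0]
        # scattering at each interior junction
        for i, r in enumerate(ks):
            nf[i + 1] = (1 + r) * fwd[i] - r * bwd[i + 1]
            nb[i] = r * fwd[i] + (1 - r) * bwd[i + 1]
        # lip end: partial reflection back in, the rest radiates out
        nb[n - 1] = lip_reflect * fwd[n - 1]
        output.append((1 + lip_reflect) * fwd[n - 1])
        fwd, bwd = nf, nb
    return output
```

Changing the area values changes where the reflections occur, and hence the resonances (formants) of the tube; exactly the same machinery would model a trumpet bore if fed brass-like areas and excitation.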
2. Why does Monet use Bezier curves to interpolate between two vocal
tract shapes?
It doesn't really use "Bezier" curves.
Monet is a multipurpose editor that allows the TRM values for speech
postures to be stored, and the shape of the parameter tracks between
postures to be defined according to a collection of rules, with yet
more rules selected according to the particular combinations of speech
postures defined by the input. These further rules decide which
parameter track rules to use, and what timing to apply to the quasi-
steady-state and transitional regions of each dynamic change from
posture to posture. When Monet produces its output speech, it further
applies rhythm and intonation models to vary the prosody of the speech.
3. And how can we improve the interpolation function compared to a
real human?
By using Monet to define/refine suitable trajectories between
successive postures.
4. According to your website
(http://pages.cpsc.ucalgary.ca/~hill/papers/synthesizer/body.html),
I would like to know the principles by which you defined the values
of the parameters in the parameter table in Appendix A.
Ideally, we would have X-Ray data that would allow us to define the
area functions for the TRM that would be stored as part of the Monet
database. In practice, the interactive program "Synthesizer" is used
to determine which steady state values of the TRM region parameters
will produce the sounds that are needed. In the case of postures
that do not produce much sound during closure (stops, fricatives),
the concept of "locus" as determined by the Haskins Laboratories in
the 50s and 60s is used. The locus is the "origin" of the spectral
transitions, and represents the posture that would produce the
"virtual sound" that represents the "locus" of the stop sound.
I read the pronunciation guide (Manzara & Hill 2002),
but I have no clue how to translate that notation into numerical
parameters for the TRM, especially for a non-English language.
You need to get Monet up and running on a Macintosh or under GNUstep
on a Linux machine -- the sources are in the Savannah repository,
accessible using SVN (ignore the CVS repository):
http://savannah.gnu.org/projects/gnuspeech
and you will have a much better idea of what is involved. There is
also a web page, accessible from that page under the heading "Project
Home Page", that provides a short descriptive overview of the whole
Gnuspeech system. It can be accessed directly at:
http://www.gnu.org/software/gnuspeech/
Hope this helps.
Sorry for my bad English.
Your English is fine, thank you.
All good wishes.
david
--------
David Hill
[email protected]
http://savannah.gnu.org/projects/gnuspeech
--------
Simplicity, patience, compassion. These three are your greatest
treasures (Tao Te Ching #67)
---------
_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact