Re: [gnuspeech-contact] TRM as backend for festival

David Hill Sat, 10 Feb 2007 15:49:19 -0800

I have tried accessing the samples you provided. Only one of them loaded and played, and it did not sound anything like speech. The TRM is simply the waveguide model of an acoustic tube, with control regions applied according to the Distinctive Region Model developed by Carré, based on earlier work by Fant. The underlying theory is outlined in the paper "Real-time articulatory speech-synthesis-by-rules" on my university web site and referenced from the gnuspeech project site (see below for the university web site URL). Manuals for "Synthesiser" and "Monet" also appear on that web site, towards the end of section E of the published papers page. In the Monet manual there is a table showing the equivalences between IPS symbols and the Monet symbols. This should allow you to translate into the Festival set.

Monet is an interactive tool for developing data sets for arbitrary languages. Real-time Monet (which has not yet been ported) is the heart of a daemon that uses these data sets to convert text to speech. It is a stripped-down version of Monet, and it would be really nice if someone would take on the task of porting it (please ;-). Without the data sets, and the algorithms for manipulating the parameter tracks, you don't have a speech synthesiser, you have a rather specialised trumpet!

The data sets for synthesis in "diphones.monet" were developed from several years of research in which British English speech was analysed for sound data, rhythmic (duration) data, and intonation data. This research is reported in other papers on the site.

If you would like to hear some samples of gnuspeech, go to my university web site:


http://www.cpsc.ucalgary.ca/~hill

Click on "Published papers" in the left menu, then select the first paper in section "B. National and international invited contributions ..." (it is the one referred to above).

At the bottom of the left side menu in the resulting page you will find a whole bunch of examples of gnuspeech synthesis, some short, some long.

The tube resonance model parameters are specified in the source code for the TRM. I attach a sample set of parameters: 24 fixed (utterance-rate) parameters, followed by 6 blocks of 16 values that drive the so-called "speech-rate" parameters. The utterance represented is "oi", lasting about 1.5 seconds (the input control rate is 4 hertz and there are 6 blocks, so 6 / 4 = 1.5 seconds).


The 16 speech-rate parameters are, in order:

GlottalPitch, GlottalVolume, AspirationVolume, FricativeVolume, FricativePosition, FricativeCentreFrequency, FricativeBandWidth, radius1, radius2, radius3, radius4, radius5, radius6, radius7, radius8, velumRadius
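
For anyone who wants to read such a parameter set programmatically, here is a minimal Python sketch. It assumes the layout described above: 24 single-value lines, then rows of 16 values, with anything after ";" treated as a comment. The function name parse_trm_parameters and the constants are hypothetical, made up for this illustration; they are not part of the TRM source.

```python
# Illustrative reader for the parameter layout described in this message.
# All names here are hypothetical; none of them come from the TRM source code.

SPEECH_RATE_NAMES = [
    "GlottalPitch", "GlottalVolume", "AspirationVolume", "FricativeVolume",
    "FricativePosition", "FricativeCentreFrequency", "FricativeBandWidth",
    "radius1", "radius2", "radius3", "radius4",
    "radius5", "radius6", "radius7", "radius8",
    "velumRadius",
]
N_FIXED = 24  # number of fixed (utterance-rate) parameters


def parse_trm_parameters(text):
    """Return (fixed, frames, duration_seconds) parsed from a parameter listing."""
    fixed = []
    frames = []
    for raw in text.splitlines():
        # Drop the trailing ";" comment, if any.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        try:
            values = [float(token) for token in line.split()]
        except ValueError:
            # Title lines and dashed rules are not numeric; skip them.
            continue
        if len(fixed) < N_FIXED:
            if len(values) != 1:
                raise ValueError("expected one value per fixed-parameter line")
            fixed.append(values[0])
        else:
            if len(values) != len(SPEECH_RATE_NAMES):
                raise ValueError("expected 16 values per speech-rate block")
            frames.append(dict(zip(SPEECH_RATE_NAMES, values)))
    if len(fixed) < N_FIXED:
        raise ValueError("fewer than 24 fixed parameters found")
    control_rate = fixed[0]  # first fixed parameter: input control rate in Hz
    duration = len(frames) / control_rate
    return fixed, frames, duration
```

With six blocks at a control rate of 4 Hz, the duration comes out as 6 / 4 = 1.5 seconds, which matches the figure given above.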


I hope this helps.

--------
David Hill

Simplicity, patience, compassion. These three are your greatest treasures (Tao Te Ching #67)

---------

On Feb 8, 2007, at 9:20 AM, Nickolay V. Shmyrev wrote:

Heh, since it seems it would be hard to build Monet on Linux I've tried to adapt trm to work with festival. Actually for me it seems it would be interesting work, since festival predicts intonation and duration much more precisely and is able to produce very good annotations.


----------
Parameter set for TRM "oi" (1.5 seconds, falling pitch during diphthong)

------------------------------------------------------------------------------------------------


4         ; input control rate (1 - 1000 Hz)
60.0      ; master volume (0 - 60 dB)
1         ; number of sound output channels (1 or 2)
0.0       ; stereo balance (-1 to +1)
0         ; glottal source waveform type (0 = pulse, 1 = sine)
40.0      ; glottal pulse rise time (5 - 50 % of GP period)
22.0      ; glottal pulse fall time minimum (5 - 50 % of GP period)
45.0      ; glottal pulse fall time maximum (5 - 50 % of GP period)
2.50      ; glottal source breathiness (0 - 10 % of GS amplitude)
10.0      ; nominal tube length (10 - 20 cm)
32        ; tube temperature (25 - 40 degrees Celsius)
1.00      ; junction loss factor (0 - 5 % of unity gain)
3.05      ; aperture scaling radius (3.05 - 12 cm)
0.75      ; mouth aperture coefficient (0 - 0.99)
0.72      ; nose aperture coefficient (0 - 0.99)
1.35      ; radius of nose section 1 (0 - 3 cm)
1.96      ; radius of nose section 2 (0 - 3 cm)
1.91      ; radius of nose section 3 (0 - 3 cm)
1.3       ; radius of nose section 4 (0 - 3 cm)
0.73      ; radius of nose section 5 (0 - 3 cm)
1500.0    ; throat lowpass frequency cutoff (50 - nyquist Hz)
6.0       ; throat volume (0 - 48 dB)
1         ; pulse modulation of noise (0 = off, 1 = on)
48.0      ; noise crossmix offset (30 - 60 dB)

10.0 0.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.4 0.4 1.78 1.78 1.26 0.8 0.0
9.5 54.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.4 0.4 1.78 1.78 1.26 0.8 0.0
9.0 60.0 0.0 0.0 4.0 4400 600 0.8 0.8 0.6 0.6 1.58 1.58 1.13 1.01 0.1
8.5 60.0 0.0 0.0 4.0 4450 550 0.8 0.8 1.28 1.28 1.0 1.0 1.0 0.8 1.0
8.0 54.0 0.0 0.0 4.0 4500 500 0.8 0.8 1.68 1.58 0.8 0.8 0.5 0.4 1.0
7.0 51.0 0.0 0.0 4.0 4500 500 0.8 0.8 1.78 1.78 0.2 0.2 0.4 0.0 1.0
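
To give a feel for how the radius values relate to the "waveguide model of an acoustic tube" mentioned in the message, here is a rough Python sketch of the textbook Kelly-Lochbaum reflection coefficient at each junction between adjacent tube sections. This is an assumption-laden illustration, not the TRM's own code: it treats radius1 to radius8 as eight adjacent sections, the sign convention is one of several in use, and the actual model may map its control regions onto tube sections differently. The function name reflection_coefficients is hypothetical.

```python
# Textbook Kelly-Lochbaum junction coefficients from section radii.
# Illustrative only: not taken from the TRM source, and the mapping of
# radius1..radius8 onto adjacent tube sections is an assumption.

def reflection_coefficients(radii):
    """Return one coefficient per junction between adjacent sections."""
    # Cross-sectional area is pi * r^2; pi cancels in the ratio below.
    areas = [r * r for r in radii]
    coefficients = []
    for area_a, area_b in zip(areas, areas[1:]):
        total = area_a + area_b
        # Two fully closed neighbours would divide by zero; report 0.0 there.
        coefficients.append(0.0 if total == 0 else (area_a - area_b) / total)
    return coefficients


# radius1..radius8 from the first speech-rate block of the "oi" sample.
first_block_radii = [0.8, 0.8, 0.4, 0.4, 1.78, 1.78, 1.26, 0.8]
junctions = reflection_coefficients(first_block_radii)
```

Equal neighbouring radii give a coefficient of zero (no reflection), while a sharp change in radius, such as 0.4 to 1.78, gives a coefficient close to -1 under this sign convention.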


----------

_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact
