It would probably help your understanding if you were to read the
Monet manual. You wrote (see below):
But I have no idea how Monet
reproduces consonants. There are examples, but no trm files for them.
The .trm files are associated strictly with the tube model ("trm" =
"tube resonance model") and are saved and used by the "Synthesiser"
application -- a GUI application for playing with the tube, but only
with steady-state configurations. (You should probably read that
manual as well.) Consonants are mostly created by the dynamics of
vocal-tract change, though there are some continuant sounds, such as
fricatives (e.g. /s/); even for these, transitional cues are
important. Thus it is impossible to create consonants from .trm
files alone. They were really only useful in exploring the vocal
tract configurations needed to create the "postures" that serve as
anchor points (loosely related to "phones") for the varying speech
parameters.
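For concreteness, such a steady-state configuration amounts to a
fixed set of tube parameters, roughly as sketched below (the field
names are illustrative inventions, not the actual .trm file layout):

    /* Illustrative only: a quasi-steady-state vocal-tract "posture".
       Field names are invented; this is not the real .trm layout.  */
    #define NUM_REGIONS 8              /* distinctive regions R1..R8 */

    typedef struct {
        double glottal_pitch;          /* baseline pitch offset      */
        double glottal_volume;         /* voicing source amplitude   */
        double frication_volume;      /* noise source amplitude     */
        double radius[NUM_REGIONS];    /* region radii of the tube   */
        double velum;                  /* nasal coupling aperture    */
    } Posture;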
The dynamic information needed for complete speech is created from
these quasi-steady-state values representing vocal tract postures,
plus context-sensitive rules for moving from posture to posture,
according to timing information that reflects the rhythmic character
of British English. This information is all held within
"diphones.monet" (the rules are actually more complex than diphones
in many cases and include triphones and even tetraphones). Monet has
the algorithms to use this information appropriately.
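As a rough sketch of what moving from posture to posture means
computationally, here is a plain linear cross-fade between two such
postures (Monet's actual rules shape each parameter's path with
context-sensitive transition profiles and timings, not a bare lerp):

    /* Sketch only: cross-fade the Posture above, t runs 0..1. */
    void interpolate(const Posture *a, const Posture *b, double t,
                     Posture *out)
    {
        out->glottal_pitch    = a->glottal_pitch
                              + t * (b->glottal_pitch - a->glottal_pitch);
        out->glottal_volume   = a->glottal_volume
                              + t * (b->glottal_volume - a->glottal_volume);
        out->frication_volume = a->frication_volume
                              + t * (b->frication_volume - a->frication_volume);
        out->velum            = a->velum + t * (b->velum - a->velum);
        for (int i = 0; i < NUM_REGIONS; i++)
            out->radius[i] = a->radius[i]
                           + t * (b->radius[i] - a->radius[i]);
    }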
The intonation is applied to the varying stream of tube parameters
generated on this basis, according to a model of British English
intonation based on work by M.A.K. Halliday and elaborated by our
own studies, by varying the pitch (F0) parameter. These variations
are added to small pitch changes created at the posture (segmental)
level by constrictions in the vocal tract -- so-called
"micro-intonation" -- which provide additional cues for the
identification of consonants. Many of the relevant papers are
available on my university web site.
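Per frame, the two levels simply combine, along these lines (the
contour shape and numbers below are invented for illustration; the
Halliday-based tone-group model is far richer than a declining line
with a single accent):

    /* Sketch: utterance-level (macro) contour plus the posture-level
       micro-intonation perturbation.  Shapes/numbers are invented.  */
    double macro_contour(double t, double dur)    /* times in seconds */
    {
        double x = t / dur;
        double decline = 2.0 - 4.0 * x;          /* gentle overall fall */
        double accent  = (x > 0.3 && x < 0.5) ? 3.0 : 0.0;
        return decline + accent;                 /* semitones */
    }

    double frame_f0(double t, double dur, double micro_semitones)
    {
        return macro_contour(t, dur) + micro_semitones;
    }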
The "oi" sound is just a succession of vowel sounds with a varying
pitch, so a series of what appear to be .trm values will work. To
produce speech, you need to be able to construct a more complex set
of varying parameters reflecting the reality of speech. This is what
Monet does, and it is the part of Monet that needs to be extracted if
all you wish to do is convert sound specifications to a speech
waveform specification. The current Monet does much more, since it
allows you to create the databases as well as listen to the speech
that can then be produced. The extracted part (non-interactive) that
would simply use the databases to convert streams of posture symbols
to an output waveform is what we call "Real-time Monet". It has not
been ported from the original NeXT implementation yet.
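Schematically, that non-interactive part reduces to a loop like the
following (every type and function name here is a hypothetical
placeholder standing in for the real Monet/TRM machinery, reusing
the sketches above):

    /* Hypothetical placeholders for what Real-time Monet would need: */
    typedef struct Rule Rule;                   /* di/tri/tetraphone rule */
    Posture lookup_posture(const char *symbol); /* target from database   */
    const Rule *match_rule(const char **syms, int n, int i);
    double rule_duration(const Rule *r);        /* rhythm model           */
    double rule_profile(const Rule *r, double x); /* transition shape 0..1 */
    double micro_intonation(const Posture *p);
    void trm_render_frame(const Posture *p);    /* tube model -> samples  */

    #define FRAME_PERIOD 0.004    /* assumed 250 Hz control rate */

    void synthesize(const char **symbols, int n, double utterance_dur)
    {
        double now = 0.0;
        for (int i = 0; i + 1 < n; i++) {
            Posture from = lookup_posture(symbols[i]);
            Posture to   = lookup_posture(symbols[i + 1]);
            const Rule *rule = match_rule(symbols, n, i);
            double dur = rule_duration(rule);
            for (double t = 0.0; t < dur; t += FRAME_PERIOD) {
                Posture f;
                interpolate(&from, &to, rule_profile(rule, t / dur), &f);
                f.glottal_pitch += frame_f0(now, utterance_dur,
                                            micro_intonation(&f));
                trm_render_frame(&f);
                now += FRAME_PERIOD;
            }
        }
    }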
david
On Feb 11, 2007, at 1:06 PM, Nickolay V. Shmyrev wrote:
On Sat, 10/02/2007 at 15:53 -0800, David Hill wrote:
I have tried accessing the samples you provided. Only one of them
loaded and played. It did not sound anything like speech. The TRM is
simply the waveguide model of an acoustic tube, with control regions
applied according to the Distinctive Region Model developed by Carré,
based on earlier work by Fant.
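(For intuition: the heart of such a waveguide model is a chain of
scattering junctions between cylindrical tube sections. The textbook
Kelly-Lochbaum form is sketched below -- the general technique, not
the actual TRM code. A full tube model chains many such junctions
with delay lines between them and adds losses, a nasal side branch,
and the voicing and noise sources.)

    /* Textbook Kelly-Lochbaum junction between two tube sections with
       cross-sectional areas a1 (left) and a2 (right), pressure waves. */
    typedef struct { double k; } Junction;

    void junction_set_areas(Junction *j, double a1, double a2)
    {
        j->k = (a1 - a2) / (a1 + a2);   /* reflection coefficient */
    }

    /* f_in: rightward wave arriving from the left section,
       b_in: leftward wave arriving from the right section.  */
    void junction_scatter(const Junction *j, double f_in, double b_in,
                          double *f_out, double *b_out)
    {
        *f_out = (1.0 + j->k) * f_in - j->k * b_in;
        *b_out = j->k * f_in + (1.0 - j->k) * b_in;
    }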
The underlying theory is outlined in the paper "Real-time
articulatory speech-synthesis-by-rules" on my university web site,
and referenced from the gnuspeech project site (see below for the
university web site URL). Manuals for "Synthesiser" and "Monet" also
appear on that web site, towards the end of section E of the
published-papers page. In the Monet manual there is a table showing
the equivalences between the IPA symbols and the Monet symbols. This
should allow you to translate into the Festival set.
Ok, thanks, I'll do that.
Monet is an interactive tool for developing data sets for arbitrary
languages. Real-time Monet (which has not yet been ported) is the
heart of a daemon that uses these data sets to convert text to
speech.
It is a stripped down version of Monet and it would be really nice if
someone would take on that task (please ;-). Without the data sets,
and the algorithms for manipulating the parameter tracks, you don't
have a speech synthesiser, you have a rather specialised trumpet!
Well, I can do that. I just need more explanation. Is it something
Steve split out in the Framework dir? Currently Monet compiles fine;
only the gorm files are missing. I don't think sound output is
required, btw; it's enough to be able to save an audio file.
The data sets for synthesis in "diphones.monet" were developed on
the basis of several years of research in which British English
speech was analysed for sound data, rhythmic (duration) data, and
intonation data. This research is reported in other papers on the
site.
Btw, have you heard about the MOCHA database?
http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html
It seems that Alan has already used it in unit-selection synthesis.
It's not free, I suppose, which is why that work isn't available
yet. If it were possible to generate a set of prompts (around 1000
would be enough, I suppose) with Monet and later process the
coefficients with unit selection, that would be an interesting
thing.
If you would like to hear some samples of gnuspeech, go to my
university web site:
Yeah, I've downloaded them, but the problem is that I can only
reproduce vowels, like the "oi" example you've sent. I have no idea
how Monet reproduces consonants. There are examples, but no trm
files for them. And the examples I do have (for instance the one
Steve kindly sent to me) sound like a trumpet, as you've noticed :)
That's why I suspect there is a bug in the trm that makes consonant
generation impossible.
_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact