On Sat, 06/01/2007 at 07:38 -0700, Steve Nygard wrote:
> The first set of lines represents fixed values, and these are commented
> in the sample file. After that each line has a bunch of values. These
> are parameters to the tube model, including the radii of eight
> sections of the tube, some values controlling frication, and a few
> other parameters.
>
> If I recall correctly, the input control rate controls the time
> represented by each line. The sample file is 4 Hz, so each line
> should represent 0.25 seconds of sound. For generating speech the
> input control rate is higher, something like 250 Hz.
>
> If you look in the diphones.mxml file in the source for the Monet
> application, you'll find the parameters that the tube uses listed in
> the <parameters> section -- the values in the tube model input file
> occur in the same order they are listed in diphones.mxml.
>
> The Monet application is used to create and edit the diphones.mxml,
> which is a set of rules for creating "key frames" between sets of
> tube model parameters and interpolating between these key frames to
> generate the input to the tube model.
>
> I'll send you a bigger sample file generated from Monet.
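To make the timing arithmetic above concrete, here is a minimal Python sketch. It assumes only what the quoted description states: one line of the input file corresponds to one control period, so a line lasts 1 / (control rate) seconds. The function names are hypothetical, and the linear interpolation between key frames is an assumption chosen for illustration; the actual interpolation rules used by Monet may differ.

```python
from typing import List, Sequence


def frame_duration(control_rate_hz: float) -> float:
    """Seconds of sound represented by one parameter line."""
    if control_rate_hz <= 0:
        raise ValueError("control rate must be positive")
    return 1.0 / control_rate_hz


def total_duration(num_lines: int, control_rate_hz: float) -> float:
    """Seconds of sound represented by num_lines parameter lines."""
    return num_lines * frame_duration(control_rate_hz)


def interpolate_linear(start: Sequence[float],
                       end: Sequence[float],
                       num_frames: int) -> List[List[float]]:
    """Return num_frames parameter sets from start to end, inclusive.

    ASSUMPTION: plain linear interpolation, used here only to
    illustrate the idea of filling in frames between two key frames.
    """
    if len(start) != len(end):
        raise ValueError("key frames must have the same number of parameters")
    if num_frames < 2:
        raise ValueError("need at least two frames (the two key frames)")
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at start, 1.0 at end
        frames.append([a + t * (b - a) for a, b in zip(start, end)])
    return frames


if __name__ == "__main__":
    print(frame_duration(4.0))       # sample file rate from the message
    print(total_duration(8, 4.0))    # eight lines at 4 Hz
    print(interpolate_linear([0.0, 1.0], [1.0, 3.0], 3))
```

At the 250 Hz rate mentioned for speech, the same formula gives 4 ms per line, so a key-frame transition lasting 100 ms would span about 25 lines of tube model input.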
Thanks a lot, Steve, for the precise description; it sounds impressive. I've uploaded the result to the wiki for those interested: http://festlang.berlios.de/docu/doku.php?id=gnuspeech

This kind of speech description really looks more natural than the probabilistic parameters popular these days. So I wonder: is it possible to update some existing Linux TTS, say Festival, to generate speech with SoftwareTRM from the diphone parameters file you have?
_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact
