On Sat, 06/01/2007 at 07:38 -0700, Steve Nygard wrote:
> The first set of lines represent fixed values, and are commented in  
> the sample file.  After that each line has a bunch of values.  These  
> are parameters to the tube model, including the radii of eight  
> sections of the tube, some values controlling frication, and a few  
> other parameters.
> 
> If I recall correctly, the input control rate controls the time  
> represented by each line.  The sample file is 4 Hz, so each line  
> should represent 0.25 seconds of sound.  For generating speech the  
> input control rate is higher, something like 250 Hz.
> 
> If you look in the diphones.mxml file in the source for the Monet  
> application, you'll find the parameters that the tube uses listed in  
> the <parameters> section -- the values in the tube model input file  
> occur in the same order they are listed in diphones.mxml.
> 
> The Monet application is used to create and edit the diphones.mxml,  
> which is a set of rules for creating "key frames" between sets of  
> tube model parameters and interpolating between these key frames to  
> generate the input to the tube model.
> 
> I'll send you a bigger sample file generated from Monet.
> 
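The timing arithmetic in the quoted message (one line of parameters per
control period) can be sketched as below. This is only an illustration:
the function names are made up for this example, and it does not read or
parse an actual tube model input file.

```python
def frame_duration(control_rate_hz):
    """Seconds of sound represented by one parameter line."""
    if control_rate_hz <= 0:
        raise ValueError("control rate must be positive")
    return 1.0 / control_rate_hz


def frame_times(num_lines, control_rate_hz):
    """Start time, in seconds, of each parameter line."""
    step = frame_duration(control_rate_hz)
    return [i * step for i in range(num_lines)]


# The sample file's rate of 4 Hz gives 0.25 s per line.
print(frame_duration(4.0))    # 0.25
# A speech-like rate of 250 Hz gives 4 ms per line.
print(frame_duration(250.0))  # 0.004
print(frame_times(4, 4.0))    # [0.0, 0.25, 0.5, 0.75]
```

So at roughly 250 Hz, one second of speech would need about 250
parameter lines.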

Thanks a lot, Steve, for the precise description; it sounds impressive. I've
uploaded the result to the wiki for those interested:

http://festlang.berlios.de/docu/doku.php?id=gnuspeech

This kind of speech description really does look more natural than the
probabilistic parameters popular these days. So I wonder: would it be
possible to update an existing Linux TTS system, say Festival, to generate
speech with SoftwareTRM from the diphone parameters file you have?

Attachment: signature.asc
Description: This part of the message is digitally signed

_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact
