I have a theoretical question for the list regarding the comparison of
speech synthesis techniques and their capabilities for controlling or
modifying the voice at runtime while maintaining naturalness.
It is pretty clear to me that articulatory speech synthesis
potentially has a great deal of flexibility when it comes to
dynamically altering the voice, e.g. for natural intonation,
emotional speech, singing, changing dialect or language, or changing
the identity/gender/age of the speaker, etc.
I am interested in comparing these capabilities to those of HMM-based
synthesis. Can anyone comment on, or point me to information about,
the extent to which HMM-based synthesis (e.g. using the HTS toolkit)
supports this kind of control?
Would it be fair to say that, while HMM-based synthesis may offer more
control over the voice during the training phase than
unit-concatenative approaches, runtime control of the voice (without
losing its perceived "naturalness") is about as limited in HMM-based
synthesis as it is with unit concatenation?
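To make the question concrete, the kind of runtime control I have in
mind is something like HTS-style speaker interpolation: blending the
Gaussian mean vectors of two trained speaker models by a weight to move
smoothly between voices. The sketch below is purely illustrative (the
model structure and names are my own assumptions, not the actual HTS
API):

```python
# Toy sketch of runtime voice control via model interpolation,
# in the spirit of HTS speaker interpolation. The "model" here is
# just a list of per-state mean vectors; real HTS models are far
# richer (streams, variances, decision trees, etc.).

def interpolate_means(means_a, means_b, alpha):
    """Linearly blend two speakers' per-state mean vectors.

    alpha = 0.0 gives speaker A, alpha = 1.0 gives speaker B,
    values in between give intermediate voices.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * a + alpha * b for a, b in zip(va, vb)]
        for va, vb in zip(means_a, means_b)
    ]

# Two tiny hypothetical "models": 2 states x 3-dimensional means.
speaker_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
speaker_b = [[3.0, 2.0, 1.0], [6.0, 5.0, 4.0]]

blended = interpolate_means(speaker_a, speaker_b, 0.5)
# midway voice: [[2.0, 2.0, 2.0], [5.0, 5.0, 5.0]]
```

My question is essentially whether this sort of parameter-space
manipulation can be pushed far (emotion, dialect, singing) without the
output starting to sound unnatural.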
_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact