Hi,
On 11/01/2015 09:53 PM, Advrk Aplmrkt wrote:
Thanks Marcelo for the explanation. So *that's* why Siri sounds so good!
I can see how articulatory synthesis, when fully developed, can be
more powerful because you don't need to pre-record everything!
And the user can (more or less) easily change the voices. Articulatory
synthesis will (hopefully) allow users to change accent / intonation /
emotion. Another application is singing synthesis (see Pavarobotti,
http://www.cs.princeton.edu/~prc/SingingSynth.html and VocalTractLab).
Gnuspeech already allows changing the voices and testing custom
intonation curves.
Articulatory synthesis also can be used to study the phonatory system,
and can simulate speech problems.
Also, as a non-programmer and complete non-expert on the subject, how
can a user support and expedite development of Gnuspeech?
Users can tell other people about the advantages of Gnuspeech (while not
hiding its disadvantages). For example, with Gnuspeech you can easily
change the voices (vocal tract length, breathiness, etc.), and Gnuspeech is
still the only _articulatory_ text-to-speech system (it converts English
text to speech).
Finally, other than Gnuspeech, is there other Free Software
text-to-speech software that can produce equal or better quality
synthesis? Thanks!!
The perceived quality depends on the listener. I know of these:
Espeak
Festival
Flite (Festival lite)
MaryTTS
RHVoice
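Most of these ship a command-line tool, so they are easy to try out. A minimal sketch of how one might probe for and invoke two of them from Python, assuming the usual binary names ("espeak" and "flite"); the flags shown are the common ones, and whether the engines are installed is of course system-dependent:

```python
# Check which TTS command-line tools are on PATH and, if present,
# ask them to speak a short test sentence.
import shutil
import subprocess

engines = {
    "espeak": ["espeak", "Hello from eSpeak"],      # eSpeak / eSpeak NG
    "flite":  ["flite", "-t", "Hello from Flite"],  # Festival Lite
}

# Map engine name -> True if its binary was found on PATH.
available = {name: shutil.which(cmd[0]) is not None
             for name, cmd in engines.items()}

for name, cmd in engines.items():
    if available[name]:
        subprocess.run(cmd, check=True)   # speaks through the sound device
    else:
        print(f"{name}: not installed, skipping")

print(sorted(available))
```

Festival, MaryTTS and RHVoice have their own front-ends (Festival uses a Scheme-based interpreter, MaryTTS runs as a server), so the invocation differs per engine.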
Regards,
Marcelo
On 01/11/2015, Marcelo Y. Matuda <[email protected]> wrote:
Hi,
On 11/01/2015 03:45 PM, Advrk Aplmrkt wrote:
Thanks for the links, and I agree a proper man page or quickstart
guide would be super useful for end users! (and not just speech
synthesis researchers)
I checked out the YouTube videos, and I confess it was hard for me to
understand what Gnuspeech was saying... Is there a reason why it
doesn't sound nearly as natural as, say, Siri yet???
Siri uses a method called Unit Selection (AFAIK), which joins segments
of recorded speech. That is why the quality can be so good.
Gnuspeech uses articulatory synthesis, which uses a mathematical model
of the human vocal tract to synthesize the speech from scratch. It is
very difficult to adjust the many parameters. Also GnuspeechSA is a C++
port of the original TTS_Server (for NeXTSTEP), developed a long time
ago. It doesn't yet incorporate the research done in all these years.
Hopefully articulatory synthesis will reach the quality of unit
selection, but there is much work to do.
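To give a feel for what "synthesizing from a mathematical model" means, here is a toy source-filter sketch in Python. It is a much-simplified cousin of what Gnuspeech's tube model does, not Gnuspeech's actual algorithm: a glottal-like impulse train is shaped by two formant resonators. The formant frequencies and bandwidths are rough textbook values for an /a/-like vowel, chosen for illustration only:

```python
# Toy source-filter synthesis: impulse-train "glottal" source passed
# through two second-order resonators acting as formant filters.
import math

FS = 16000    # sample rate, Hz
F0 = 110      # glottal pulse rate, Hz
DUR = 0.5     # duration, seconds
# (center frequency, bandwidth) pairs, roughly /a/-like
FORMANTS = [(700, 130), (1220, 70)]

def resonator(signal, freq, bw, fs=FS):
    """Two-pole resonator: y[n] = 2r*cos(theta)*y[n-1] - r^2*y[n-2] + x[n]."""
    r = math.exp(-math.pi * bw / fs)
    c = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = c * y1 - r * r * y2 + x
        out.append(y)
        y2, y1 = y1, y
    return out

n = int(FS * DUR)
period = FS // F0
source = [1.0 if i % period == 0 else 0.0 for i in range(n)]  # impulse train

speech = source
for freq, bw in FORMANTS:
    speech = resonator(speech, freq, bw)

peak = max(abs(s) for s in speech)
samples = [s / peak for s in speech]  # normalize to [-1, 1]
print(len(samples))
```

Real articulatory synthesis models the vocal tract geometry itself (tube sections, tongue and lip positions) rather than fixed resonators, which is exactly why there are so many parameters to tune.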
Regards,
Marcelo
_______________________________________________
gnuspeech-contact mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnuspeech-contact