Re: Making a Vintage Sounding TTS Voice
regarding deep learning and wavenet:
wavenet takes raw audio as its input for training, and it can produce audio output.
i'm not talking about the computational power here, just about the speech model.
it is not only for generating speech, it can even generate music.
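to make "audio in, audio out" a bit more concrete: as far as i know, the published wavenet does not predict raw floats, it quantizes every sample to 256 mu-law levels and predicts the next level from the previous ones. below is a minimal sketch of just that encoding step, not of the network itself. the function names are my own, nothing here comes from an actual wavenet codebase.

```python
import math

MU = 255  # gives 256 quantization levels (0..255)


def mulaw_encode(x):
    """map a sample in [-1.0, 1.0] to an integer class in [0, 255]."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range samples
    # compress: quiet samples get more resolution than loud ones
    compressed = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    # shift from [-1, 1] to [0, 255] and round to the nearest level
    return int((compressed + 1) / 2 * MU + 0.5)


def mulaw_decode(q):
    """map an integer class in [0, 255] back to a sample in [-1.0, 1.0]."""
    y = 2 * (q / MU) - 1
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


if __name__ == "__main__":
    for sample in (-1.0, -0.1, 0.0, 0.3, 1.0):
        q = mulaw_encode(sample)
        print(sample, "->", q, "->", round(mulaw_decode(q), 4))
```

the same trick works for any waveform, which is part of why the model is not tied to speech.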
regarding festival/festvox: they are not really needed in this project, since festvox by itself is only used to build voices for festival/flite.
also, festival has support for diphone voices, but its recommended way is clustergen (which is statistical parametric synthesis, not unit selection).
now, coming to the text processing part:
for converting text into phones, g2p (grapheme-to-phoneme conversion) is your best option.
it gets better when it is trained as a sequence-to-sequence model.
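here is a rough sketch of what the g2p step does, assuming a tiny hand-written lexicon plus a very naive letter-by-letter fallback. both tables and the function names are made up for illustration; a trained sequence-to-sequence model would take the place of the fallback, and a real pronunciation dictionary would take the place of the two-word lexicon.

```python
# toy lexicon with arpabet-style phones (illustrative, only two entries)
LEXICON = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}

# hypothetical, very naive one-letter-at-a-time fallback
LETTER_FALLBACK = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "K S",
    "y": "Y", "z": "Z",
}


def g2p(word):
    """return a list of phones for one word."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])  # known word: use the lexicon entry
    phones = []
    for ch in word:
        # unknown word: spell it out, skipping characters we cannot map
        phones.extend(LETTER_FALLBACK.get(ch, "").split())
    return phones


def text_to_phones(text):
    """split text on whitespace, strip punctuation, and run g2p per word."""
    phones = []
    for token in text.split():
        phones.extend(g2p(token.strip(".,!?;:")))
    return phones


if __name__ == "__main__":
    print(text_to_phones("hello, world!"))
    print(g2p("cat"))
```

the fallback is exactly the part that goes wrong on irregular english spelling, which is why a trained model helps there.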
p.s.: check out soloud.
it has a little synthesizer.
-- Audiogames-reflector mailing list Audiogames-reflector@sabahattin-gucukoglu.com https://sabahattin-gucukoglu.com/cgi-bin/mailman/listinfo/audiogames-reflector