Re: [gnuspeech-contact] Synthetic Speech...

Farlie A Wed, 30 Mar 2016 17:52:06 -0700

On 30/03/2016 21:14, David Hill wrote:

Dear Alex,
I have copied this response to the gnuspeech list as it will be of general interest, and also encourages others to participate.


Feel free to copy my further thoughts below.

You certainly could use /gnuspeech/ to do the sort of thing you discuss in your email (copy below). However, right now you'd have to get yourself a Macintosh running OS X 10.10.x and use the Monet system. Used Mac Pro Towers (say early 2009 versions) are available at Other World Computing (a very reliable company). They are powerful and upgradeable. I have an early 2008 and an early 2009, and have added an SSD to both of them, which gives even better performance than the standard machine.

Is Monet "free" software, or at the very least is the voice generator compatible with Creative Commons Share Alike? That was important because the aim was to try to be as 'free' as possible, so that the output was usable by others (and could be edited in tools like GarageBand).

The platform-independent /gnuspeechsa/ does not yet incorporate the Monet facility, though I believe Marcelo is working on that aspect, judging by some of the image material he has previewed to me.

Thanks.

In order to get different accents, intonation and rhythm, as required for your examples, you may have to get involved in significant manual work, modifying the databases. For intonation, you'd have to create the required intonation contour manually.

Hmm, and as I am not a speech professional, this may be beyond my level of expertise, other than marking notes in the script as to intonation intent. Your note about adding tonic feet below is something I was missing.

Something else that will need to be worked out is how to translate between Gnuspeech's phoneme names and E-Speak's (based on the Kirshenbaum encoding); see my other recent e-mail.

However, here are two variants of the "Miss Jones" utterance you suggested /using just the basic system/ and the existing databases:
It would be helpful if the original code were modified to allow a manually constructed intonation contour to be saved. At the moment, if the synthesis is saved to a sound file, the contour is regenerated prior to synthesis, losing any contour that was constructed manually. This is an unfortunate omission, but would be fairly easy to correct. You'll notice that the threatening "German accent" version has two tonic feet added (marked by '*' in the parsed version of the utterance).

Ah, that's possibly something I also need to look up how to do in E-speak: add tonic feet.

In order to make the process easier and less trouble, the user and application dictionaries should be added and made usable. Then particular dictionaries (a lot smaller than the main dictionary) could be set up for particular dialogue and accent requirements.

Hmm... Would you consider some kind of "Unintophonic" (universal intonation phonetic encoding) to represent both sounds and intonation intent? An older speech synth program I found called Superior Speech! (running under RISC OS 3, years ago) allowed for at least 8 different (albeit fixed) intonation pitches on individual phonemes, as well as some more advanced features for "singing" phonemes at specific notes (something which I understand is an area of current research by others). There are some possible encodings like X-SAMPA which incorporate intonation advice. MBROLA (which is non-free) stores intonation data in a format which works at a much lower level, so it is possible to do much more finely tuned intonation contouring, if I understand correctly what that means. (Thought: if there was a way to add MBROLA's PHO style data to GNUspeech/Espeak input files.... hmmm...)
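
To make the PHO idea a little more concrete, here is a rough Python sketch of reading PHO-style lines. It assumes the usual layout of a phoneme name, a duration in milliseconds, and then optional pairs of (position in percent, pitch in Hz), with ';' starting a comment. The function name and the sample lines are made up for illustration; nothing here is part of gnuspeech or E-speak.

```python
def parse_pho(text):
    """Parse PHO-style lines into (phoneme, duration_ms, [(percent, hz), ...])."""
    segments = []
    for raw in text.splitlines():
        # Drop anything after ';' (comment) and surrounding whitespace.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"missing duration in: {raw!r}")
        phoneme = fields[0]
        duration_ms = float(fields[1])
        numbers = [float(x) for x in fields[2:]]
        if len(numbers) % 2:
            raise ValueError(f"unpaired pitch target in: {raw!r}")
        # Pair up (position %, pitch Hz) values.
        targets = list(zip(numbers[0::2], numbers[1::2]))
        segments.append((phoneme, duration_ms, targets))
    return segments


# Hypothetical sample, not real synthesiser output.
sample = """\
; illustrative only
_ 100
h 60
@ 120 0 110 100 140
"""

for segment in parse_pho(sample):
    print(segment)
```

Each parsed segment keeps its own pitch targets, which is the "lower level" contouring mentioned above: the pitch can be pinned at several points inside a single phoneme.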

The cut-in and phrase echoing would have to be done by synthesising the cut-in phrase and then mixing, or possibly in the future by having two copies of Monet running.

That's what I thought the current situation was likely to need. However, for audio-drama this is less of an issue, given that the generated speech audio will probably be edited together in a non-linear way anyway. Marking the cut-ins then becomes a partitioning(?) issue during the lexical parsing(?) and timecoding in any automated scripts that would be generated to reassemble the audio output. Muse (http://www.muse-sequencer.org/) is certainly scriptable, and depending on programmer interest, it looks possible that a future gnuspeech might be able to pipe output directly into the tool via various audio interfaces like LV2, JACK etc... Granted that 'scripted' semi-automated editing for cues is outside your area of focus on the speech generation portion.
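
As a rough illustration of the "synthesise separately, then mix" approach, here is a minimal Python sketch. It assumes both phrases are already available as lists of float samples in the range -1..1 at the same sample rate; the function name, the gain parameter and the sample values are illustrative only.

```python
def mix_at(main, cut_in, offset, gain=1.0):
    """Return a copy of main with cut_in added, starting at sample index offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # Pad with silence if the cut-in runs past the end of the main phrase.
    length = max(len(main), offset + len(cut_in))
    out = list(main) + [0.0] * (length - len(main))
    for i, sample in enumerate(cut_in):
        out[offset + i] += gain * sample
    # Hard-clip so the sum stays inside the valid sample range.
    return [max(-1.0, min(1.0, s)) for s in out]


# Hypothetical data: a 3-sample "main" phrase and a 2-sample cut-in.
print(mix_at([0.5, 0.5, 0.5], [0.25, 0.25], offset=2))
```

The offset would come from whatever timecode the script markup assigns to the cut-in.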

Having access to the source code and databases, you could in principle create any facilities you needed to facilitate the kinds of dramatic dialogue for which you are looking. Do you have a programmer with whom you could work? It would amount to creating a "dramatic dialogue" application, based on /gnuspeech/.

I don't yet, but was considering asking around on projects like Wikipedia/Wikisource/Wikiversity, given that certain aspects of it are quite broad.

Although not strictly within the remit of Gnuspeech, I am wondering if anyone collects dialect examples. There was an Australian Government project to collect examples of Australian English, and the BBC may have done so in the UK a couple of years ago. I am not sure how 'free' these examples would be for researchers. I will also note that there is a Spoken Wikipedia archive at Wikimedia Commons, which contains audio examples of manually read Wikipedia articles. (Aside: I wonder if anyone's tried using Text to Speech for Wikipedia articles?)

On a different but related topic... from some of your papers, you built an approximate tract model. This is presumably flexible enough to cope with most human characteristics, including voices that "sound like that guy from the trailer, who's been smoking since he was old enough to buy them" (another 'staged' voice type I will add to my earlier examples of vocal types).

Call this a very premature April 1st piece if you like (though others with an interest may want to take it in more serious earnest), but this got me thinking way outside the box about what might be termed "Fantasy Voices" and how they could be modelled. (Clearly an involved project for someone creative, given that you'd almost have to develop a constructed language at the same time in order to have a dictionary/transliteration to draw on. "Analysis of the creation of speculative phonology for speech vocalisation in non-humanoids" almost sounds like an unusual paper ;) )

Presumably by modifying the tract model you could have "fantasy voices" based on alternative evolution of a dominant mammal, at the very least; it would be a matter of scaling certain frequencies, IIRC, based on some audio reconstruction done in the early 1990s to recreate mammoth calls. I'm also not a biologist, so whilst I appreciate that speech capability in humans depends on certain anatomical features, I'm not sure precisely what those characteristics are.
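
To give a feel for the "scaling frequencies" idea, here is a small Python sketch using the simplest textbook approximation: a uniform tube closed at one end, whose resonances are F_n = (2n - 1) * c / (4 * L). This is an assumption made purely for illustration and is much cruder than a real tract model with varying cross-sections; the function name, the speed-of-sound value and the tract lengths are illustrative.

```python
SPEED_OF_SOUND = 350.0  # m/s, rough figure for warm, moist air (assumption)


def tube_formants(length_m, count=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


# Illustrative lengths only: a roughly adult-human tract and one twice as long.
for label, length in [("human-ish", 0.175), ("twice as long", 0.35)]:
    print(label, [round(f) for f in tube_formants(length)])
```

Doubling the tube length halves every resonance, which is the sense in which a larger creature's voice could be approximated by scaling the frequencies down.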

Humanoid-like aliens are also a possibility. The so-termed Nordic types would probably have a voice closest to human (from a tract model perspective, based on internet accounts of alleged encounters). "Stage" aliens, such as in old radio/TV, are from what I recall mostly accented human language, albeit with much modified grammar or intonational rhythm. On the other hand you may have aliens that have "clicks" in their language (not sure what these are called in speech/IPA terms) in addition to tonal and noise-based phonemes.

Nearly all the non-naturalistic 'robot/computer' voices I've heard in TV/Film/Radio have different (or at the least modified) tone and intonation contours, with many being post-processed using specific effects. Although not strictly a "robot", I will note here that both the Daleks and the Classic Series Cybermen used a ring modulator, and that Classic era Galactica Cylons seemingly used a vocoder. I'm not sure how GLaDOS (Portal, set in the Half-Life universe) was done, but the intonation contour(?) is not naturalistic: "Oh, HelLO, itS You AGaiN!" However, in some UK Sci-Fi shows, the robots speak a natural sounding stereotyped British RP, albeit pitched ever so slightly higher than normal. Example: "Oh really.. I don't think that would be possible at all, Sir"
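
For anyone unfamiliar with the ring modulator effect, a minimal Python sketch follows: each speech sample is simply multiplied by a sine-wave carrier. The 30 Hz default carrier and the function name are illustrative assumptions, not a record of how any particular show's effect was set up.

```python
import math


def ring_modulate(samples, sample_rate, carrier_hz=30.0):
    """Multiply each sample by a sine carrier (classic ring modulation)."""
    return [
        s * math.sin(2.0 * math.pi * carrier_hz * i / sample_rate)
        for i, s in enumerate(samples)
    ]


# Hypothetical input: a constant signal, so the output is just the carrier.
print(ring_modulate([1.0, 1.0, 1.0, 1.0], sample_rate=4, carrier_hz=1.0))
```

In practice the input would be the synthesised speech samples, so the effect could be applied as a post-processing step after synthesis.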

Getting back to some other thoughts on biological "voices" (these are all HIGHLY speculative)

Human speech (at least in English) doesn't have a teeth-grinding component in its phonemes, but if modelling speculative speech for other biological species this may need to be considered (see my thoughts on insects below).

Avian (i.e. birds) would have a beak component, in addition to a throat... I'm not sure if this would be directly comparable to adding an overly scaled nose though.

Insectoid - (Highly Speculative) - Insect (audio) speech, in some examples I once considered informally, could, depending on the type, consist of:

1. Pure tone, albeit very high pitched: the insectoid effectively singing in a pure tone at very high pitch (maybe ultrasound?), but slowed down so humans could understand it. - Examples


2. Clicks or rasping, caused by moving mouth parts...

3. Drone... This component may be caused by high speed movement of part of the body, or an interference pattern set up by vents for air and so on. (This is the sawtooth/noise sound you hear when a bee comes close to your ear etc.)

Not surprisingly, the best approximation I thought of for a bee-like "voice" was the output from a not very advanced 80's speech add-on called the Currah Microspeech, extending the end of certain of the z, s, sh, ch sounds (fricatives?), giving the output a 'buzzy' quality...

A sample phrase in a sci-fi audio drama with bees might be "Yooou hasss beeen iinviiiteeed, yooou wiiil meeet thhheee quuueeen, oooonccceee yooouu hasss beeen trrraaansssiiitionnnneeddd. Yooou wwwilll preeepaaaiiiirrr fffoor trrraaansssiitiiiooon" (English: "You have been invited, you will meet the queen, once you have been transitioned. You will prepare for transition."), with the intonation contour emphasising certain vowel frequencies and extended fricatives. In this instance the specific context is to build a sense of some fearful act to occur.
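
A rough Python sketch of the "extend certain sounds" trick, assuming the utterance is already available as a list of (phoneme label, duration in ms) pairs. The labels, durations, stretch factor and function name are all made up for illustration; they are not gnuspeech or Currah Microspeech names.

```python
BUZZY = {"s", "z", "sh", "ch"}  # the sounds named above; labels are illustrative


def stretch(segments, factor=2.5, targets=BUZZY):
    """Lengthen the duration of every segment whose label is in targets."""
    return [
        (label, duration * factor if label in targets else duration)
        for label, duration in segments
    ]


# Hypothetical segments for the word "has" spoken bee-style.
print(stretch([("h", 60.0), ("a", 120.0), ("z", 80.0)], factor=2.0))
```

Passing a different target set (for example the vowels as well) would give the drawn-out vowels in the sample phrase.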

Unlike human speech, some fricatives(?) may be a sawtooth sound rather than noise. (Hmm... maybe you could have "bee" speech where the drone level indicates the urgency of the speaker? This suggests a more general thought: how speech rhythm changes with urgency may be worth looking at in respect of 'dramatic' speech generation.)
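
To illustrate the sawtooth-versus-noise distinction, here is a small Python sketch that blends a sawtooth wave with white noise, with a "drone level" between 0 (all noise) and 1 (all sawtooth). The function names and parameter values are assumptions made for illustration only.

```python
import random


def sawtooth(freq_hz, sample_rate, count):
    """Rising sawtooth in the range -1..1."""
    return [2.0 * ((i * freq_hz / sample_rate) % 1.0) - 1.0 for i in range(count)]


def white_noise(count, seed=0):
    """Uniform white noise in the range -1..1 (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]


def buzzy_source(freq_hz, sample_rate, count, drone_level):
    """Blend sawtooth and noise; drone_level 1.0 is pure sawtooth, 0.0 pure noise."""
    saw = sawtooth(freq_hz, sample_rate, count)
    noise = white_noise(count)
    return [drone_level * s + (1.0 - drone_level) * n for s, n in zip(saw, noise)]


# Hypothetical settings: a 200 Hz drone, mostly sawtooth.
print(buzzy_source(200.0, 8000, 5, drone_level=0.8))
```

A more urgent speaker could then be given a higher drone level, and a snake-like voice a lower one.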

Most bees would be female sounding, but in audio-drama attempting to do a "special voice" they may be pitched wrongly. (Some research on actual bees suggested they 'spoke' at around alto pitch.)

I'm not sure how you might do reptilian or snake-like creatures... Maybe by extending the fricatives and vowels as with an insectoid, but with a different intonation contour. Snakes would be "noisy" rather than an insect's sawtooth drone...

Moving into the realm of mythological creatures... (albeit human-ish ones for now) - again speculative, and mostly noting the trope styles used in various Radio/TV/film.

Jinn/Genie - Convention seems to vary: sometimes they have a definite Arab dialect and sometimes not; sometimes they have a very deep bass pitching. An example line known to most people who have seen British pantos is: "(flashbang) Al-A-din! I AM the GENIE of the LAMP!"

Leprechauns - Convention seems to be that these are high pitched, but not quite child-like, Irish dialect. "Well then, you'll be not getting me pot o' gold!"

Elves - Tolkien-like elves speak like humans, albeit the pitching may be a tad higher. Tolkien wrote extensively about Elvish in both the Appendices to Lord of the Rings and in other works published since. (Of constructed languages, Quenya probably has enough information on its phonology for someone to make a plausible elvish voice, in my view.)

Fairies - These vary considerably. Some are very human-like, perhaps with a very slightly upward pitching (Oberon and Titania in Shakespeare come to mind).

I can probably think of more (or related aspects to ones mentioned) if anyone is interested in discussing this in more depth seriously.


Alex Farlie.
