On 30/03/2016 21:14, David Hill wrote:
Dear Alex,
I have copied this response to the gnuspeech list as it will be of
general interest, and also encourages others to participate.
Feel free to copy my further thoughts below.
You certainly could use /gnuspeech/ to do the sort of thing you
discuss in your email (copy below). However, right now you'd have to
get yourself a Macintosh running OS X 10.10.x and use the Monet
system. Used Mac Pro Towers (say early 2009 versions) are available at
Other World Computing (a very reliable company). They are powerful and
upgradeable. I have an early 2008 and an early 2009, and have added an
SSD to both of them, which gives an even better performance than the
standard machine.
Is Monet "free" software, or is the voice generator at the very least
compatible with Creative Commons Share Alike? That matters because the
aim was to be as 'free' as possible, so that the output was usable by
others (and could be edited in tools like GarageBand).
The platform independent /gnuspeechsa/ does not yet incorporate the
Monet facility though I believe Marcelo is working on that aspect,
judging by some of the image material he has previewed to me.
Thanks.
In order to get different accents, intonation and rhythm, as required
for your examples, you may have to get involved in significant manual
work, modifying the databases. For intonation, you'd have to create
the required intonation contour manually.
Hmm, and as I am not a speech professional, this may be beyond my level
of expertise, other than marking notes in the script as to intonation
intent. Your note below about adding tonic feet is something I was missing.
Something else that will need to be worked out is how to translate
between Gnuspeech's phoneme names and eSpeak's (which are based on the
Kirshenbaum encoding; see my other recent e-mail).
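One possible approach to that translation is a plain lookup table that refuses to guess. This is only a Python sketch: the Gnuspeech-side names below are hypothetical placeholders that would need checking against the actual Gnuspeech databases, while the right-hand side uses standard Kirshenbaum symbols for a few English consonants.

```python
# Hypothetical mapping table. The left-hand (Gnuspeech-side) names are
# placeholders, NOT verified against the Gnuspeech databases; the right-hand
# side uses Kirshenbaum (ASCII IPA) symbols.
GNUSPEECH_TO_KIRSHENBAUM = {
    "th": "T",   # voiceless dental fricative, IPA θ
    "dh": "D",   # voiced dental fricative, IPA ð
    "sh": "S",   # voiceless postalveolar fricative, IPA ʃ
    "zh": "Z",   # voiced postalveolar fricative, IPA ʒ
    "ng": "N",   # velar nasal, IPA ŋ
    "ch": "tS",  # voiceless postalveolar affricate, IPA tʃ
    "j": "dZ",   # voiced postalveolar affricate, IPA dʒ
}


def to_kirshenbaum(phonemes, table=GNUSPEECH_TO_KIRSHENBAUM):
    """Translate a list of phoneme names, failing loudly on anything unmapped."""
    missing = [p for p in phonemes if p not in table]
    if missing:
        raise KeyError(f"no mapping for: {missing}")
    return [table[p] for p in phonemes]


if __name__ == "__main__":
    print(to_kirshenbaum(["sh", "ng"]))  # ['S', 'N']
```

Failing on unknown names, rather than passing them through, makes gaps in the table visible early.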
However, here are two variants of the "Miss Jones" utterance you
suggested /using just the basic system/ and the existing databases:
It would be helpful if the original code were modified to allow a
manually constructed intonation contour to be saved. At the moment, if
the synthesis is saved to a sound file, the contour is regenerated
prior to synthesis, losing any contour that was constructed manually.
This is an unfortunate omission, but would be fairly easy to correct.
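As an illustration of what keeping a manual contour could involve, here is a minimal Python sketch. It assumes a hypothetical contour made of (time in ms, pitch in semitones) breakpoints stored as JSON next to the utterance; this is not Monet's file format, and the function names are made up. The synthesis step would read the saved contour back instead of regenerating it.

```python
import json
import os
import tempfile


def save_contour(path, points):
    """Store hypothetical (time_ms, semitones) breakpoints exactly as edited."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([{"t_ms": t, "semitones": s} for t, s in points], f, indent=2)


def load_contour(path):
    """Read the breakpoints back as a list of (time_ms, semitones) tuples."""
    with open(path, encoding="utf-8") as f:
        return [(p["t_ms"], p["semitones"]) for p in json.load(f)]


def pitch_at(points, t_ms):
    """Linearly interpolate between sorted breakpoints; hold the end values outside them."""
    if t_ms <= points[0][0]:
        return points[0][1]
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if t_ms <= t1:
            return s0 + (s1 - s0) * (t_ms - t0) / (t1 - t0)
    return points[-1][1]


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "miss_jones_contour.json")
    save_contour(path, [(0, 0), (400, 4), (900, -2)])
    print(pitch_at(load_contour(path), 200))  # 2.0
```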
You'll notice that the threatening "German accent" version has two
added tonic feet (marked by '*' in the parsed version of the
utterance).
Ah. Adding tonic feet is possibly something I also need to look up how
to do in eSpeak.
In order to make the process easier and less trouble, the user and
application dictionaries should be added and made usable. Then
particular dictionaries (a lot smaller than the main dictionary) could
be set up for particular dialogue and accent requirements.
Hmm... Would it be worth considering some kind of "Unintophonic"
(universal intonation phonetic encoding) to represent both sounds and
intonation intent? An older speech synthesis program I found, called
Superior Speech! (running under RISC OS 3 years ago), allowed for at
least 8 different (albeit fixed) intonation pitches on individual
phonemes, as well as some more advanced features for "singing" phonemes
at specific notes (something which I understand is an area of current
research by others). There are some possible encodings, like X-SAMPA,
which incorporate intonation marks. MBROLA (which is non-free) stores
intonation data in a format that works at a much lower level, so it is
possible to do much more finely tuned intonation contouring, if I
understand what that means correctly. (Thought: if there were a way to
add MBROLA's PHO-style data to Gnuspeech/eSpeak input files... hmm...)
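For reference, each line of an MBROLA .pho file gives a phoneme, its duration in milliseconds, and then optional pairs of (position as a percentage of that phoneme's duration, pitch in Hz), which is where the finer contour control comes from. A small Python sketch that writes such lines; the helper names are illustrative, and the phoneme symbols must match whichever MBROLA voice is used ("_" is silence).

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one .pho line: phoneme, duration (ms), then (position %, F0 Hz) pairs."""
    parts = [phoneme, str(round(duration_ms))]
    for position_pct, f0_hz in pitch_points:
        if not 0 <= position_pct <= 100:
            raise ValueError("pitch position must be a percentage of the phoneme")
        parts += [str(round(position_pct)), str(round(f0_hz))]
    return " ".join(parts)


def pho_text(segments):
    """Join (phoneme, duration_ms[, pitch_points]) tuples into .pho file text."""
    return "\n".join(pho_line(*seg) for seg in segments) + "\n"


if __name__ == "__main__":
    # Illustrative segments only; symbols depend on the chosen MBROLA voice.
    segments = [
        ("_", 100),
        ("m", 70, [(50, 120)]),
        ("i", 150, [(0, 125), (100, 160)]),  # rising pitch across the vowel
        ("s", 110),
        ("_", 100),
    ]
    print(pho_text(segments), end="")
```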
The cut-in and phrase echoing would have to be done by synthesising
the cut-in phrase and then mixing, or possibly in the future by having
two copies of Monet running.
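Mixing a separately synthesised cut-in is simple sample arithmetic. This Python sketch assumes both phrases are already mono float samples in [-1, 1] at the same sample rate (for instance, decoded from the saved sound files); the function name and the hard clipping are illustrative choices, and a real mix would more likely lower the gains than clip.

```python
def mix_with_offset(main, cut_in, offset, cut_in_gain=1.0):
    """Mix `cut_in` into `main`, starting at sample index `offset`.

    Both inputs are mono float samples at the same rate. The result is long
    enough to hold both signals, and sums are hard-clipped to [-1.0, 1.0].
    """
    if offset < 0:
        raise ValueError("offset must be a non-negative sample index")
    length = max(len(main), offset + len(cut_in))
    out = list(main) + [0.0] * (length - len(main))
    for i, sample in enumerate(cut_in):
        out[offset + i] += cut_in_gain * sample
    return [max(-1.0, min(1.0, s)) for s in out]
```

In practice `offset` would come from whatever timecode marks the interruption, e.g. seconds multiplied by the sample rate.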
That's what I thought the current situation was likely to need. However,
for audio drama this is less of an issue, given that the generated speech
audio will probably be edited together in a non-linear way anyway.
Marking the cut-ins then becomes a partitioning(?) issue during the
lexical parsing(?) and timecoding in any automated scripts that would
reassemble the audio output. Muse
(http://www.muse-sequencer.org/) is certainly scriptable, and depending
on programmer interest, it looks possible that a future gnuspeech might
be able to pipe output directly into the tool via audio interfaces such
as LV2, JACK, etc. Granted, 'scripted' semi-automated editing for cues
is outside your area of focus on the speech generation portion.
Having access to the source code and databases, you could in principle
create whatever facilities you needed to support the kinds of dramatic
dialogue you are looking for. Do you have a programmer with whom
you could work? It would amount to creating a "dramatic dialogue"
application, based on /gnuspeech/.
I don't yet, but was considering asking around on projects like
Wikipedia/Wikisource/Wikiversity, given that certain aspects of it are
quite broad.
Although not strictly within the remit of Gnuspeech, I am wondering if
anyone collects dialect examples. There was an Australian Government
project to collect examples of Australian English, and the BBC may have
done so in the UK a couple of years ago. I am not sure how 'free'
these examples would be for researchers. I will also note that there
is a Spoken Wikipedia archive at Wikimedia Commons, which contains audio
recordings of Wikipedia articles read aloud. (Aside: I wonder if
anyone's tried using text-to-speech for Wikipedia articles?)
On a different but related topic... from some of your papers, you built
an approximate vocal tract model. This is presumably flexible enough to
cope with most human characteristics, including voices that "sound like
that guy from the trailer who's been smoking since he was old enough to
buy them" (another 'staged' voice type I will add to my earlier examples
of vocal types).
Call this a very premature April 1st piece if you like (though others
with an interest may want to take it more seriously), but this got me
thinking way outside the box about what might be termed "Fantasy
Voices" and how they could be modelled. (Clearly an involved project
for someone creative, given that you'd almost have to develop a
constructed language at the same time in order to have a
dictionary/transliteration to draw on. "Analysis on the creation of
speculative phonology for speech vocalisation in non-humanoids" almost
sounds like an unusual paper ;) )
Presumably, by modifying the tract model you could have "fantasy voices"
based on an alternative evolution of a dominant mammal, at the very
least, it being a matter of scaling certain frequencies IIRC, based on
some audio reconstruction done in the early 1990s to recreate mammoth
calls. I'm not a biologist either, so whilst I appreciate that speech
capability in humans requires certain anatomical features, I'm not sure
precisely what those characteristics are.
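A rough way to see why "scaling certain frequencies" follows from tract size: in the textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, the resonances are F_n = (2n - 1) c / (4L), so every formant scales with 1/L. This Python sketch uses that idealisation only, not Gnuspeech's tube model; the 350 m/s speed of sound and the 17.5 cm adult length are common textbook figures, and the 35 cm "creature" is purely hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # approximate speed of sound in warm, humid air


def uniform_tube_formants(tract_length_m, count=3, c=SPEED_OF_SOUND_M_S):
    """Quarter-wavelength resonances of a uniform tube closed at one end.

    F_n = (2n - 1) * c / (4 * L); an idealisation, not a full tract model.
    """
    if tract_length_m <= 0:
        raise ValueError("tract length must be positive")
    return [(2 * n - 1) * c / (4 * tract_length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    print(uniform_tube_formants(0.175))  # roughly 500, 1500, 2500 Hz
    print(uniform_tube_formants(0.35))   # doubling the length halves them
```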
Humanoid-like aliens are also a possibility. The so-termed Nordic
types would probably have a voice closest to human (from a tract-model
perspective, based on internet accounts of alleged encounters). "Stage"
aliens, such as those in old radio/TV, are from what I recall mostly
accented human language, albeit with much modified grammar or
intonational rhythm. On the other hand, you may have aliens that have
"clicks" in their language (click consonants, in IPA terms) in
addition to tonal and noise-based phonemes.
Nearly all the non-naturalistic 'robot/computer' voices I've heard in
TV/film/radio have different (or at least modified) tone and
intonation contours, with many being post-processed using specific effects.
Although not strictly "robots", I will note here that both the Daleks
and the Classic Series Cybermen used a ring modulator, and that the
Classic era Galactica Cylons seemingly used a vocoder. I'm not sure how
GLaDOS (Portal) was done, but the intonation contour is not
naturalistic: "Oh, HelLO, itS You AGaiN!" However, in some UK sci-fi
shows the robots speak a natural-sounding, stereotyped British RP,
albeit pitched ever so slightly higher than normal. Example: "Oh
really... I don't think that would be possible at all, Sir."
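Ring modulation itself is just multiplying the voice by a sine-wave carrier, which replaces each frequency component f with components at f + carrier and |f - carrier|, hence the metallic, non-harmonic quality. A minimal Python sketch; the carrier frequency is a free parameter here, not a value taken from any of the productions mentioned.

```python
import math


def ring_modulate(samples, sample_rate, carrier_hz):
    """Multiply a mono signal by a sine carrier (classic ring modulation)."""
    if sample_rate <= 0:
        raise ValueError("sample rate must be positive")
    return [
        s * math.sin(2 * math.pi * carrier_hz * n / sample_rate)
        for n, s in enumerate(samples)
    ]
```

Applied to synthesised speech, a low carrier gives a throbbing, robotic effect, while higher carriers move further towards a clangorous, bell-like sound.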
Getting back to some other thoughts on biological "voices" (these are
all HIGHLY speculative)
Human speech (at least in English) doesn't have a teeth-grinding
component in its phonemes, but if modelling speculative speech for
other biological species this may need to be considered (see my
thoughts on insects below).
Avian (i.e. bird) voices would have a beak component in addition to a
throat... I'm not sure if this would be directly comparable to adding an
overly scaled nose, though.
Insectoid (highly speculative): insect (audio) speech, in some
examples I once considered informally, could, depending on the type,
consist of:
1. Pure tone, albeit very high pitched: the insectoid effectively
singing in a pure tone at very high pitch (maybe ultrasound?), but
slowed down so humans could understand it.
2. Clicks or rasping, caused by moving mouth parts.
3. Drone: this component may be caused by high-speed movement of
part of the body, or an interference pattern set up by air vents and
so on.
(This is the sawtooth/noise sound you hear when a bee comes close to
your ear.)
Not surprisingly, the best approximation I thought of for a bee-like
"voice" was the output from a not very advanced '80s speech add-on
called the Currah Microspeech, extending the end of certain z, s, sh
and ch sounds (fricatives and affricates) to give the output a 'buzzy'
quality...
A sample phrase in a sci-fi audio drama with bees might be "Yooou hasss
beeen iinviiiteeed, yooou wiiil meeet thhheee quuueeen, oooonccceee
yooouu hasss beeen trrraaansssiiitionnnneeddd. Yooou wwwilll
preeepaaaiiiirrr fffoor trrraaansssiitiiiooon" (in plain English: "You
have been invited, you will meet the queen, once you have been
transitioned. You will prepare for transition."), with the intonation
contour emphasising certain vowel frequencies and extended fricatives.
In this instance the intent is to build a sense that some fearful act
is about to occur.
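In synthesis terms, that exaggerated delivery amounts to lengthening selected segments before they are rendered. A Python sketch over a hypothetical list of (phoneme, duration in ms) pairs; the phoneme labels and stretch factor are illustrative assumptions, and the result could feed a duration table or something like an MBROLA .pho file.

```python
# Hypothetical labels for the sounds to be drawn out; adjust to the synthesiser's names.
BUZZ_SEGMENTS = {"s", "z", "sh", "ch"}


def stretch_segments(segments, targets=BUZZ_SEGMENTS, factor=2.5):
    """Lengthen chosen phonemes in a (phoneme, duration_ms) list; others are unchanged."""
    if factor <= 0:
        raise ValueError("stretch factor must be positive")
    return [(p, d * factor if p in targets else d) for p, d in segments]


if __name__ == "__main__":
    has = [("h", 60), ("a", 90), ("s", 80)]
    # Stretch the fricative and the vowel, as in "hasss" / "beeen".
    print(stretch_segments(has, targets={"s", "a"}, factor=2.0))
```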
Unlike in human speech, some fricatives may be a sawtooth sound rather
than noise. (Hmm... maybe you could have "bee" speech where the drone
level indicates the urgency of the speaker? This suggests a more
general thought: how speech rhythm changes with urgency is worth
looking at in respect of 'dramatic' speech generation.)
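To hear the difference between a sawtooth "fricative" and a noise one, the two sources can simply be blended. In this Python sketch a hypothetical `urgency` value (0 to 1) sets the drone's share of the mix; the 220 Hz default is an arbitrary illustration, and the naive sawtooth is not band-limited, so it will alias at high frequencies.

```python
import random


def sawtooth(freq_hz, sample_rate, n_samples):
    """Naive (non-band-limited) sawtooth ramping from -1 towards 1 once per period."""
    return [2.0 * ((freq_hz * n / sample_rate) % 1.0) - 1.0 for n in range(n_samples)]


def noise(n_samples, seed=0):
    """Uniform white noise in [-1, 1], seeded so results are repeatable."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def buzzy_fricative(n_samples, sample_rate, drone_hz=220.0, urgency=0.5, seed=0):
    """Blend a sawtooth drone with noise; higher (hypothetical) urgency means more drone."""
    if not 0.0 <= urgency <= 1.0:
        raise ValueError("urgency must be between 0 and 1")
    saw = sawtooth(drone_hz, sample_rate, n_samples)
    hiss = noise(n_samples, seed)
    return [urgency * a + (1.0 - urgency) * b for a, b in zip(saw, hiss)]
```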
Most bees would be female-sounding, but in audio drama attempting to do
a "special voice" they may be pitched wrongly (some research on actual
bees suggested they 'spoke' at around alto pitch).
I'm not sure how you might do reptilian or snake-like creatures... Maybe
by extending the fricatives and vowels as with an insectoid, but with a
different intonation contour. Snakes would be "noisy" rather than
having an insect's sawtooth drone...
Moving into the realm of mythological creatures... (albeit human-ish
ones for now) - again speculative, and mostly noting the trope styles
used in various Radio/TV/film.
Jinn/Genie - Convention seems to vary: sometimes they have a definite
Arabic accent and sometimes not, and sometimes they have a very deep
bass pitch. An example line known to most people who have seen British
pantos is: "(flashbang) Al-A-din! I AM the GENIE of the LAMP!"
Leprechauns - Convention seems to be that these have a high-pitched but
not quite child-like Irish dialect: "Well then, you'll be not
getting me pot o' gold!"
Elves - Tolkien-like elves speak like humans, albeit the pitch
may be a tad higher. Tolkien wrote extensively about Elvish, both in
the Appendices to The Lord of the Rings and in other works published
since. (Of the constructed languages, Quenya probably has enough
information on its phonology for someone to make a plausible elvish
voice, in my view.)
Fairies - These vary considerably. Some are very human-like, perhaps
with a very slight upward pitch (Oberon and Titania in Shakespeare
come to mind).
I can probably think of more (or related aspects to ones mentioned) if
anyone is interested in discussing this in more depth seriously.
Alex Farlie.
_______________________________________________
gnuspeech-contact mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gnuspeech-contact