Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-29 Thread Jens Lechtenboerger
On 2023-09-28, Jude DaShiell wrote:

> espeak-ng likes to have speechdispatcher on a system

Does this improve the quality of the generated speech?

> and festival likes to have language-specific voices on it to use.

Indeed.  Which one(s) do you recommend?  I tried
voice_cmu_us_slt_arctic_hts and the mbrola us voices.

> fenrir which you didn't mention runs in user land and has no kernel
> dependencies.

I briefly tried that but failed.  Also, I was under the impression
that fenrir does not produce speech by itself but relies on some TTS
implementation like espeak.  Is that wrong?

Best wishes
Jens




Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-28 Thread Jude DaShiell
espeak-ng likes to have speechdispatcher on a system and festival likes to
have language-specific voices on it to use.
fenrir which you didn't mention runs in user land and has no kernel
dependencies.


-- Jude  "There are four boxes to be used in
defense of liberty: soap, ballot, jury, and ammo. Please use in that
order." Ed Howdershelt 1940.




Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-28 Thread Jens Lechtenboerger
Dear all,

some time ago I asked for suggestions concerning Text-To-Speech
(TTS) from Org sources.  Thank you to everyone who provided
suggestions!  In case you are interested, you can listen to sample
results at [1].

Briefly, Emacspeak, espeak-ng, and festival are not good enough for
my purposes.  Maybe I'm missing relevant backend options.  IMHO,
Coqui-AI TTS [2] and Microsoft SpeechT5 [3] are far superior.

Best wishes
Jens

[1] https://gitlab.com/oer/emacs-reveal/-/wikis/Sample-TTS-results
[2] https://github.com/coqui-ai/TTS/
[3] https://huggingface.co/microsoft/speecht5_tts



Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread Jude DaShiell
Thanks much, that one worked.





Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread tomas
On Mon, Sep 11, 2023 at 09:52:30AM -0400, Jude DaShiell wrote:
> Why does this happen?
> 
> The gpg command given in the faq does not return what the faq claims will
> be returned.
> Look for the public key of Klaus Knopper:
> bash: Look: command not found
> bash-5.1$  gpg --keyserver pool.sks-keyservers.net --search-keys "Klaus
> Knopper"

Hm. Keyservers seem to be a dying species these days. Try

  --keyserver hkp://keyserver.ubuntu.com/

Cheers
-- 
t




Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread Jude DaShiell
fenrir-screenreader is installable with pip though you may need some
support getting it set up and configured.





Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread Jude DaShiell
Why does this happen?

The gpg command given in the faq does not return what the faq claims will
be returned.
Look for the public key of Klaus Knopper:
bash: Look: command not found
bash-5.1$  gpg --keyserver pool.sks-keyservers.net --search-keys "Klaus Knopper"
gpg: error searching keyserver: Server indicated a failure
gpg: keyserver search failed: Server indicated a failure
bash-5.1$
bash-5.1$





Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread Jens Lechtenboerger
Thank you for the additional pointers!  I still need to check out
promising combinations of those approaches, also options for MBROLA
(which is not free, but applies a custom AGPL-3.0-but-not-be-sold
license to the voices).  ADRIANE is certainly fascinating.

Best wishes
Jens



Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread tomas
On Mon, Sep 11, 2023 at 08:06:34AM -0400, Jude DaShiell wrote:
> fenrir-screenreader is also available.
> https://nashcentral.duckdns.org/projects/Jenux
> uses fenrir by default.
> Klaus Knopper's public key I haven't been able to find and none of his
> email addresses seem to be working any longer either for the ones I found.

The knoppix signing keys seem to be around here:

  http://ftp.knoppix.net/wiki/Downloading_FAQ

> You have a chance of getting a good version of knoppix if you download
> with a good bittorrent client and make sure your encryption required is
> turned on and make sure of integrity checks.

Look here:

  http://knoppix.net/

...and consider buying a CD (yes, that's still a thing ;-) or a stick
to support development.

Cheers
-- 
t




Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread Jude DaShiell
espeak-ng is a fork of espeak and can use speechdispatcher.





Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread Jude DaShiell
fenrir-screenreader is also available.
https://nashcentral.duckdns.org/projects/Jenux
uses fenrir by default.
I haven't been able to find Klaus Knopper's public key, and none of the
email addresses I found for him seem to work any longer.
You have a chance of getting a good version of Knoppix if you download
with a good BitTorrent client, make sure required encryption is turned
on, and verify the integrity checks.





Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread briangpowell
* eSpeak seems to focus on small footprints & a "formant synthesis" method

* Suggest using Festival with MBrola:

https://www.cstr.ed.ac.uk/projects/festival/mbrola.html

https://www.cstr.ed.ac.uk/projects/festival/

and/or just install FestivalLite:

apt-get install -y flite

* Note EmacSpeak {mentioned in another email} is written by OrgMode user &
programmer TV Raman--not sure EmacSpeak will help you at all; but it might
be interesting for you

** Klaus Knopper distributes some very interesting free software that
includes an audio-desktop called ADRIANE that maybe you can look at--I'd
love to hear what you find out if you do:

https://www.knopper.net/knoppix-adriane/index-en.html

** Knopper invented the "run Linux entirely from a cdrom" craze--which
still is very useful in many ways--suggest you give Knoppix & Adriane a look
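
A minimal sketch of how the engines named above could be driven from a
script, so that the same text can be rendered by each one and compared.
The helper names are hypothetical.  It assumes the `flite` and
`espeak-ng` binaries are on the PATH and only uses their basic options
(`flite -t TEXT -o FILE`, `espeak-ng -v VOICE -w FILE TEXT`).  MBROLA
voices, if installed, would be passed as the voice name; which names are
available depends on the system.

```python
import shutil
import subprocess


def flite_command(text: str, wav_path: str) -> list[str]:
    # flite: -t gives the text to speak, -o names the output WAV file.
    return ["flite", "-t", text, "-o", wav_path]


def espeak_ng_command(text: str, wav_path: str, voice: str = "en-us") -> list[str]:
    # espeak-ng: -v selects the voice, -w writes a WAV file instead of playing.
    return ["espeak-ng", "-v", voice, "-w", wav_path, text]


def synthesize(command: list[str]) -> bool:
    """Run the command if its program is installed; return True on success."""
    if shutil.which(command[0]) is None:
        return False  # engine not installed, nothing was run
    return subprocess.run(command, check=False).returncode == 0
```

For example, `synthesize(flite_command("Hello Org", "hello.wav"))` would
write hello.wav if flite is installed and return False otherwise.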



Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread Jens Lechtenboerger
On 2023-09-10, Christian Thäter wrote:

> On Sun, 10 Sep 2023 16:39:26 +0200
> Jens Lechtenboerger  wrote:
>
>> On 2023-09-10, Ihor Radchenko wrote:
>> 
>> > Jens Lechtenboerger  writes:
>> >  
>> >> does someone here produce audio via Text-To-Speech (TTS) from Org
>> >> sources?  I plan to do that in the context of emacs-reveal to
>> >> generate voice-over for reveal.js presentations, with open
>> >> questions [1] concerning my initial, experimental approach.  
>> >
>> > Emacspeak is a mature Emacs solution for TTS. However, it is aimed at
>> > blind users, not presentations. Still,
>> > http://tvraman.github.io/emacspeak/manual/Quick-Installation.html
>> > might be a good starting point for TTS options.
>>
>> Thank you for the suggestion.  With espeak this indeed pronounces
>> numbers and abbreviations but its audio quality is not good enough
>> for my purposes.  I am looking for (near-) human voices...
>
> using mbrola is probably as good as possible with free software:
> https://en.wikipedia.org/wiki/MBROLA
>
> still not perfect, but much better than the builtin voices of espeak or
> festival (YMMV).

This sounds promising.  I’ll check it out.

Many thanks
Jens




Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-11 Thread Christian Thäter
On Sun, 10 Sep 2023 16:39:26 +0200
Jens Lechtenboerger  wrote:

> On 2023-09-10, Ihor Radchenko wrote:
> 
> > Jens Lechtenboerger  writes:
> >  
> >> does someone here produce audio via Text-To-Speech (TTS) from Org
> >> sources?  I plan to do that in the context of emacs-reveal to
> >> generate voice-over for reveal.js presentations, with open
> >> questions [1] concerning my initial, experimental approach.  
> >
> > Emacspeak is a mature Emacs solution for TTS. However, it is aimed at
> > blind users, not presentations. Still,
> > http://tvraman.github.io/emacspeak/manual/Quick-Installation.html
> > might be a good starting point for TTS options.
>
> Thank you for the suggestion.  With espeak this indeed pronounces
> numbers and abbreviations but its audio quality is not good enough
> for my purposes.  I am looking for (near-) human voices...

using mbrola is probably as good as possible with free software:
https://en.wikipedia.org/wiki/MBROLA

still not perfect, but much better than the builtin voices of espeak or
festival (YMMV).

> 
> Best wishes
> Jens
> 




Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-10 Thread Jens Lechtenboerger
On 2023-09-10, Ihor Radchenko wrote:

> Jens Lechtenboerger  writes:
>
>> does someone here produce audio via Text-To-Speech (TTS) from Org
>> sources?  I plan to do that in the context of emacs-reveal to
>> generate voice-over for reveal.js presentations, with open questions
>> [1] concerning my initial, experimental approach.
>
> Emacspeak is a mature Emacs solution for TTS. However, it is aimed at
> blind users, not presentations. Still,
> http://tvraman.github.io/emacspeak/manual/Quick-Installation.html might
> be a good starting point for TTS options.

Thank you for the suggestion.  With espeak this indeed pronounces
numbers and abbreviations but its audio quality is not good enough
for my purposes.  I am looking for (near-) human voices...

Best wishes
Jens



Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-10 Thread Jens Lechtenboerger
On 2023-09-09, briangpowell wrote:

> I've turned OrgMode files into audio desktops
>
> It was pretty simple
>
> Just find the code that reveals what an icon is when you hover over it &
> pipe it to some text-to-speech engine & then on to usual routes

Thank you for the reply.  In my case (GNU/Linux), I guess that the
usual text-to-speech engine is espeak.  I find its speech quality to
be disappointing (much worse than the language models that I
mentioned earlier).  I am looking for (near-) human quality...

Best wishes
Jens



Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-10 Thread Ihor Radchenko
Jens Lechtenboerger  writes:

> does someone here produce audio via Text-To-Speech (TTS) from Org
> sources?  I plan to do that in the context of emacs-reveal to
> generate voice-over for reveal.js presentations, with open questions
> [1] concerning my initial, experimental approach.

Emacspeak is a mature Emacs solution for TTS. However, it is aimed at
blind users, not presentations. Still,
http://tvraman.github.io/emacspeak/manual/Quick-Installation.html might
be a good starting point for TTS options.

-- 
Ihor Radchenko // yantar92,
Org mode contributor



Re: Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-09 Thread briangpowell
I've turned OrgMode files into audio desktops

It was pretty simple

Just find the code that reveals what an icon is when you hover over it &
pipe it to some text-to-speech engine & then on to usual routes



Suggestions for Text-To-Speech (TTS) from Org sources?

2023-09-09 Thread Jens Lechtenboerger
Dear all,

does someone here produce audio via Text-To-Speech (TTS) from Org
sources?  I plan to do that in the context of emacs-reveal to
generate voice-over for reveal.js presentations, with open questions
[1] concerning my initial, experimental approach.

Currently, I like the default model of Coqui-AI TTS [2] and
Microsoft SpeechT5 [3] best.  Any suggestions for free and open TTS
implementations that produce even better results?  Other models of
Coqui-AI?  The solution should work without GPU support, which seems
to rule out Suno Bark [4].

The above models do not pronounce numbers/digits, and they fail to
pronounce most acronyms.  In a preprocessing step I could replace
those.  I use preprocessing anyways to get rid of Org markup that
might confuse the language models.  Anyone here who did that
already?  Maybe gruut [5] in conjunction with SSML [6] handling?
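
To make the preprocessing idea concrete, here is a rough sketch of what
such a step might look like.  It is not what emacs-reveal does and it
does not use gruut or SSML.  All names are hypothetical, the acronym
table is made up for illustration, only a small subset of Org markup is
handled (links, keyword lines, heading stars, simple emphasis), and
numbers above 999 are simply read digit by digit.

```python
import re

# Hypothetical acronym table; anything not listed is spelled letter by letter.
ACRONYMS = {"TTS": "text to speech"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 0..999; larger numbers are read digit by digit."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = ONES[hundreds] + " hundred"
        return words + (" " + number_to_words(rest) if rest else "")
    return " ".join(ONES[int(d)] for d in str(n))


def strip_org_markup(text: str) -> str:
    """Remove a small subset of Org syntax that might confuse a TTS model."""
    # [[target][description]] -> description; [[target]] -> target
    text = re.sub(r"\[\[[^\]]*\]\[([^\]]*)\]\]", r"\1", text)
    text = re.sub(r"\[\[([^\]]*)\]\]", r"\1", text)
    # Drop keyword lines such as "#+TITLE: ..."
    text = re.sub(r"(?m)^#\+.*\n?", "", text)
    # Drop leading heading stars
    text = re.sub(r"(?m)^\*+\s+", "", text)
    # *bold*, /italic/, =verbatim=, ~code~, _underline_ -> inner text
    text = re.sub(r"(?<!\w)([*/=~_])(\S(?:.*?\S)?)\1(?!\w)", r"\2", text)
    return text


def prepare_for_tts(text: str) -> str:
    """Strip markup, then expand acronyms and numbers into plain words."""
    text = strip_org_markup(text)
    text = re.sub(r"\b[A-Z]{2,}\b",
                  lambda m: ACRONYMS.get(m.group(), " ".join(m.group())),
                  text)
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
```

With this sketch, "See [[https://orgmode.org][Org]] for *42* TTS tips."
becomes "See Org for forty-two text to speech tips."  A real solution
would need proper Org parsing and language-aware number handling.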

Any other suggestions?

Best wishes
Jens

[1] https://gitlab.com/oer/emacs-reveal/-/issues/20
[2] https://github.com/coqui-ai/TTS/
[3] https://huggingface.co/microsoft/speecht5_tts
[4] https://github.com/suno-ai/bark
[5] https://github.com/rhasspy/gruut
[6] https://www.w3.org/TR/speech-synthesis11/