Hi Luke, Will, and all:

For what it's worth, I agree with the bulk of what's been said already. It will be fantastic to get some more sanity in the speech/audio arena.
As for the first item Will identifies as a 'proposal', namely relying on the TTS engine to return digital sound samples rather than doing the output itself: I think this is a great idea, but I would suggest looking carefully at the potential latency issues there (a rough sketch of the sample-handling side is in the P.S. below).

Also, any speech/audio integration API has two key requirements: the ability to know, at least roughly, what is currently in the output queue and approximately how close to completion it is; and the ability to "sync up" and know, at some point in time, exactly what has been spoken. These are subtly different, in that the second requires information about actual completion as opposed to "approximate progress". I think the second also implies at least some degree of interrupt capability in the audio output stream (the P.S. sketches one way that distinction might look in a callback interface). Use cases include audio/voice synchronization, braille synchronization, and (perhaps more importantly) the ability to reliably break an utterance into pieces and restart output at a known point.

As for moving away from Bonobo Activation (note: not the same as "Bonobo" in the broad sense), I think this makes sense. I also think moving away from the use of CORBA for gnome-speech IPC is a good idea; the speech APIs seem like excellent candidates for D-Bus migration, and we have very few, if any, platform binary-compatibility guarantees to deal with, as long as the consumers of the speech interfaces are kept in the loop (a small client-side sketch is also in the P.S.).

Best regards,

Bill
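P.S. To make the sample-handling idea concrete, here is a minimal sketch of a gnome-speech driver pushing engine-supplied PCM through GStreamer instead of letting the engine own the audio device. It is written against the GStreamer 1.x appsrc API purely for illustration, and the sample format (16-bit mono at 22050 Hz) is an assumption; a real driver would negotiate whatever its engine actually produces.

  /*
   * Sketch: play PCM returned by a TTS engine through GStreamer.
   * Build: gcc tts-play.c $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-app-1.0)
   */
  #include <gst/gst.h>
  #include <gst/app/gstappsrc.h>

  int main(int argc, char **argv)
  {
      gst_init(&argc, &argv);

      /* appsrc lets the driver push raw samples itself; the caps
       * describe what the (hypothetical) engine hands back. */
      GstElement *pipeline = gst_parse_launch(
          "appsrc name=src format=time do-timestamp=true "
          "caps=audio/x-raw,format=S16LE,channels=1,rate=22050,layout=interleaved "
          "! audioconvert ! audioresample ! autoaudiosink", NULL);
      GstElement *src = gst_bin_get_by_name(GST_BIN(pipeline), "src");

      gst_element_set_state(pipeline, GST_STATE_PLAYING);

      /* In a real driver this would sit in the engine's synthesis
       * callback; here we push one second of silence as a stand-in. */
      gsize len = 22050 * 2;  /* one second of S16 mono */
      GstBuffer *buf = gst_buffer_new_allocate(NULL, len, NULL);
      gst_buffer_memset(buf, 0, 0, len);
      gst_app_src_push_buffer(GST_APP_SRC(src), buf);  /* takes ownership */
      gst_app_src_end_of_stream(GST_APP_SRC(src));

      /* Wait for playback to drain (or fail). */
      GstBus *bus = gst_element_get_bus(pipeline);
      GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
          GST_MESSAGE_EOS | GST_MESSAGE_ERROR);

      gst_message_unref(msg);
      gst_object_unref(bus);
      gst_object_unref(src);
      gst_element_set_state(pipeline, GST_STATE_NULL);
      gst_object_unref(pipeline);
      return 0;
  }

The latency worry shows up right here: every buffer hand-off, conversion, and resample step adds delay, so buffer sizes and the pipeline itself would need careful tuning before this felt responsive enough for screen-reader use.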
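The progress-versus-completion distinction could be expressed as two separate callbacks on the driver interface. This is purely a hypothetical shape (none of these names exist in gnome-speech today); the point is that the progress callback only promises "output has roughly reached this offset", while the finished callback is the one reliable sync point for braille output or for restarting an utterance:

  /*
   * Hypothetical callback shape for a speech driver interface.
   * None of these names exist in gnome-speech; they are invented
   * here to illustrate the progress/completion distinction.
   */
  #include <glib.h>

  typedef struct _SpeechJob SpeechJob;   /* one queued utterance */

  /* Fired as output passes known offsets in the utterance text.
   * Approximate by nature: audio already queued downstream of the
   * engine may lag whatever offset is reported here. */
  typedef void (*SpeechProgressFunc)(SpeechJob *job,
                                     guint      text_offset,
                                     gpointer   user_data);

  /* Fired exactly once, when the job's audio has actually reached
   * the device (or the job was interrupted). This is the sync point
   * a consumer can trust for braille synchronization or for
   * restarting an utterance at a known position. */
  typedef void (*SpeechFinishedFunc)(SpeechJob *job,
                                     gboolean   interrupted,
                                     guint      last_offset_spoken,
                                     gpointer   user_data);

  /* The interrupt capability implied above: stop output now; the
   * finished callback then reports how far output actually got. */
  void speech_job_interrupt(SpeechJob *job);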
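And on the D-Bus side, the speech operations are coarse-grained enough ("say this, stop, where are you?") that they map naturally onto bus methods and signals. A client-side sketch using GLib's GDBus API, one of several possible bindings, with an entirely made-up service name, object path, and interface:

  /* Sketch of a D-Bus speech client using GDBus.  The bus name,
   * object path, and interface are hypothetical placeholders. */
  #include <gio/gio.h>

  int main(void)
  {
      GError *error = NULL;
      GDBusConnection *conn =
          g_bus_get_sync(G_BUS_TYPE_SESSION, NULL, &error);
      if (conn == NULL) {
          g_printerr("bus connection failed: %s\n", error->message);
          g_error_free(error);
          return 1;
      }

      /* Fire a Say() call at the (hypothetical) speech service. */
      GVariant *reply = g_dbus_connection_call_sync(conn,
          "org.gnome.SpeechService",           /* made-up bus name    */
          "/org/gnome/SpeechService",          /* made-up object path */
          "org.gnome.SpeechService.Speaker",   /* made-up interface   */
          "Say",
          g_variant_new("(s)", "Hello from D-Bus"),
          G_VARIANT_TYPE("(u)"),               /* returns a job id    */
          G_DBUS_CALL_FLAGS_NONE, -1, NULL, &error);

      if (reply != NULL) {
          guint32 job_id;
          g_variant_get(reply, "(u)", &job_id);
          g_print("queued as job %u\n", job_id);
          g_variant_unref(reply);
      } else {
          g_printerr("Say failed: %s\n", error->message);
          g_error_free(error);
      }

      g_object_unref(conn);
      return 0;
  }

The progress/finished events from the previous sketch would become D-Bus signals that consumers subscribe to; since we have essentially no bincompat guarantees to honor, the interface could evolve incrementally as Orca and friends adapt.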
Willie Walker wrote:

> Hi Luke:
>
> First of all, I say "Hear, hear!" The audio windmill is something
> people have been charging at for a long time. Users who rely upon
> speech synthesis working correctly and integrating well with the rest
> of their environment are among those that need reliable audio support
> most critically.
>
> I see two main proposals in the below:
>
> 1) Modify gnome-speech drivers to obtain samples from their
>    speech engines and then handle the audio playing themselves.
>    This is different from the current state, where the
>    gnome-speech driver expects the speech engine to do all the
>    audio management.
>
>    This sounds like an interesting proposal. I can tell you
>    for sure, though, that the current gnome-speech maintainer
>    has his hands full with other things (e.g., leading Orca).
>    So, the work would need to come from the community.
>
> 2) As part of #1, move to an API that is pervasive on the system.
>    The proposed API is GStreamer.
>
>    Moving to a pervasive API is definitely very interesting, and
>    I would encourage looking at a large set of platforms: Linux
>    to Solaris, GNOME to KDE, etc. An API of recent interest is
>    PulseAudio (https://wiki.ubuntu.com/PulseAudio), which might
>    be worth watching. I believe there might be many significant
>    improvements in the works for OSS as well.
>
> In the bigger scheme of things, however, there is discussion of
> deprecating Bonobo. Bonobo is used by gnome-speech to activate
> gnome-speech drivers. As such, one might consider alternatives to
> gnome-speech. For example, Speech Dispatcher
> (http://www.freebsoft.org/speechd) or TTSAPI
> (http://www.freebsoft.org/tts-api-provider) might be something to
> consider. They are not without issues, however; these include
> cumbersome configuration, reliability, etc. I believe that's all
> solvable with work. The harder issue in my mind is that they will
> introduce an external dependency for things like GNOME, and I've also
> not looked at what their licensing scheme is.
>
> Will
