Re: Screen Reader Support in Audio Games

AudioGames.net Forum - General Game Discussion: philip_bennefall via Audiogames-reflector, Sat, 28 Dec 2019 10:31:55 -0800

Thanks everyone for your great feedback! It's very valuable to me to get this information at an early stage in my development, as I can then make choices that will hopefully satisfy as many people as possible.

Now, to explain a little further what I have implemented thus far: I now know way more about the low-level SAPI and OneCore interfaces than I ever cared to. I am basically writing a very lightweight, portable C library for text-to-speech output. However, it has a twist. The main focus is not on playing audio, only on generating it. In short, the audio does not get sent directly to the audio device - it gets sent to a memory buffer of raw samples. This means that the developer is then free to do any kind of digital signal analysis or processing on the resulting audio, and it allows them to make the speech an integral part of the game soundscape. Doing multiple speech channels is easy, including things such as applying different effects and HRTF positioning to different speech output streams. I can also trim away the delay at the beginning of the speech, so at this point my latency is much lower than that of NVDA and JAWS for most voices. I am not sure if NVDA's new speech refactoring effort will do something similar, but if it does, we will see a drastic drop in latency for voices other than eSpeak and Eloquence.
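To illustrate why having the speech in a memory buffer helps with latency, here is a minimal sketch of trimming the start of a buffer. It is not code from the library. It assumes mono 16-bit signed PCM and that the delay shows up as near-silent samples at the start of the buffer; the function name find_speech_start and the amplitude threshold are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the first sample whose
 * absolute value exceeds `threshold`, or `count` if every sample is
 * at or below it. Assumes mono 16-bit signed PCM.
 */
size_t find_speech_start(const int16_t *samples, size_t count, int16_t threshold)
{
    assert(samples != NULL || count == 0);
    assert(threshold >= 0);

    for (size_t i = 0; i < count; i++) {
        int32_t v = samples[i];   /* widen first so that -32768 can be negated */
        if (v < 0) {
            v = -v;
        }
        if (v > threshold) {
            return i;
        }
    }
    return count;                 /* the whole buffer is below the threshold */
}
```

Playback would then start at samples + find_speech_start(samples, count, threshold) instead of at the beginning of the buffer. A real implementation would probably want to look at short windows rather than single samples, so that one stray click does not count as speech.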

Also, the OneCore interface does actually enable me to change the length of the pause at the end of the speech as well as the pauses that are inserted at punctuation. This was not available in the original interface, but support for it can easily be queried at run-time, so it will work out of the box if you have a recent build of Windows 10.

As things currently stand, when I use the existing screen reader interfaces, I run into the following issues:
1. I cannot tell when the voice is speaking. This requires a lot of extra work in terms of design choices along the way.
2. I cannot do any kind of processing on the speech output such as a compressor/limiter, or even a volume change to make it fit into the game's overall soundscape.
3. I don't know what language the voice is speaking.
4. I don't know how loudly the voice is speaking, so I cannot automatically adjust the volume of the game audio to fit the speech.
5. Generally latency will be higher, because the screen readers currently do not trim the speech.
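Points 2 and 4 become simple once the speech is available as raw samples. The following is a rough sketch under the assumption of mono 16-bit signed PCM. The names peak_level and apply_gain are invented for the example and are not part of any screen reader or speech API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest absolute sample value in the buffer (0..32768). */
int32_t peak_level(const int16_t *samples, size_t count)
{
    assert(samples != NULL || count == 0);

    int32_t peak = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t v = samples[i];
        if (v < 0) {
            v = -v;
        }
        if (v > peak) {
            peak = v;
        }
    }
    return peak;
}

/* Hypothetical helper: scale the buffer in place, clamping to the 16-bit range. */
void apply_gain(int16_t *samples, size_t count, double gain)
{
    assert(samples != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        double v = samples[i] * gain;
        if (v > 32767.0) {
            v = 32767.0;
        }
        if (v < -32768.0) {
            v = -32768.0;
        }
        samples[i] = (int16_t)v;  /* the cast truncates toward zero */
    }
}
```

A game could, for instance, measure the peak of each utterance and use a gain of target_peak / (double)peak_level(...) to bring every voice to roughly the same level, or lower the game audio while speech is present. Peak level is only a crude stand-in for perceived loudness; an RMS or windowed measurement would likely work better in practice.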

But there are also some benefits that screen reader output has:
1. I may not have access to the voice that the user wants, as there are voice packages that only interface with a specific screen reader.
2. The user may have preexisting speech dictionaries that I don't have access to, improving pronunciation for the specific voice they've chosen.

I can easily solve the second point by allowing the user to have a dictionary for the game, which is literally just a few lines of code, but it of course requires the user to customize this for the specific game which is not always practical.
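To give an idea of what such a per-game dictionary could look like, here is a small sketch that replaces whole words before the text is handed to the speech engine. Everything in it is an assumption made for illustration: the DictEntry type, the apply_dictionary function, case-sensitive matching, and treating any run of ASCII letters and digits as a word.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary entry: a word and the text to speak instead. */
typedef struct {
    const char *word;
    const char *replacement;
} DictEntry;

/* Return the replacement for the word w[0..len), or NULL if there is none. */
static const char *lookup(const DictEntry *dict, size_t n, const char *w, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(dict[i].word) == len && strncmp(dict[i].word, w, len) == 0) {
            return dict[i].replacement;
        }
    }
    return NULL;
}

/*
 * Copy `text` into `out`, replacing whole words found in the dictionary.
 * Returns 0 on success, or -1 if `out` is too small for the result.
 */
int apply_dictionary(const DictEntry *dict, size_t n,
                     const char *text, char *out, size_t out_size)
{
    assert(text != NULL && out != NULL);

    size_t o = 0;
    const char *p = text;
    while (*p != '\0') {
        /* Measure the run of letters and digits starting at p. */
        size_t len = 0;
        while (isalnum((unsigned char)p[len])) {
            len++;
        }

        const char *src = p;
        size_t src_len = len;
        if (len == 0) {
            /* Not a word character: copy it through unchanged. */
            len = 1;
            src_len = 1;
        } else {
            const char *r = lookup(dict, n, p, len);
            if (r != NULL) {
                src = r;
                src_len = strlen(r);
            }
        }

        if (o + src_len >= out_size) {
            return -1;            /* no room for this piece plus the terminator */
        }
        memcpy(out + o, src, src_len);
        o += src_len;
        p += len;
    }

    if (o >= out_size) {
        return -1;
    }
    out[o] = '\0';
    return 0;
}
```

With entries such as {"NVDA", "N V D A"}, the text "NVDA is running" would be passed to the voice as "N V D A is running". Loading the entries from a user-editable file is left out here.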

And of course, changing the pitch and the speech rate is something that games should allow if they make use of a lot of text to speech, so screen reader output offers no benefit in that regard as long as the game developer spends the time to implement these features. With the library I am developing, this will be very simple for all the speech engines that support it.

As for having fallback translations that are not provided by the game, I believe it would be a better option to simply make a translation based entirely on Google Translate and have the game load that, rather than intercepting the text at run-time and attempting to translate it using an Internet-based API. The game would of course have to make it possible for users to translate all the text, and this is exactly what I have implemented. So while I can see a strong use case for this approach in games that don't offer translations, its usefulness is strongly reduced if not eliminated altogether when you have a game that offers native translation support out of the box.

In summary, I will add support for screen readers to my library, as it is trivial to implement, but as for whether I will expose this in all of my future games, I cannot be sure. It really depends on the type of game, and whether the limitations outlined above present enough of a problem for me. I will make a great effort to support the screen reader interfaces as far as is practical for each particular game, but there are definite tradeoffs, which I have covered above, that make this far from a trivial decision. If the NVDA speech refactor improves the API, this will definitely encourage me, and I'm sure other developers as well, to integrate it.

This ended up being a far longer post than I had intended, but I wanted to cover as many of your points as I could, and also outline my current thoughts.

Thanks again for taking the time to give feedback!

Kind regards,

Philip Bennefall

-- 
Audiogames-reflector mailing list
Audiogames-reflector@sabahattin-gucukoglu.com
https://sabahattin-gucukoglu.com/cgi-bin/mailman/listinfo/audiogames-reflector
