Just a quick disclaimer about the extern: it's little more than a Pd
wrapper around the Sphinx hello-world example. The build environment works,
though (and was a real pain to get right on Linux because of some
function name conflicts between Sphinx and Pd), so it's a good
jumping-off point for developing something more powerful.
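For anyone curious what "wrapper around the hello-world example" means
concretely, the decode loop it is built around looks roughly like the sketch
below. This is from memory rather than copied from the extern, the exact
signatures differ a bit between pocketsphinx releases (older versions also
pass an utterance id around), and MODELDIR and the file names are just
placeholders for wherever your models live:

    #include <stdio.h>
    #include <pocketsphinx.h>

    int main(void)
    {
        /* point the decoder at an acoustic model, language model, and
           dictionary -- the paths here are placeholders */
        cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
            "-hmm",  MODELDIR "/en-us/en-us",
            "-lm",   MODELDIR "/en-us/en-us.lm.bin",
            "-dict", MODELDIR "/en-us/cmudict-en-us.dict",
            NULL);
        ps_decoder_t *ps = ps_init(config);

        /* feed raw 16 kHz, 16-bit mono PCM to the decoder in chunks */
        FILE *fh = fopen("goforward.raw", "rb");
        if (!fh) return 1;
        int16 buf[512];
        size_t n;
        int32 score;

        ps_start_utt(ps);
        while ((n = fread(buf, sizeof(int16), 512, fh)) > 0)
            ps_process_raw(ps, buf, n, FALSE, FALSE);
        ps_end_utt(ps);

        printf("hypothesis: %s\n", ps_get_hyp(ps, &score));

        fclose(fh);
        ps_free(ps);
        cmd_ln_free_r(config);
        return 0;
    }

In the extern the samples would of course come from the ~ object's signal
inlet rather than a file, but the Sphinx side is the same handful of calls.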
Originally I wanted to make a C application that could do automatic
training so that people could do voice commands with high accuracy, but
that is a big project and I got pulled away from it.
One note about this: voice recognizers are typically optimized to
correctly decode most voices most of the time, but one could certainly
train a model to correctly decode a particular voice almost all of the
time. That is another great advantage of Sphinx: flexibility.
This extern doesn't build on Windows, by the way, sorry.
On 02/07/2015 11:55 AM, Jonathan Wilkes wrote:
Thanks, I didn't know there was a Sphinx external. It also looks like
the Sphinx website got a face-lift-- hopefully the software is also
more approachable than the last time I looked.
-Jonathan
On Saturday, February 7, 2015 2:16 PM, david medine <dmed...@ucsd.edu>
wrote:
One of the bad things about Google is that it is essentially a giant
billboard. Having said that, I am going to advertise a couple of things.
If you want a speech recognition API that doesn't rely on a tax-exempt
corporation that has more money than the nation of Russia, builds its
products in unsafe overseas sweatshops, charges you $99/year to
develop software for the device you already paid for, eagerly aids the
federal government in unconstitutional spying, or is in the process of
assimilating all of human culture, you might want to check out CMU's
speech recognition toolkit, Sphinx.
http://cmusphinx.sourceforge.net/
Another advantage of Sphinx is that it doesn't rely on internet access
to decode speech. And someone has even written a simple Pd extern with Sphinx.
https://github.com/dmedine/recog_tilde
And yes, it is quite difficult to train Sphinx. Building a dictionary
is copious work, and Google and Apple have done it 1000 times better than
anyone else because they have mountains of data and cash and luxury-model
machine learning algorithms... but no one ever said DIY was easy.
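For anyone who hasn't looked at it, a Sphinx pronunciation dictionary is
just a plain-text file that maps every word you want recognized to its
phones in CMU's ARPAbet set, one word per line, roughly like the handful
of illustrative entries below (the real cmudict runs to well over a
hundred thousand such lines, which is where the "copious work" comes in):

    hello   HH AH L OW
    world   W ER L D
    open    OW P AH N
    patch   P AE CH
    volume  V AA L Y UW M

Every word in your grammar or language model needs an entry like that, and
the acoustic model behind it needs hours of transcribed speech on top,
which is exactly where the mountains of data come in.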
On 2/7/15 9:55 AM, Spencer Russell wrote:
I saw a really interesting talk last year by Johan Schalkwyk, the head
of the Google speech recognition group. One of the points he made was
that while Google's algorithms are important, they got a lot more
leverage from the sheer amount of data they have access to; that data
lets them get away with much simpler algorithms. I think that's one of
the biggest problems with trying to compete with Google and Apple on
speech recognition: OSS developers just don't have access to a huge
corpus of data.
Even though a lot of that data is unlabeled (they don't know which
words actually correspond to the audio), they have a huge amount of
interaction data, so they can, for instance, look at whether the user
tried multiple times with a particular phrase or whether the user
accepted a given transcription.
It seems like if we want an open-source speech recognition package, we
should focus on finding ways to get an accessible shared corpus.
Unless there were some tricky licensing, I think that corpus would also
benefit the big guys, though, so their corpus would remain a proper
superset of what's available to OSS developers.
On Sat, Feb 7, 2015, at 11:39 AM, Jonathan Wilkes via Pd-list wrote:
Hi list,
Here's a fun thought-experiment: suppose you're doing a port of Pd,
and the graphics toolkit you're using will include functionality to
hook into Google's speech recognition API. Such an API could make
the software accessible to people who would otherwise find it very
hard to write Pd patches.
However, the API works by shipping off your audio data to Google's
servers, doing the computation on their machines, and sending you
back the results.
Do you use the API in your port, or not?
I'm decidedly not going to use that API, for what I think are obvious
security, privacy, and philosophical reasons. But I'm curious just
how obvious the security and privacy implications are to others
here. How many people would use a speech-patching mechanism that
sends all your speech to Google?
I'm also increasingly worried by the apparent gap between the
usability of Google and Apple's products, and the seemingly glacial
pace at which _usable_ free software speech recognition is being
developed. My position won't change, but I'm afraid it's becoming
more symbolic than practical as these insecure tools become a natural
part of most people's lives.
-Jonathan
_______________________________________________
Pd-list@lists.iem.at mailing list
UNSUBSCRIBE and account-management ->
http://lists.puredata.info/listinfo/pd-list