Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread Nick Alexander


On 2014-10-30, 4:18 PM, Andre Natal wrote:

I've been researching speech recognition in Firefox for two years: first
SpeechRTC, then emscripten, and now the Web Speech API with CMU Pocketsphinx
[1] embedded in the Gecko C++ layer, a project that I was lucky to develop for
Google Summer of Code with the mentoring of Olli Pettay, Guilherme
Gonçalves, Steven Lee, Randell Jesup and others, and with the management of
Sandip Kamat.

The implementation already works in B2G, Fennec and all FF desktop
versions, and the first supported language will be English. The API and
implementation conform to the W3C standard [2]. The preference to
enable it is: media.webspeech.service.default = pocketsphinx


First, Andre, let me offer my congratulations on getting this project to 
this point.  We've talked a few times and I've always been impressed.


Can you point me at Fennec try builds?  I vaguely recall that these 
speech recognition approaches require large pattern matching files, and 
I'd like to see what including the Speech API does to the Fennec APK 
size.  We're pushing pretty hard on reducing our APK size right now 
because we believe it's a big barrier to entry and especially to 
upgrading older devices.


Nick
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread smaug


On 10/31/2014 02:21 AM, smaug wrote:

Intent to ship is too strong for this.
We need to first have the implementation landed and tested ;)

I wouldn't ship the implementation in desktop FF without plenty more testing.



But I guess the question is what people think about shipping Pocketsphinx + 
the API, even if disabled by default.

Andre, we need some numbers here. How much does Pocketsphinx increase binary 
size, or download size?
When the pref is enabled, how much memory does it use on desktop, and what 
about on b2g?





-Olli


On 10/31/2014 01:18 AM, Andre Natal wrote:

I've been researching speech recognition in Firefox for two years: first
SpeechRTC, then emscripten, and now the Web Speech API with CMU Pocketsphinx
[1] embedded in the Gecko C++ layer, a project that I was lucky to develop for
Google Summer of Code with the mentoring of Olli Pettay, Guilherme
Gonçalves, Steven Lee, Randell Jesup and others, and with the management of
Sandip Kamat.

The implementation already works in B2G, Fennec and all FF desktop
versions, and the first supported language will be English. The API and
implementation conform to the W3C standard [2]. The preference to
enable it is: media.webspeech.service.default = pocketsphinx

The required patches to achieve this are:

  - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
  - Embed English models. Bug 1065911 [4]
  - Change SpeechGrammarList to store grammars inside SpeechGrammar objects.
Bug 1088336 [5]
  - Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]


Also, other important features for which we don't have patches yet:
  - Relax the VAD strategy to be less strict and avoid stopping in the middle
of speech when speaking low-volume phonemes [7]
  - Integrate or develop a grapheme-to-phoneme algorithm to generate
pronunciations in real time when compiling grammars [8]
  - Include and build models for other languages [9]
  - Continuous and word-spotting recognition [10]

The WIP repo is here [11], and this Air Mozilla video [12] plus this wiki
have more detailed info [13].

In this comment you can see a CPU usage flame graph captured while
recognition is happening [14]

I look forward to your comments.

Thanks,

Andre Natal

[1] http://cmusphinx.sourceforge.net/
[2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
[6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
[7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
[8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
[9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
[10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
[11] https://github.com/andrenatal/gecko-dev
[12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/ (Jump
to 12:00)
[13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
[14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14






Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread smaug


Intent to ship is too strong for this.
We need to first have the implementation landed and tested ;)

I wouldn't ship the implementation in desktop FF without plenty more testing.



-Olli



Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread Chris Hofmann


On 10/30/14 5:24 PM, smaug wrote:

On 10/31/2014 02:21 AM, smaug wrote:

Intent to ship is too strong for this.
We need to first have the implementation landed and tested ;)

I wouldn't ship the implementation in desktop FF without plenty 
more testing.


But I guess the question is what people think about shipping 
Pocketsphinx + the API, even if disabled by default.


Andre, we need some numbers here. How much does Pocketsphinx increase 
binary size, or download size?
When the pref is enabled, how much memory does it use on desktop, and 
what about on b2g?



This is important work and the competition is ramping up quickly after many 
years of promises about this year being the year of voice recognition.  
We will probably fall behind quickly if we don't get something going 
here in the next year.


Can you also talk a bit about what the plan and set of challenges look 
like for expanding the supported languages, and how these would impact 
the numbers Olli has asked for?


The place we really need this is b2g, but phones are only shipping in 
international markets right now, so English-only is not all that helpful.


-chofmann





Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread Mark Hammond


On 31/10/2014 11:45 AM, Chris Hofmann wrote:

The place we really need this is b2g, but phones are only shipping in
international markets right now so english only is not all that helpful.


While this doesn't change the point you are making in any way, FWIW, 
Firefox OS phones are on sale in Australia via one of our largest 
electronics retailers:


https://www.jbhifi.com.au/phones/Outright-Mobile-Handsets/zte/zte-open-c-handset-grey/624980/

http://www.gizmodo.com.au/2014/10/jb-hi-fi-is-now-selling-australias-first-firefox-os-phone/

Nice!

Mark


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread Marco Chen

Hi Andre, 

This is nice work, and I look forward to voice recognition on B2G. 

Besides the final result, I am also interested in why you migrated 
from SpeechRTC -> emscripten -> Web Speech API. 
Could you share what triggered each of these transitions? That 
could be a lesson learned for us. 

e.g. SpeechRTC -> voice recognition can't be performed locally? 
emscripten -> performance issue? license issue? something else? 

Thanks, 
Sincerely yours. 


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-03 Thread Chris Mills

Awesome to see this mail, Andre!

And remember that we do have the pages set up on MDN ready to be filled in also.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

Once this is shipped, do you think we can find some time to start collaborating 
on these docs?

Chris Mills
   Senior tech writer || Mozilla
developer.mozilla.org || MDN
   cmi...@mozilla.com || @chrisdavidmills




Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-08 Thread Andre Natal

Thanks Nick, I appreciate your help.

I created two versions of the Fennec APK: one [1] with the English models
bundled (43.7 MB), and another [2] without them (34.6 MB). This is the
mozconfig I used [3].
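
As a rough check on what those two figures imply, here is a minimal sketch of the arithmetic. The variable names are illustrative only, and since the two APK file names carry different version numbers (34.0a1 and 35.0a1), the difference should be read as approximate rather than as an exact measurement of the models alone.

```python
# Sizes as reported in the message, in megabytes.
apk_with_models_mb = 43.7
apk_without_models_mb = 34.6

# Approximate cost of bundling the English models.
delta_mb = round(apk_with_models_mb - apk_without_models_mb, 1)
growth_pct = round(100 * delta_mb / apk_without_models_mb, 1)

print(f"Bundled English models add about {delta_mb} MB ({growth_pct}% growth)")
```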

Actually, I had a conversation with Jonas Sicking some months ago and we
agreed that the ideal scenario is to let users download the package for
the language they prefer from some sort of preferences screen, instead of
shipping the models bundled into the APK.


[1]
https://www.dropbox.com/s/6snv6e3mqqcs4zi/fennec-34.0a1.en-US.android-arm.apk?dl=0
[2]
https://www.dropbox.com/s/zxxop34unj21r1s/fennec-35.0a1.en-US.android-arm.apk?dl=0
[3]
#DEBUG
#ac_add_options --enable-debug
#ac_add_options --enable-trace-malloc
#ac_add_options --enable-accessibility
#ac_add_options --enable-signmar
ac_add_options --disable-tests

# android options
ac_add_options --enable-application=mobile/android
ac_add_options --with-android-ndk="/Volumes/extra/android-ndk-r8e/"
ac_add_options --with-android-sdk="/Volumes/extra/android-sdk-macosx/platforms/android-19/"

# FOR ARM
ac_add_options --target=arm-linux-androideabi
mk_add_options MOZ_OBJDIR=./obj-arm-linux-androideabi-debug


# FOR 386
#ac_add_options --target=i386-linux-android
#mk_add_options MOZ_OBJDIR=./objdir-droid-i386


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-08 Thread Andre Natal

Hi Olli,


> How much does Pocketsphinx increase binary size? or download size?

It was suggested in the past that we avoid shipping the models with the
packages and instead create a preferences panel in the apps that lets users
download the models they want.

As for the size of the pocketsphinx libraries themselves, on Mac OS they add
up to ~2.3 MB [1]. I don't know what kind of compression the build system
applies when compiling/packaging, but it should be efficient enough.

[1]
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libsphinxbase.a
2184 -rw-r--r--  1 root  admin  1114840 Jul  7 14:39 /usr/local/lib/libsphinxbase.a
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libpocketsphinx.a
2352 -rw-r--r--  1 root  admin  1201240 Jul  7 14:52 /usr/local/lib/libpocketsphinx.a
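
The ~2.3 MB figure can be reproduced from the two byte counts in the listing above. This is only a sketch of the sum; the variable names are illustrative. Note also that these are static archives (.a), so the size they contribute to a linked and packaged binary may well differ.

```python
# Byte counts taken from the `ls -lsa` output above.
libsphinxbase_bytes = 1114840
libpocketsphinx_bytes = 1201240

total_bytes = libsphinxbase_bytes + libpocketsphinx_bytes
total_mb = total_bytes / 1_000_000       # decimal megabytes
total_mib = total_bytes / (1024 * 1024)  # binary mebibytes

print(f"{total_bytes} bytes = {total_mb:.2f} MB = {total_mib:.2f} MiB")
```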



> When the pref is enabled, how much does it use memory on desktop, what
> about on b2g?

On b2g, it uses memory only after the decoder has been activated and has
loaded the models. I profiled it on a ZTE Open C; here is the report [2] and
here is the exact snapshot [3]. It seems ~21 MB is used after loading the
models.

On desktop Mac OS Nightly, the memory usage was ~11 MB.

[2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
[3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0







Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-08 Thread Andre Natal

Hi Chris.

For new languages, once the decoder is integrated into Gecko, you only
need to build new models (acoustic and language), since the decoder is
language-agnostic.

The model-building procedure is the same for every language: in broad
terms, you need to record thousands of hours of spoken phrases covering
all the phones of the target language, from people of different genders,
ages, regions, accents, etc. All this data is compiled and transformed
into the acoustic model.

For the language model, you need to build a phonetic dictionary for that
language, so that grapheme-to-phoneme tools (such as phonetisaurus [1])
can generate, in real time, phonetic representations of the words in your
grammar.
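
To make the role of the phonetic dictionary concrete, here is a toy sketch, not how Pocketsphinx or phonetisaurus actually work internally. It assumes a small hand-written dictionary in the ARPAbet style; the entries, the `naive_g2p` placeholder and the `pronounce` helper are all hypothetical names introduced for illustration. Words found in the dictionary use their listed phonemes, and anything else falls back to a grapheme-to-phoneme function.

```python
from typing import Callable, Dict, Iterable, List

# Illustrative entries only; a real dictionary has many thousands of words.
PHONETIC_DICT: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def naive_g2p(word: str) -> List[str]:
    """Placeholder G2P: one pseudo-phoneme per letter (NOT a real model)."""
    return [ch.upper() for ch in word if ch.isalpha()]


def pronounce(
    grammar_words: Iterable[str],
    g2p: Callable[[str], List[str]] = naive_g2p,
) -> Dict[str, List[str]]:
    """Map each grammar word to phonemes, using g2p for unknown words."""
    result: Dict[str, List[str]] = {}
    for word in grammar_words:
        key = word.lower()
        result[key] = PHONETIC_DICT.get(key) or g2p(key)
    return result


if __name__ == "__main__":
    for word, phones in pronounce(["Hello", "mozilla"]).items():
        print(word, " ".join(phones))
```

In a real setup the fallback would be a trained grapheme-to-phoneme model rather than the letter-by-letter placeholder used here.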

Building models is not a trivial task, and requires close collaboration
between speech engineers and linguists.

Pocketsphinx offers some models besides English [2], and they have useful
tutorials about acoustic [3] and language [4] model creation.

Thanks,

Andre

[1] https://code.google.com/p/phonetisaurus/
[2]
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
[3] http://cmusphinx.sourceforge.net/wiki/tutorialam?s[]=acoustic&s[]=models
[4] http://cmusphinx.sourceforge.net/wiki/tutoriallm


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-08 Thread Andre Natal

Thank you Chris, sure we can do it!

Here is a straightforward page listing all the objects and methods of the
Speech API that we are aiming to implement:

https://github.com/andrenatal/webspeechapi/blob/gh-pages/index_clean.html

Maybe we can start from it.

Thanks!

Andre


On Mon, Nov 3, 2014 at 9:58 AM, Chris Mills  wrote:

> Awesome to see this mail, Andre!
>
> And remember that we do have the pages set up on MDN ready to be filled in
> also.
>
> https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
>
> Once this is shipped, do you think we can find some time to start
> collaborating on these docs?
>
> Chris Mills
>Senior tech writer || Mozilla
> developer.mozilla.org || MDN
>cmi...@mozilla.com || @chrisdavidmills
>
>
>
> > On 31 Oct 2014, at 02:27, Marco Chen  wrote:
> >
> > Hi Andre,
> >
> > It is a nice work and expect the voice recognition on B2G.
> >
> > Beside this final result, I am also interesting in the reason of you
> migrate from SpeechRTC -> emscripten -> Web Speech API.
> > Could you also share what is the factor triggered these transition? Then
> that can be the lesson learn for us.
> >
> > ex: SpeechRTC -> voice recognition can't be performed on local.
> > emscripten -> performance issue? or license issue? or ?
> >
> > Thanks,
> > Sincerely yours.
> >
> > - Original Message -
> >
> > From: "Andre Natal" 
> > To: dev-platform@lists.mozilla.org, "Sandip Kamat" ,
> "Olli.Pettay" 
> > Sent: Friday, October 31, 2014 7:18:06 AM
> > Subject: Intent to ship: Web Speech API - Speech Recognition with
> Pocketsphinx
> >
> > I've been researching speech recognition in Firefox for two years. First
> > SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
> > [1] embedded in Gecko C++ layer, project that I had the luck to develop
> for
> > Google Summer of Code with the mentoring of Olli Pettay, Guilherme
> > Gonçalves, Steven Lee, Randell Jesup plus others and with the management
> of
> > Sandip Kamat.
> >
> > The implementation already works in B2G, Fennec and all FF desktop
> > versions, and the first language supported will be english. The API and
> > implementation are in conformity with W3C standard [2]. The preference to
> > enable it is: media.webspeech.service.default = pocketsphinx
> >
> > The required patches for achieve this are:
> >
> > - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
> > - Embed english models. Bug 1065911 [4]
> > - Change SpeechGrammarList to store grammars inside SpeechGrammar
> objects.
> > Bug 1088336 [5]
> > - Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148
> [6]
> >
> >
> > Also, other important features that we don't have patches yet:
> > - Relax VAD strategy to be les strict and avoid stop in the middle of
> > speech when speaking low volume phonemes [7]
> > - Integrate or develop a grapheme to phoneme algorithm to realtime
> > generator when compiling grammars [8]
> > - Inlcude and build models for other languages [9]
> > - Continuous and wordspotting recognition [10]
> >
> > The wip repo is here [11] and this Air Mozilla video [12] plus this wiki
> > has more detailed info [13].
> >
> > At this comment you can see a cpu usage on flame while recognition is
> > happening [14]
> >
> > I wish to hear your comments.
> >
> > Thanks,
> >
> > Andre Natal
> >
> > [1] http://cmusphinx.sourceforge.net/
> > [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
> > [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
> > [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
> > [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
> > [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
> > [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
> > [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
> > [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
> > [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
> > [11] https://github.com/andrenatal/gecko-dev
> > [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/
> (Jump
> > to 12:00)
> > [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
> > [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14
> > ___
> > dev-platform mailing list
> > dev-platform@lists.mozilla.org
> > https://lists.mozilla.org/listinfo/dev-platform
> >

Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-09 Thread Andre Natal

Hi Marco.

SpeechRTC was my first attempt on the platform. In early 2013 I did not yet
know enough about Gecko internals, and b2g itself was at a very early stage
(at the very beginning, Steven Lee had to send me patches to make
getUserMedia (gUM) work properly), so the fastest path was to capture audio
and stream it to a server. The good part is that Opus is pretty efficient,
and Node.js plus a speech server wrapping pocketsphinx made the whole
round trip really fast.

But I knew that was not ideal for command and control / grammar-based
recognition, so I started researching a direct port of pocketsphinx using
emscripten. It worked, but three reasons made me move to a full C++ version:

1) the whole Speech API front end in Gecko was ready to roll and only
waiting for a backend, and it, as we know, is built in C++;

2) my tests ran very well, but on the Peak [2], for example, it performed
slower than on low-end devices running Android [3];

3) with emscripten, loading the models during decoder creation on every
reload was very slow, and I couldn't figure out how to keep the decoder
instance alive across tabs and reloads, whereas in C++ this happens only
once, thanks to Gecko's architecture.
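
To illustrate the third point, here is a minimal sketch of the "load once,
reuse everywhere" pattern. It is not the Gecko or pocketsphinx code:
FakeDecoder, get_decoder and LOAD_COUNT are hypothetical names, and the
expensive model load is only simulated by a counter.

```python
from functools import lru_cache

LOAD_COUNT = 0  # how many times the (simulated) model load has run


class FakeDecoder:
    """Hypothetical stand-in for a speech decoder, not the pocketsphinx API."""

    def __init__(self, language: str) -> None:
        global LOAD_COUNT
        LOAD_COUNT += 1  # a real decoder would load its models here (slow)
        self.language = language


@lru_cache(maxsize=None)
def get_decoder(language: str) -> FakeDecoder:
    """Create the decoder on first use and hand back the same one afterwards."""
    return FakeDecoder(language)


# Two "page loads" asking for the same language share one decoder instance.
first = get_decoder("en-US")
second = get_decoder("en-US")
```

With the emscripten port each reload paid the model-loading cost again; a
process-wide instance like this pays it once.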
On Oct 31, 2014 12:27 AM, "Marco Chen"  wrote:

> Hi Andre,
>
> It is a nice work and expect the voice recognition on B2G.
>
> Beside this final result, I am also interesting in the reason of you
> migrate from SpeechRTC -> emscripten -> Web Speech API.
> Could you also share what is the factor triggered these transition? Then
> that can be the lesson learn for us.
>
> ex: SpeechRTC -> voice recognition can't be performed on local.
>  emscripten -> performance issue? or license issue? or ?
>
> Thanks,
> Sincerely yours.
>
> --
> *From: *"Andre Natal" 
> *To: *dev-platform@lists.mozilla.org, "Sandip Kamat" ,
> "Olli.Pettay" 
> *Sent: *Friday, October 31, 2014 7:18:06 AM
> *Subject: *Intent to ship: Web Speech API - Speech Recognition with
> Pocketsphinx
>
> I've been researching speech recognition in Firefox for two years. First
> SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
> [1] embedded in Gecko C++ layer, project that I had the luck to develop for
> Google Summer of Code with the mentoring of Olli Pettay, Guilherme
> Gonçalves, Steven Lee, Randell Jesup plus others and with the management of
> Sandip Kamat.
>
> The implementation already works in B2G, Fennec and all FF desktop
> versions, and the first language supported will be english. The API and
> implementation are in conformity with W3C standard [2]. The preference to
> enable it is: media.webspeech.service.default = pocketsphinx
>
> The required patches for achieve this are:
>
>  - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
>  - Embed english models. Bug 1065911 [4]
>  - Change SpeechGrammarList to store grammars inside SpeechGrammar objects.
> Bug 1088336 [5]
>  - Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]
>
>
> Also, other important features that we don't have patches yet:
>  - Relax VAD strategy to be les strict and avoid stop in the middle of
> speech when speaking low volume phonemes [7]
>  - Integrate or develop a grapheme to phoneme algorithm to realtime
> generator when compiling grammars [8]
>  - Inlcude and build models for other languages [9]
>  - Continuous and wordspotting recognition [10]
>
> The wip repo is here [11] and this Air Mozilla video [12] plus this wiki
> has more detailed info [13].
>
> At this comment you can see a cpu usage on flame while recognition is
> happening [14]
>
> I wish to hear your comments.
>
> Thanks,
>
> Andre Natal
>
> [1] http://cmusphinx.sourceforge.net/
> [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
> [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
> [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
> [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
> [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
> [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
> [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
> [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
> https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
> [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
> [11] https://github.com/andrenatal/gecko-dev
> [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/
> (Jump
> to 12:00)
> [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
> [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14

Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-09 Thread Andre Natal

Sorry, I forgot the links:

[2] SpeechRTC offline on Firefox OS (Peak): http://youtu.be/FXKXhrRDEb8

[3] Continuous speech recognition on Android with poc…:
http://youtu.be/3lTtCFaQF2A
 On Nov 9, 2014 11:12 AM, "Andre Natal"  wrote:

> Hi Marco.
>
> SpeechRTC was my first tentative with the platform. At early 2013 neither
> I had enough knowledge about gecko internals as even b2g was at very early
> stage (in the very beggining, Steven Lee needed to send me patches to gum
> work properly), so the fastest path was capture and stream online. The
> great part is that opus is pretty efficient plus nodejs + a speech server
> wrapping pocketsphinx turned the whole roundtrip really fast.
>
> But I knew that was not ideal for command and control / grammar, then I
> started to research a direct port of pocketsphinx using emscripten. Did
> work but three reasons made me move to a full cpp version:
>
> 1) the whole speech api frontend in gecko was ready to roll only waiting a
> backend, and this, as we know was built in cpp;
>
> 2) my tests ran very well, but on peak [2] for example, performed slower
> than on low end devices running android [3]
>
> 3) with emscripten, the model loading inside decoder's creation at each
> reload ended very slow and I couldn't figure out how to keep the decoder
> instance between tabs and reloads while in cpp this happens only once, due
> Gecko's architecture
> On Oct 31, 2014 12:27 AM, "Marco Chen"  wrote:
>
>> Hi Andre,
>>
>> It is a nice work and expect the voice recognition on B2G.
>>
>> Beside this final result, I am also interesting in the reason of you
>> migrate from SpeechRTC -> emscripten -> Web Speech API.
>> Could you also share what is the factor triggered these transition? Then
>> that can be the lesson learn for us.
>>
>> ex: SpeechRTC -> voice recognition can't be performed on local.
>>  emscripten -> performance issue? or license issue? or ?
>>
>> Thanks,
>> Sincerely yours.
>>
>> --
>> *From: *"Andre Natal" 
>> *To: *dev-platform@lists.mozilla.org, "Sandip Kamat" ,
>> "Olli.Pettay" 
>> *Sent: *Friday, October 31, 2014 7:18:06 AM
>> *Subject: *Intent to ship: Web Speech API - Speech Recognition with
>> Pocketsphinx
>>
>> I've been researching speech recognition in Firefox for two years. First
>> SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
>> [1] embedded in Gecko C++ layer, project that I had the luck to develop
>> for
>> Google Summer of Code with the mentoring of Olli Pettay, Guilherme
>> Gonçalves, Steven Lee, Randell Jesup plus others and with the management
>> of
>> Sandip Kamat.
>>
>> The implementation already works in B2G, Fennec and all FF desktop
>> versions, and the first language supported will be english. The API and
>> implementation are in conformity with W3C standard [2]. The preference to
>> enable it is: media.webspeech.service.default = pocketsphinx
>>
>> The required patches for achieve this are:
>>
>>  - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
>>  - Embed english models. Bug 1065911 [4]
>>  - Change SpeechGrammarList to store grammars inside SpeechGrammar
>> objects.
>> Bug 1088336 [5]
>>  - Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148
>> [6]
>>
>>
>> Also, other important features that we don't have patches yet:
>>  - Relax VAD strategy to be les strict and avoid stop in the middle of
>> speech when speaking low volume phonemes [7]
>>  - Integrate or develop a grapheme to phoneme algorithm to realtime
>> generator when compiling grammars [8]
>>  - Inlcude and build models for other languages [9]
>>  - Continuous and wordspotting recognition [10]
>>
>> The wip repo is here [11] and this Air Mozilla video [12] plus this wiki
>> has more detailed info [13].
>>
>> At this comment you can see a cpu usage on flame while recognition is
>> happening [14]
>>
>> I wish to hear your comments.
>>
>> Thanks,
>>
>> Andre Natal
>>
>> [1] http://cmusphinx.sourceforge.net/
>> [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>> [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
>> [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
>> [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
>> [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
>> [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
>> [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
>> [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
>> [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
>> [11] https://github.com/andrenatal/gecko-dev
>> [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/
>> (Jump
>> to 12:00)
>> [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
>> [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14

Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-14 Thread Sandip Kamat

Hi Andre, I suggest we update the wiki with these sizes (as well as answers
to the other questions in this thread) so we can use it as a central place
for information.
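
As a quick cross-check for the wiki, the two static library sizes from the
ls output quoted below can be added up. This is only a sketch: the byte
counts are copied from that listing, and the variable names are
illustrative.

```python
# Byte sizes copied from the `ls -lsa` output quoted in this message.
LIB_SIZES_BYTES = {
    "libsphinxbase.a": 1114840,
    "libpocketsphinx.a": 1201240,
}

total_bytes = sum(LIB_SIZES_BYTES.values())
total_mb = total_bytes / 1_000_000   # decimal megabytes
total_mib = total_bytes / (1024 * 1024)  # binary mebibytes

print(f"{total_bytes} bytes = {total_mb:.2f} MB = {total_mib:.2f} MiB")
```

These are uncompressed static archives on Mac OS, so the figure is an upper
bound on what the libraries add before linking and packaging.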

-Sandip 

- Original Message -

> From: "Andre Natal" 
> To: "smaug" 
> Cc: "Sandip Kamat" , dev-platform@lists.mozilla.org
> Sent: Saturday, November 8, 2014 8:50:44 PM
> Subject: Re: Intent to ship: Web Speech API - Speech Recognition with
> Pocketsphinx

> Hi Olli,

> > How much does Pocketsphinx increase binary size? or download size?

> In the past was suggested to avoid ship the models with packages, but yes to
> create a preferences panel in the apps to allow the user to download the
> models he wants to.

> About the size of pocketsphinx libraries itself, in mac os, they sum ~ 2.3 mb
> [1]. I don't know which type of compression the build system does when
> compiling/packaging, but should be efficient enough.

> [1]
> MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
> /usr/local/lib/libsphinxbase.a
> 2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39
> /usr/local/lib/libsphinxbase.a
> MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
> /usr/local/lib/libpocketsphinx.a
> 2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52
> /usr/local/lib/libpocketsphinx.a

> > When the pref is enabled, how much does it use memory on desktop, what
> > about
> > on b2g?
> 

> On b2g, it uses memory only after the decoder be activated and loaded the
> models. I did a profile in Zte Open C and here is the report [2] and here
> the exact snapshot [3]. Seems ~ 21 mb is used after load the models.

> In desktop mac os Nightly, the memory usage was of ~11mb.

> [2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
> [3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-14 Thread Sandip Kamat

Hi Olli, in general for FxOS devices, the thinking is to let the OEMs decide
which language models they would like to preload. That gives partners a
choice based on region, while users can still download the packages they
want directly. For now, since we are at a very early stage, we only have
English support. We need help building and testing other language models
in parallel.
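
To make the idea concrete, here is a rough sketch of how a device could
resolve a language model: first from the set the OEM preloaded, then from
the user's downloads. The function name, the locale codes and the return
values are all assumptions for illustration, not part of any patch.

```python
from typing import Set


def resolve_model(language: str, preloaded: Set[str], downloaded: Set[str]) -> str:
    """Say where the model for `language` would come from (hypothetical helper)."""
    if language in preloaded:
        return "preloaded"      # shipped on the device by the OEM
    if language in downloaded:
        return "downloaded"     # fetched later by the user
    return "missing"            # the user would need to download it first


# Example: an OEM ships English only; the user later adds Brazilian Portuguese.
oem_models = {"en-US"}
user_models = {"pt-BR"}

for lang in ("en-US", "pt-BR", "hi-IN"):
    print(lang, "->", resolve_model(lang, oem_models, user_models))
```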

Sandip 


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-18 Thread Andre Natal

Chris,

I have been talking with the Sphinx project leads, and we can build models
from audiobooks as well.

This approach saves a lot of time and improves quality, since the narration
is accurate and clear.

We are currently working out how to create Hindi and Brazilian Portuguese
models.

Thanks

Andre
On Oct 30, 2014 5:47 PM, "Chris Hofmann"  wrote:

> On 10/30/14 5:24 PM, smaug wrote:
>
>> On 10/31/2014 02:21 AM, smaug wrote:
>>
>>> Intent to ship is too strong for this.
>>> We need to first have implementation landed and tested ;)
>>>
>>> I wouldn't ship the implementation in desktop FF without plenty of more
>>> testing.
>>>
>>>
>> But I guess the question is what people think about shipping the
>> pocketspinx + API, even if disabled by default.
>>
>> Andre, we need some numbers here. How much does Pocketsphinx increase
>> binary size? or download size?
>> When the pref is enabled, how much does it use memory on desktop, what
>> about on b2g?
>>
>>
>>  This is important work and the competition is ramping quicky after many
> years of promises about this year being the year of voice recognition.  We
> will probably fall behind quickly if we don't get something going here in
> the next year.
>
> Can you also talk a bit about what the plan and set of challenges look
> like for expanding the supported languages, and how these would impact the
> numbers ollie has asked for?
>
> The place we really need this is b2g, but phones are only shipping in
> international markets right now so english only is not all that helpful.
>
> -chofmann
>
>
>>>
>>> -Olli
>>>
>>>
>>> On 10/31/2014 01:18 AM, Andre Natal wrote:
>>>
 I've been researching speech recognition in Firefox for two years. First
 SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
 [1] embedded in Gecko C++ layer, project that I had the luck to develop
 for
 Google Summer of Code with the mentoring of Olli Pettay, Guilherme
 Gonçalves, Steven Lee, Randell Jesup plus others and with the
 management of
 Sandip Kamat.

 The implementation already works in B2G, Fennec and all FF desktop
 versions, and the first language supported will be english. The API and
 implementation are in conformity with W3C standard [2]. The preference
 to
 enable it is: media.webspeech.service.default = pocketsphinx

 The required patches for achieve this are:

   - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
   - Embed english models. Bug 1065911 [4]
   - Change SpeechGrammarList to store grammars inside SpeechGrammar
 objects.
 Bug 1088336 [5]
   - Creation of a SpeechRecognitionService for Pocketsphinx. Bug
 1051148 [6]


 Also, other important features that we don't have patches yet:
   - Relax VAD strategy to be les strict and avoid stop in the middle of
 speech when speaking low volume phonemes [7]
   - Integrate or develop a grapheme to phoneme algorithm to realtime
 generator when compiling grammars [8]
   - Inlcude and build models for other languages [9]
   - Continuous and wordspotting recognition [10]

 The wip repo is here [11] and this Air Mozilla video [12] plus this wiki
 has more detailed info [13].

 At this comment you can see a cpu usage on flame while recognition is
 happening [14]

 I wish to hear your comments.

 Thanks,

 Andre Natal

 [1] http://cmusphinx.sourceforge.net/
 [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
 [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
 [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
 [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
 [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
 [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
 [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
 [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
 https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
 [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
 [11] https://github.com/andrenatal/gecko-dev
 [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/
 (Jump
 to 12:00)
 [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
 [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14


>>>
