Re: NSSpeechRecognizer and Speech Recognition calibration

2008-12-27 Thread Ricky Sharp


On Dec 26, 2008, at 4:56 AM, Christopher Corbell wrote:

> I'm working on an accessibility app for the visually impaired and was
> hoping to use NSSpeechRecognizer.
>
> I've found it extremely difficult to get NSSpeechRecognizer to behave
> predictably on my system.  Does anyone on the list have experience with
> this class and success with the Speech Recognition system preference
> panel?  Any tips or tricks?

> I find that the calibration dialog for the Speech Recognition settings
> doesn't work at all for me.  I'm using a pretty standard external
> microphone (built into a Logitech webcam) with an Intel Mac mini.  I can
> see my signal just fine and I'm speaking clearly in as accent-neutral a
> way as I can, and still none of the test sentences ever highlights.  Is
> a headset mic typically required, or is there some other gotcha here?


It must be your particular setup.  I've been doing speech recognition
ever since it debuted (Mac OS 8.x days) and have not had trouble when
words/phrases are unique enough (as yours clearly are).


> When I give NSSpeechRecognizer a very small and unambiguous command set,
> I find it badly misses the mark.  For example I might have "Play",
> "Next", and "Stop" in my command set, and it will interpret "Next" as
> "Play", but it will never interpret "Play" as a command - pretty
> unusable; I'm hoping it's just a calibration issue.


Since the calibration dialog isn't working for you, it's not surprising
that it's getting your phrases confused.  Make sure to get your setup
working in the calibration area first.
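
For reference, the basic shape of a small, fixed command set with
NSSpeechRecognizer looks something like this.  This is an untested
sketch: the controller class name is made up, and note that older SDKs
type the delegate method's command parameter as id rather than
NSString *.

#import <Cocoa/Cocoa.h>

// Hypothetical controller that listens for a few distinct commands.
@interface PlaybackSpeechController : NSObject <NSSpeechRecognizerDelegate>
@property (nonatomic, strong) NSSpeechRecognizer *recognizer;
@end

@implementation PlaybackSpeechController

- (instancetype)init
{
    if ((self = [super init])) {
        _recognizer = [[NSSpeechRecognizer alloc] init];
        _recognizer.delegate = self;
        // Keep the command set small and phonetically distinct.
        _recognizer.commands = @[@"Play", @"Next", @"Stop"];
        // Only respond while this app is frontmost.
        _recognizer.listensInForegroundOnly = YES;
        [_recognizer startListening];
    }
    return self;
}

// Called when one of the registered commands is heard.
- (void)speechRecognizer:(NSSpeechRecognizer *)sender
     didRecognizeCommand:(NSString *)command
{
    NSLog(@"Recognized command: %@", command);
}

@end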


> One last note - is there any way to do proper dictation with this class
> or will it only recognize the preset command list you give it?  I'm
> thinking for example of prompting for a file name to save to, or a term
> to search on - it would be nice to have true dictation, otherwise I'll
> resort to providing an alphabet as a command set so the user can spell
> it out (assuming I can get that to work).


No.  And you definitely do _not_ want to add letters to your language
model.  English letters have too many cases where the sounds are
extremely similar: 'B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z' is
probably the largest such set.


When I worked on numeric input, I had to offer two modes (two different
speech models driven by user preference).  For example, 'sixteen' and
'sixty' were often confused.  Recognition got better over time, but was
still not 100%.  Users that had trouble could switch to the other model,
in which they spoke individual digits instead: 'one six' and 'six zero'.
That made the phrases unique enough to remove any confusion.
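
A rough sketch of what those two modes might look like as alternative
command sets; the preference key and function names below are invented
for illustration, not from my actual product:

#import <Cocoa/Cocoa.h>

// Hypothetical sketch: two alternative command sets for numeric input.
// Natural number words ('sixteen'/'sixty') are easily confused, so the
// fallback digit-by-digit mode uses phonetically distinct phrases.
static NSArray<NSString *> *NumberWordCommands(void)
{
    return @[@"sixteen", @"sixty", @"seventeen", @"seventy"]; // etc.
}

static NSArray<NSString *> *DigitCommands(void)
{
    return @[@"zero", @"one", @"two", @"three", @"four",
             @"five", @"six", @"seven", @"eight", @"nine"];
}

// Swap the active model according to a user preference.
// "SpeakDigitsIndividually" is an invented defaults key.
static void ConfigureNumericModel(NSSpeechRecognizer *recognizer)
{
    BOOL digitMode = [[NSUserDefaults standardUserDefaults]
                         boolForKey:@"SpeakDigitsIndividually"];
    recognizer.commands = digitMode ? DigitCommands()
                                    : NumberWordCommands();
}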


You really only have two options: (1) the user has a 3rd-party dictation
solution, or (2) your solution uses words/phrases for letter input - for
example the military (NATO) alphabet (alpha, bravo, charlie, etc.),
which was designed to stay intelligible over very low-quality audio
channels.
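
Option (2) might look something like the following untested sketch; the
class name and properties are invented, and the alphabet table is
abbreviated:

#import <Cocoa/Cocoa.h>

// Hypothetical controller: spelling input via the NATO alphabet.
@interface SpellingController : NSObject <NSSpeechRecognizerDelegate>
@property (nonatomic, strong) NSSpeechRecognizer *recognizer;
@property (nonatomic, strong) NSMutableString *spelledText;
@end

// Map each NATO word to the letter it stands for.
static NSDictionary<NSString *, NSString *> *NATOAlphabet(void)
{
    return @{ @"alpha" : @"a", @"bravo" : @"b", @"charlie" : @"c",
              @"delta" : @"d", @"echo"  : @"e", @"foxtrot" : @"f",
              // ...remaining pairs elided for brevity...
              @"zulu"  : @"z" };
}

@implementation SpellingController

- (void)beginSpelling
{
    self.spelledText = [NSMutableString string];
    self.recognizer = [[NSSpeechRecognizer alloc] init];
    self.recognizer.delegate = self;
    // The NATO words themselves are the command set.
    self.recognizer.commands = NATOAlphabet().allKeys;
    [self.recognizer startListening];
}

// Map each recognized word back to its letter and accumulate it.
- (void)speechRecognizer:(NSSpeechRecognizer *)sender
     didRecognizeCommand:(NSString *)command
{
    NSString *letter = NATOAlphabet()[command.lowercaseString];
    if (letter) {
        [self.spelledText appendString:letter]; // e.g. "sierra" -> "s"
    }
}

@end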


___
Ricky A. Sharp mailto:rsh...@instantinteractive.com
Instant Interactive(tm)   http://www.instantinteractive.com





Re: NSSpeechRecognizer and Speech Recognition calibration

2008-12-26 Thread Michael Ash
On Fri, Dec 26, 2008 at 5:56 AM, Christopher Corbell
<chriscorb...@gmail.com> wrote:
> I'm working on an accessibility app for the visually impaired and was hoping
> to use NSSpeechRecognizer.
>
> I've found it extremely difficult to get NSSpeechRecognizer to behave
> predictably on my system.  Does anyone on the list have experience with this
> class and success with the Speech Recognition system preference panel?  Any
> tips or tricks?
>
> I find that the calibration dialog for the Speech Recognition settings
> doesn't work at all for me.  I'm using a pretty standard external microphone
> (built into a Logitech webcam) with an Intel Mac mini.  I can see my signal
> just fine and I'm speaking clearly in as accent-neutral a way as I can, and
> still none of the test sentences ever highlights.  Is a headset mic
> typically required, or is there some other gotcha here?
>
> When I give NSSpeechRecognizer a very small and unambiguous command set, I
> find it badly misses the mark.  For example I might have "Play", "Next", and
> "Stop" in my command set, and it will interpret "Next" as "Play", but it
> will never interpret "Play" as a command - pretty unusable; I'm hoping it's
> just a calibration issue.

I'm afraid I don't know a lot about this stuff, but since nobody else
has answered yet, I'll give it a shot.

From what you describe I'd guess that your gain is not set well. You
need a good strong signal when you speak, but it absolutely must not
clip. If it's too quiet then I think that the recognizer won't realize
when you're speaking, or won't be able to understand what you're
saying when you do. If it's clipping then it will get a lot of
distortion. Try to make it so that your speech just reaches the top of
the green area.

I've never used speech recognition extensively but I have played
around with it a fair amount, and with a decent microphone that's
properly calibrated, I haven't really had much trouble with it
misunderstanding me, certainly not to the degree which you describe.

You might also try recording yourself and then playing it back. Listen
to see how it sounds. If it's distorted or otherwise of low quality
then that will hurt recognition badly.

> One last note - is there any way to do proper dictation with this class or
> will it only recognize the preset command list you give it?  I'm thinking
> for example of prompting for a file name to save to, or a term to search on
> - it would be nice to have true dictation, otherwise I'll resort to
> providing an alphabet as a command set so the user can spell it out
> (assuming I can get that to work).

Alas, no dictation.  The great thing about Apple's speech recognition is
that it's speaker-independent and requires no training.  However, the
state of the art is that dictation requires training on a particular
speaker's voice before it can work, so Apple's recognizer, by virtue of
its strengths elsewhere, loses out on this capability.  You'll have to
go with your alphabet command set or some other approach that works with
a predefined set of recognizable phrases.

Mike