[AI] The Future Of Voice Recognition

shahnaz Tue, 13 Mar 2012 08:06:40 -0700


Voice-activated assistants are playing an increasingly prominent role
in the technology world, with Apple's introduction of Siri for the
iPhone 4S and Google's (rumored) work on a Siri competitor for Android
phones.

Voice-activated technology isn't new—it's just getting better because
of increasingly powerful processors and cloud services, advancements
in natural language processing, and improved algorithms for
recognizing voice. We spoke with Nuance Communications, maker of
Dragon software and one of the biggest names in voice recognition
technologies, about why voice is becoming more popular and what
advancements we can expect in the future.

Peter Mahoney, Nuance chief marketing officer and general manager of
the Dragon desktop business, told Ars that one of the most significant
improvements coming in the next few years is a far more conversational
voice-activated assistant that remembers everything you say, which
should produce better responses to casual questions.

"I think you'll see systems that are more conversational, that have
the ability to ask more sophisticated followup questions and adapt to
the individual," Mahoney said. "They'll be able to remember what
you're talking about. Talking to one of these assistants today is like
dealing with someone with no short-term memory. They don't remember
what you just said. These systems over time are going to get better
and better short-term and long-term memory."

As an example, Mahoney said if you ask Nuance's Dragon Go! app for
iPhone to make a reservation in an Italian restaurant tomorrow night
at 8pm for four people but don't like the results, you basically have
to start over. "You'd like to be able to say 'how about Thai?' instead
of trying to repeat the same thing over again, or 'how about next
Thursday?' You should be able to follow up and the system should be
able to remember your conversation. They don't do that that well
today."
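
The follow-up behavior Mahoney describes can be illustrated with a
minimal sketch. It is not Nuance's or Apple's implementation: the
ReservationRequest type and the follow_up helper are hypothetical
names, and the sketch assumes the spoken request has already been
parsed into fields elsewhere. It only shows the idea of keeping the
earlier request and changing the one detail named in the follow-up.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ReservationRequest:
    """Hypothetical record of what the user has asked for so far."""
    cuisine: Optional[str] = None
    day: Optional[str] = None
    time: Optional[str] = None
    party_size: Optional[int] = None


def follow_up(previous: ReservationRequest, **changes) -> ReservationRequest:
    """Keep every remembered field and override only those named in the follow-up."""
    return replace(previous, **changes)


# "An Italian restaurant tomorrow night at 8pm for four people"
first = ReservationRequest(cuisine="Italian", day="tomorrow", time="8pm", party_size=4)

# "How about Thai?" changes only the cuisine.
second = follow_up(first, cuisine="Thai")

# "How about next Thursday?" changes only the day; Thai is still remembered.
third = follow_up(second, day="next Thursday")

print(third)
```

A system without this kind of memory would need all four details
repeated on every request, which is the "start over" experience
described above.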

Siri is taking steps toward providing a natural, conversation-like
experience with voice-activated assistants, as Jacqui Cheng noted in
the Ars iPhone 4S review. "When given direct and clear tasks, Siri
performs well, and it's nice not having to memorize a strict list of
commands," Cheng wrote. "The best part about Siri is the fact that you
can (or should be able to, anyway) speak to it like you would speak to
a person without having to conform to a special speaking syntax—the
number one turn-off for 'regular' people using voice control
features."

Still, the Ars review noted some shortcomings. Siri often
misinterpreted casually spoken commands, making it easier in many
cases to perform the tasks manually.

The limits of Siri's conversational abilities may be seen in a video
by Macworld's Jason Snell. In a test similar to Mahoney's example,
Snell asked Siri "where can I have lunch?" After receiving results
listing 12 nearby restaurants, Snell asked "how about downtown?"
Siri's response: "I don't know what you mean by 'how about downtown?'"

Snell said the same request worked on a previous attempt. "Sometimes
the Siri software figures out what you mean and it's kind of like
magic. Other times it doesn't really work that well," he said. "I
think that's one of the reasons Apple called this a beta."

The maker of Dragon has many products for desktops and mobile phones,
as well as industry-specific software for in-car navigation systems,
health care settings, and more. There have been rumors that Nuance
technology even powers Apple's Siri on the back end. Nuance told us it
can't comment on "specific capabilities or devices." However, the
company did confirm that "Apple licenses Nuance's voice technology for
use in some of its products."

Headquartered in Burlington, Massachusetts, Nuance has more than 1,000
engineers around the world. Research and development is divided into
several categories, Mahoney explained. There's acoustic modeling, for
processing audio and mapping it to sounds and words. Language modeling
experts, including linguists, help build systems capable of
understanding the structure of language and grammar. Natural language
processing experts help extract meaningful information from the data
gathered by Nuance services.

Nuance's simplest products are available in more than 70 languages.
But the more complex desktop applications, such as Dragon
NaturallySpeaking for PC, support only about a half-dozen.

"For each of the languages you need a good understanding of what puts
the language together, how it's built and how the sounds translate
into words," Mahoney said.

Apple's embrace of voice with Siri has increased consumer interest in
the technology. "Apple's so strong with user experience that when they
embrace voice as a core differentiator, that says to a lot of people
it might be good enough, because Apple wouldn't do it if it's not
great," Mahoney said.

Siri and Nuance's Dragon Go! for iPhone and Android use similar
technology on the back end but are different implementations,
Mahoney said. "What the Siri application
does is it tries to interpret what you're asking for and brings you
through a very structured set of potential results Apple can deliver
to you," he said. "It's very neatly controlled. It tends to be a great
experience."

Dragon Go! connects users to results from more than 200 Web properties
covering the most likely searches to provide information on
restaurants, music and entertainment, local businesses, and other
topics. For restaurant queries, Dragon takes users to Yelp for reviews
and OpenTable to make reservations.

Nuance has built up its user experience with various iterations over
the years, but defining what the software can handle is still a manual
process. "Most of the systems
require a human to define the different categories of information
you're going to support," Mahoney said. "Some person has to decide
what are the kinds of things people will ask… You can fool any one of
these systems if you work hard enough because they can't answer every
single kind of question."
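
As a rough illustration of what hand-defined categories mean in
practice, here is a toy sketch. The category names, keyword lists, and
the categorize function are all assumptions made up for this example;
real products use far more sophisticated language models than keyword
matching. The point is only that a query outside the categories
someone wrote down gets no answer.

```python
import re
from typing import Optional

# Hypothetical, hand-written categories: a person decides in advance
# which kinds of requests the system supports.
CATEGORIES = {
    "restaurants": {"restaurant", "lunch", "dinner", "reservation"},
    "music": {"song", "album", "band"},
    "local_business": {"store", "pharmacy", "bank"},
}


def categorize(query: str) -> Optional[str]:
    """Return the category sharing the most keywords with the query, or None."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    best, best_hits = None, 0
    for name, keywords in CATEGORIES.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = name, hits
    return best


print(categorize("Where can I have lunch?"))
print(categorize("How about downtown?"))
```

The first query matches the restaurant keywords, while the second
matches nothing because nobody anticipated it, which is the kind of
gap Mahoney expects machine learning to close.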

Over the next few years, we'll see voice recognition technologies
learn from their users and improve themselves without manual
intervention, Mahoney said. "As more machine learning capability gets
implemented and more of these systems sort of build themselves, you'll
see better and better coverage. The systems will learn from use about
what kinds of things they need to cover, and they'll get smarter over
time."

http://arstechnica.com/business/news/2012/03/future-of-voice-recognition-assistants-that-remember-everything-you-say.ars

