Voice-activated assistants are playing an increasingly prominent role in the technology world, with Apple's introduction of Siri for the iPhone 4S and Google's (rumored) work on a Siri competitor for Android phones.
Voice-activated technology isn't new; it's just getting better because of increasingly powerful processors and cloud services, advancements in natural language processing, and improved algorithms for recognizing voice. We spoke with Nuance Communications, maker of Dragon software and one of the biggest names in voice recognition technologies, about why voice is becoming more popular and what advancements we can expect in the future.
Peter Mahoney, Nuance chief marketing officer and general manager of the Dragon desktop business, told Ars one of the most significant improvements coming in the next few years is a far more conversational voice-activated assistant that remembers everything you say. This should create better responses to casual questions.
"I think you'll see systems that are more conversational, that have the ability to ask more sophisticated followup questions and adapt to the individual," Mahoney said. "They'll be able to remember what you're talking about. Talking to one of these assistants today is like dealing with someone with no short-term memory. They don't remember what you just said. These systems over time are going to get better and better short-term and long-term memory."
As an example, Mahoney said if you ask Nuance's Dragon Go! app for iPhone to make a reservation in an Italian restaurant tomorrow night at 8pm for four people but don't like the results, you basically have to start over.
"You'd like to be able to say 'how about Thai?' instead of trying to repeat the same thing over again, or 'how about next Thursday?' You should be able to follow up and the system should be able to remember your conversation. They don't do that that well today."
Siri is taking steps toward providing a natural, conversation-like experience with voice-activated assistants, as Jacqui Cheng noted in the Ars iPhone 4S review. "When given direct and clear tasks, Siri performs well, and it's nice not having to memorize a strict list of commands," Cheng wrote. "The best part about Siri is the fact that you can (or should be able to, anyway) speak to it like you would speak to a person without having to conform to a special speaking syntax, the number one turn-off for 'regular' people using voice control features."
Still, the Ars review noted some shortcomings. Siri often misinterpreted casually spoken commands, making it easier in many cases to perform the tasks manually.
The limits of Siri's conversational abilities may be seen in a video by Macworld's Jason Snell. Like the example Mahoney gave, Snell asked Siri "where can I have lunch?" After receiving the results, including 12 nearby restaurants, Snell asked "how about downtown?" Siri's response: "I don't know what you mean by 'how about downtown?'"
Snell said the same request worked on a previous attempt. "Sometimes the Siri software figures out what you mean and it's kind of like magic. Other times it doesn't really work that well," he said. "I think that's one of the reasons Apple called this a beta."
The maker of Dragon has many products for desktops and mobile phones, as well as industry-specific software for in-car navigation systems, health care settings, and more. There have been rumors that Nuance technology even powers Apple's Siri on the back end. Nuance told us it can't comment on "specific capabilities or devices." However, the company did confirm that "Apple licenses Nuance's voice technology for use in some of its products."
Headquartered in Burlington, Massachusetts, Nuance has more than 1,000 engineers around the world. Research and development is divided into several categories, Mahoney explained. There's acoustic modeling, for processing audio and mapping it to sounds and words. Language modeling experts, including linguists, help build systems capable of understanding the structure of language and grammar. Natural language processing experts help extract meaningful information from the data gathered by Nuance services.
Nuance's simplest products are available in more than 70 languages. But the more complex desktop applications, such as Dragon NaturallySpeaking for PC, only support about a half-dozen. "For each of the languages you need a good understanding of what puts the language together, how it's built and how the sounds translate into words," Mahoney said.
Apple's embrace of voice with Siri has increased consumer interest in the technology. "Apple's so strong with user experience that when they embrace voice as a core differentiator, that says to a lot of people it might be good enough, because Apple wouldn't do it if it's not great," Mahoney said.
Siri and Nuance's Dragon Go! for iPhone and Android aren't that different in the technology they use on the back end. They are different implementations, Mahoney said. "What the Siri application does is it tries to interpret what you're asking for and brings you through a very structured set of potential results Apple can deliver to you," he said. "It's very neatly controlled. It tends to be a great experience."
Dragon Go! connects users to results from more than 200 Web properties covering the most likely searches to provide information on restaurants, music and entertainment, local businesses, and other topics. For restaurant queries, Dragon takes users to Yelp for reviews and OpenTable to make reservations.
Nuance has built up its user experience with various iterations over the years, but it's still a manual process. "Most of the systems require a human to define the different categories of information you're going to support," Mahoney said. "Some person has to decide what are the kinds of things people will ask… You can fool any one of these systems if you work hard enough because they can't answer every single kind of question."
Over the next few years, we'll see voice recognition technologies learn from their users and improve themselves without manual intervention, Mahoney said. "As more machine learning capability gets implemented and more of these systems sort of build themselves, you'll see better and better coverage. The systems will learn from use about what kinds of things they need to cover, and they'll get smarter over time."
http://arstechnica.com/business/news/2012/03/future-of-voice-recognition-assistants-that-remember-everything-you-say.ars