Re: [Audyssey] First Natural-Sounding Synthesized Voice in the World'
I think it is in the new voice search for Android: http://www.youtube.com/watch?v=XLyuWEWqYqQ

On 4 Jul 2012, at 14:09, Ron Kolesar wrote:
> Too bad we can't get this new technology for our computers.
> But it is a first for the blind and it does sound interesting at the least.
Re: [Audyssey] First Natural-Sounding Synthesized Voice in the World'
Too bad we can't get this new technology for our computers. But it is a first for the blind, and it does sound interesting at the least.

Ron and current Leader Dog Boz, who states "that a service dog beats a cane paws down any day of the week."

-Original Message-
From: Phil Vlasak
Sent: Wednesday, July 04, 2012 9:00 AM
To: Gamers Discussion list
Subject: [Audyssey] First Natural-Sounding Synthesized Voice in the World'
Re: [Audyssey] First Natural-Sounding Synthesized Voice in the World'
Shame, no sample of this supposed voice. All hype, to be honest, by the sound of it.

On 4 Jul 2012, at 14:00, "Phil Vlasak" wrote:
> Android Director: 'We Have the First Natural-Sounding Synthesized Voice in the World'
[Audyssey] First Natural-Sounding Synthesized Voice in the World'
Android Director: 'We Have the First Natural-Sounding Synthesized Voice in the World'
July 4, 2012

Hugo Barra, Android's director of product management, was cool and composed as he shared Android's latest killer features.

Giving Google a voice is very use case-driven. If you're in a situation where you're asking a question with your voice, there's a significant chance you're in a somewhat constrained environment. You're on the go, you're rushing. You might be in the car. You're carrying something else with your hands. You can't really pause to look at your screen or type.

So speaking it back to you seems pretty natural, right? That's how humans communicate. But we also wanted to do that only when we had a text-to-speech engine that was extremely high quality. And what you hear today, if you ask Google a question on Jelly Bean, is quite spectacular. There isn't a text-to-speech engine, as we call them, that has accuracy as high as that.

We have built a text-to-speech engine that's networked-based, meaning it uses a very large amount of data to compose a spoken answer. You know, purely from a synthesis perspective - forget about answering questions - it takes a very large amount of data to generate a synthesized audio of someone speaking. But we also have a matching engine that sits on the device. It's the exact same voice but with a very different computational technique. You'll always hear the same voice whether it's speaking back to you in a connected use-case, in which it comes from the server, or a disconnected offline use-case, in which it would just be synthesized on the device.

Wired: What makes a good voice? Did you model it after someone?

Barra: I actually come from speech recognition, and I worked in speech in general for a very long time. So don't let me talk about this all day. But it's a very, very intricate process. And it starts with finding a voice talent.

Wired: A real person?

Barra: Finding a person who has a voice that just nails it. And in this day and age, it's actually a very different voice talent than the voice talents that power most of the voice technology that exists today. A lot of today's voice technology comes from the companies you'd expect: Nuance and Microsoft and others. That technology is built for a telephony world, for a customer service environment where you need this posh, powerful voice, a branding approach to things.

We set out to create the very first conversational voice, and I think we nailed that. I think we have the very first high-quality, natural-sounding, conversational, synthesized voice in the entire world.

Between a bunch of designers, engineers and speech scientists, we sat down and tried to describe the personality of the person, the personality of the voice that we were trying to create. We wrote down "friendly" [as a product goal] and there were literally 15 different ways to describe what friendly means. So that was the brief that we gave to a casting agency, and they came back with 10 candidates. We recorded those 10 candidates, and we did a bunch of blind tests with all sorts of different people, and we voted it down to two people. And then we recorded more of those people, and we did some tests and we decided "OK, we're going to go with this one person."

I don't actually know her name. In fact, no one knows her name.

Wired: It's a secret?

Barra: It's supposed to be. It's not something that you publicize because it needs to be the voice of Google. And then you create the voice, you collect a lot of data. What we did is an industry first.

Wired: While it does sound more human-like, it doesn't have a lot of personality in the sense that it doesn't say funny things back to you. It doesn't deliver jokes.

Barra: So nothing to do with the voice itself, but what it says and how it says it?

Wired: Exactly. Is that something you guys were looking to add in the future, or is that something you wanted to leave out?

Barra: It's very deliberately not making jokes with you. Google is a neutral party; it's not your friend, secretary or sister. It's not your mom. It's not your girlfriend or boyfriend. It is an information retrieval entity. You ask, we respond. And it's very important that this entity be impartial, and adding jokes and other mannerisms to the voice would take away from that.

It's something that we've talked about, and it's pretty clear. There hasn't been a single person in the company who thinks we should have gone the other direction.

http://www.wired.com/gadgetlab/2012/07/google-android-hugo-barra-interview/all/

---
Gamers mailing list __ Gamers@audyssey.org
If you want to leave the list, send E-mail to gamers-unsubscr...@audyssey.org.
You can make changes or update your subscription via the web, at
http://mail.audyssey.org/mailman/listinfo/gamers_audyssey.org.
All messages are archived and can be searched and read at
http://www.mail-archive.com/gamers@audyssey.or