RE: Speech API Community Group
The proposal mentions that the specification of a network speech protocol is out of scope. This makes sense given that protocols are the domain of the IETF. But I'd like to confirm that the use of network speech services is in scope for this CG. Would you mind amending the proposal to make this explicit? Thanks

From: Glen Shires [mailto:gshi...@google.com]
Sent: Tuesday, April 03, 2012 8:13 AM
To: public-xg-htmlspe...@w3.org; public-webapps@w3.org
Subject: Speech API Community Group

We at Google have proposed the formation of a new Speech API Community Group to pursue a JavaScript Speech API. We encourage you to join and support this effort. [1] We believe that forming a Community Group has the following advantages:
- It's quick, efficient and minimizes unnecessary process overhead.
- We believe it will allow us, as a group, to reach consensus in an efficient manner.
- We hope it will expedite interoperable implementations in multiple browsers. (A good example is the Web Media Text Tracks CG, where multiple implementations are happening quickly.)

Google plans to supply an implementation and a test suite for this specification, and will commit to serve as editor. We hope that others will support this CG as they had stated support for the similar WebApps CfC. [2]

Bjorn Bringert
Satish Sampath
Glen Shires

[1] http://www.w3.org/community/groups/proposed/#speech-api
[2] http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0315.html
RE: Speech API Community Group
It matters to the application author that they can select a service that works best for them. Relying on browser or OS configurations would not suffice for real-world speech applications. I don't see how we can properly specify the process of selection without mentioning network services. Hence the language request.

From: Jerry Carter [mailto:je...@jerrycarter.org]
Sent: Tuesday, April 03, 2012 11:46 AM
To: Young, Milan
Cc: Glen Shires; public-xg-htmlspe...@w3.org; public-webapps@w3.org
Subject: Re: Speech API Community Group

On Apr 3, 2012, at 11:48 AM, Young, Milan wrote:

The proposal mentions that the specification of a network speech protocol is out of scope. This makes sense given that protocols are the domain of the IETF. But I'd like to confirm that the use of network speech services is in scope for this CG. Would you mind amending the proposal to make this explicit?

I don't see why any such declaration is necessary. From the perspective of the application author or of the application user, it matters very little where the speech-to-text operation occurs so long as the result is delivered promptly. There is no reason that local, network-based, or hybrid solutions would be unable to provide adequate performance. I believe the current language in the proposal is appropriate.

-=- Jerry
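To make the disagreement concrete, here is a minimal sketch of what Milan is asking for: an application explicitly selecting a network speech service rather than relying on the browser default. The `serviceURI` attribute surfaced in later Web Speech API drafts; the service URL below is purely hypothetical, and the function falls back to `null` in environments without speech support. This is an illustration of the requirement, not code from either proposal.

```javascript
// Hedged sketch of FPR7/FPR12: a web app requesting a specific
// (possibly network-hosted) speech service instead of the default.
// The 'serviceURI' attribute and the URL below are illustrative only.
function selectSpeechService(global = globalThis) {
  const Recognition = global.SpeechRecognition || global.webkitSpeechRecognition;
  if (!Recognition) return null; // no speech support (e.g. outside a browser)

  const rec = new Recognition();
  // Request a non-default, network-hosted recognizer (hypothetical URL).
  rec.serviceURI = 'https://speech.example.com/recognize';
  rec.lang = 'en-US';
  return rec;
}
```

In a browser that exposes `SpeechRecognition`, the user agent would still be free to prompt for permission or refuse the requested service, which is exactly the selection-and-enforcement process Milan argues the spec must describe.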
RE: Speech API Community Group
The problem is that the community group has an ambiguous “charter”, and at least some folks would like this clarified before joining. Given that Speech-XG and WebApps are the two most relevant lists, I don’t know where else the discussion would take place. I believe that all of this could be cleared up by a simple statement from the CG chair (Glen Shires) that FPR7 and FPR12 are in scope. There is “STRONG” interest in this domain and an editor has already volunteered (Jim Barnett). Seems like a simple decision. Thanks

From: Charles Pritchard [mailto:ch...@jumis.com]
Sent: Tuesday, April 03, 2012 1:29 PM
To: Michael Bodell
Cc: Jerry Carter; Raj (Openstream); Young, Milan; Jim; Glen Shires; public-xg-htmlspe...@w3.org; public-webapps@w3.org
Subject: Re: Speech API Community Group

I'd like to encourage everyone interested in the Speech API to join the mailing list: http://lists.w3.org/Archives/Public/public-speech-api/
For those interested in more hands-on interaction, there's the CG: http://www.w3.org/community/speech-api/
For some archived mailing list discussion, browse the old XG list: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/
It seems like we can move this chatter over to public-speech-api and off of the webapps list.
-Charles

On 4/3/2012 1:08 PM, Michael Bodell wrote:

A little bit of historical context and resource references might be helpful for some on the email thread. While this is still an early stage for a community group, if one will happen, it actually isn’t early for the community as a group to talk about this. In many ways we’ve already done the initial incubation and community discussion and investigation for this space in the HTML Speech XG.
This led to the XG’s use case and requirements document: http://www.w3.org/2005/Incubator/htmlspeech/live/requirements.html which was then refined to a prioritized requirement list after soliciting community input: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#prioritized

As I read it, the requirements Milan, Jim, and Raj discussed are part of FPR7 [Web apps should be able to request speech service different from default] and FPR12 [Speech services that can be specified by web apps must include network speech services], both of which were voted to have “Strong Interest” by the community. Further work from these requirements led to the community coming up with a proposal, which is ready now to be taken to a standards track process, that was published in the XG final report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/

Hopefully we can all properly leverage the work the community has already done.

Michael Bodell
Co-chair HTML Speech XG

From: Jerry Carter [mailto:je...@jerrycarter.org]
Sent: Tuesday, April 03, 2012 12:50 PM
To: Raj (Openstream); Milan Young; Jim
Cc: Glen Shires; public-xg-htmlspe...@w3.org; public-webapps@w3.org
Subject: Re: Speech API Community Group

We can discuss this in terms of generalities without any resolution, so let me offer two more concrete use cases:

My friend Jóse is working on a personal site to track teams and player statistics at the Brazil 2014 World Cup. He recognizes that the browser will define a default language through the HTTP Accept-Language header, but knows that speakers may code switch in their requests (e.g. Spanish + English or Portuguese + English) or be better served by using native pronunciations (Jesus = /heːzus/ vs. /ˈdʒiːzəs/). Hence, he requires a resource that can provide support for Spanish, English, and Portuguese and that can also support multiple simultaneous languages.
These are two solid requirements. A browser encountering the page might (1) be able to satisfy these requirements, (2) require user permission before accessing such a resource, or (3) be unable to meet the request.

My colleague Jim has another application for which hundreds of hours have been invested to optimize the performance for a specific recognition resource. Security considerations further restrict the physical location of conforming resources. His page requires a very specific resource.

These are two solid requirements. A browser encountering the page might (1) be able to satisfy these requirements, (2) require user permission before accessing such a resource, or (3) be unable to meet the request.

There are indeed commercial requirements around the capabilities of resources. We are in full agreement. It is important to be able to list requirements for conforming resources and to ensure that the browser is enforcing those requirements. That stated, the application author does not care where such a conforming resource resides so long as it is available to the targeted user population. The user does not care where the resource resides so long as it works well and does not cost too much to use. The trick within
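Jóse's language requirement from the use case above can be sketched independently of any speech engine: match the user's HTTP Accept-Language preferences against the languages a chosen speech resource supports. The helper name and the supported-language list are hypothetical, and the matching here is a simplification of full BCP 47 lookup.

```javascript
// Hypothetical helper for Jóse's World Cup site: pick the best
// recognition language from an Accept-Language header, given the
// languages the selected speech resource supports.
function pickRecognitionLang(acceptLanguage, supported) {
  // Parse e.g. "pt-BR,pt;q=0.9,en;q=0.8" into tags ordered by quality.
  const prefs = acceptLanguage
    .split(',')
    .map(part => {
      const [tag, q] = part.trim().split(';q=');
      return { tag: tag.toLowerCase(), q: q ? parseFloat(q) : 1.0 };
    })
    .sort((a, b) => b.q - a.q);

  // Simplified matching: exact tag, or a prefix relationship
  // (so "pt" matches "pt-BR" and vice versa).
  for (const { tag } of prefs) {
    const hit = supported.find(s => {
      const low = s.toLowerCase();
      return low === tag || low.startsWith(tag + '-') || tag.startsWith(low + '-');
    });
    if (hit) return hit;
  }
  return supported[0]; // fall back to the site default
}
```

A page could then assign the result to a recognizer's `lang` attribute; supporting Jóse's *simultaneous* multi-language requests is precisely the capability that goes beyond this sketch and motivates selecting a resource that advertises it.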
RE: to add Speech API to Charter; deadline January 19
That's exactly the right question to ask. Please take a look at: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#requirements

I am also in support of Olli's statement that we may not be able to spec/implement the complete XG recommendation in one pass. But decisions made toward the definition of that initial feature set should be made in a democratic forum. I feel the best way to do that is to start where the last democratic forum left off, and whittle down from there as schedule requires. Thank you

-Original Message-
From: Dave Bernard [mailto:dbern...@intellectiongroup.com]
Sent: Friday, January 13, 2012 8:14 AM
To: 'Deborah Dahl'; 'Satish S'; Young, Milan
Cc: 'Arthur Barstow'; 'public-webapps'; public-xg-htmlspe...@w3.org
Subject: RE: to add Speech API to Charter; deadline January 19

Deborah-
Is there a draft priority list in existence? I like the idea of getting good enough out there sooner, especially as an implementer with real projects in the space.
Dave

-Original Message-
From: Deborah Dahl [mailto:d...@conversational-technologies.com]
Sent: Friday, January 13, 2012 10:43 AM
To: 'Satish S'; 'Young, Milan'
Cc: 'Arthur Barstow'; 'public-webapps'; public-xg-htmlspe...@w3.org
Subject: RE: to add Speech API to Charter; deadline January 19

Olli has a good point that it makes sense to implement the Speech API in pieces. That doesn't mean that the WebApps WG only has to look at one proposal in deciding how to proceed with the work. Another option would be to start off the Speech API work in the WebApps group with both proposals (the Google proposal and the Speech XG report) and let the editors prioritize the order in which the different aspects of the API are worked out and published as specs.
-Original Message-
From: Satish S [mailto:sat...@google.com]
Sent: Thursday, January 12, 2012 5:01 PM
To: Young, Milan
Cc: Arthur Barstow; public-webapps; public-xg-htmlspe...@w3.org
Subject: Re: to add Speech API to Charter; deadline January 19

Milan,
It looks like we fundamentally agree on several things:
* That we'd like to see the JavaScript Speech API included in the WebApps' charter.
* That we believe the wire protocol is best suited for another organization, such as IETF.
* That we believe the markup bindings may be excluded.

Our only difference seems to be whether to start with the extensive Javascript API proposed in [1] or the simplified subset of it proposed in [2], which supports the majority of the use cases in the XG's Final Report. Art Barstow asked for a relatively specific proposal and provided some precedent examples regarding the level of detail. [3] Olli Pettay wrote in [4]: "Since from practical point of view the API+protocol XG defined is a huge thing to implement at once, it makes sense to implement it in pieces." Starting with a baseline that supports the majority of use cases will accelerate implementation, interoperability testing, standardization and ultimately developer adoption.

Cheers
Satish

[1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
[2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
[3] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1474.html
[4] http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0068.html

On Thu, Jan 12, 2012 at 5:46 PM, Young, Milan milan.yo...@nuance.com wrote:

I've made the point a few times now, and would appreciate a response. Why are we preferring to seed WebApps speech with [2] when we already have [3], which represents industry consensus as of a month ago (Google notwithstanding)? Proceeding with [2] would almost surely delay the resulting specification as functionality would be patched and haggled over to meet consensus.
My counter proposal is to open the HTML/speech marriage in WebApps essentially where we left off at [3]. The only variants being: 1) dropping the markup bindings in sections 7.1.2/7.1.3 because its primary supporter has since expressed non-interest, and 2) spinning the protocol specification in 7.2 out to the IETF. If I need to formalize all of this in a document, please let me know. Thank you

[3] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/

-Original Message-
From: Arthur Barstow [mailto:art.bars...@nokia.com]
Sent: Thursday, January 12, 2012 4:31 AM
To: public-webapps
Cc: public-xg-htmlspe...@w3.org
Subject: CfC: to add Speech API to Charter; deadline January 19

Glen Shires and some others at Google proposed [1] that WebApps add Speech API to WebApps' charter and they put forward the Speech Javascript API Specification [2] as a starting point. Members of Mozilla and Nuance have voiced various levels of support for this proposal. As such, this is a Call for Consensus to add Speech API to WebApps' charter. Positive response to this CfC is preferred and encouraged and silence will be considered as agreeing with the proposal.
RE: to add Speech API to Charter; deadline January 19
I've made the point a few times now, and would appreciate a response. Why are we preferring to seed WebApps speech with [2] when we already have [3], which represents industry consensus as of a month ago (Google notwithstanding)? Proceeding with [2] would almost surely delay the resulting specification as functionality would be patched and haggled over to meet consensus.

My counter proposal is to open the HTML/speech marriage in WebApps essentially where we left off at [3]. The only variants being: 1) dropping the markup bindings in sections 7.1.2/7.1.3 because its primary supporter has since expressed non-interest, and 2) spinning the protocol specification in 7.2 out to the IETF. If I need to formalize all of this in a document, please let me know. Thank you

[3] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/

-Original Message-
From: Arthur Barstow [mailto:art.bars...@nokia.com]
Sent: Thursday, January 12, 2012 4:31 AM
To: public-webapps
Cc: public-xg-htmlspe...@w3.org
Subject: CfC: to add Speech API to Charter; deadline January 19

Glen Shires and some others at Google proposed [1] that WebApps add Speech API to WebApps' charter and they put forward the Speech Javascript API Specification [2] as a starting point. Members of Mozilla and Nuance have voiced various levels of support for this proposal. As such, this is a Call for Consensus to add Speech API to WebApps' charter. Positive response to this CfC is preferred and encouraged and silence will be considered as agreeing with the proposal. The deadline for comments is January 19 and all comments should be sent to public-webapps at w3.org.

-AB

[1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
[2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
RE: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization
The HTML Speech XG worked for over a year prioritizing use cases against timelines and packaged all of that into a recommendation complete with IDLs and examples. So while I understand that WebApps may not have the time to review the entirety of this work, it's hard to see how dissecting it would speed the process of understanding. Perhaps a better approach would be to find half an hour to present to select members of WebApps the content of the recommendation and its possible relevance to their group. Does that sound reasonable? Thanks

From: Glen Shires [mailto:gshi...@google.com]
Sent: Wednesday, January 04, 2012 11:15 PM
To: public-webapps@w3.org
Cc: public-xg-htmlspe...@w3.org; Arthur Barstow; Dan Burnett
Subject: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization

As Dan Burnett wrote below: "The HTML Speech Incubator Group [1] has recently wrapped up its work on use cases, requirements, and proposals for adding automatic speech recognition (ASR) and text-to-speech (TTS) capabilities to HTML. The work of the group is documented in the group's Final Report. [2] The members of the group intend this work to be input to one or more working groups, in W3C and/or other standards development organizations such as the IETF, as an aid to developing full standards in this space."

Because that work was so broad, Art Barstow asked (below) for a relatively specific proposal. We at Google are proposing that a subset of it be accepted as a work item by the Web Applications WG. Specifically, we are proposing this Javascript API [3], which enables web developers to incorporate speech recognition and synthesis into their web pages. This simplified subset enables developers to use scripting to generate text-to-speech output and to use speech recognition as an input for forms, continuous dictation and control, and it supports the majority of use-cases in the Incubator Group's Final Report.
We welcome your feedback and ask that the Web Applications WG consider accepting this Javascript API [3] as a work item.

[1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
[2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
[3] API: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html

Bjorn Bringert
Satish Sampath
Glen Shires

On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires gshi...@google.com wrote:

Milan,
The IDLs contained in both documents are in the same format and order, so it's relatively easy to compare the two side-by-side (http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#speechreco-section and http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html#api_description). The semantics of the attributes, methods and events have not changed, and both IDLs link directly to the definitions contained in the Speech XG Final Report.

As you mention, we agree that the protocol portions of the Speech XG Final Report are most appropriate for consideration by a group such as IETF, and believe such work can proceed independently, particularly because the Speech XG Final Report has provided a roadmap for these to remain compatible. Also, as shown in the Speech XG Final Report - Overview (http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#introductory), the Speech Web API is not dependent on the Speech Protocol and a Default Speech service can be used for local or remote speech recognition and synthesis.

Glen Shires

On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan milan.yo...@nuance.com wrote:

Hello Glen,
The proposal says that it contains a simplified subset of the JavaScript API. Could you please clarify which elements of the HTMLSpeech recommendation's JavaScript API were omitted? I think this would be the most efficient way for those of us familiar with the XG recommendation to evaluate the new proposal.
I'd also appreciate clarification on how you see the protocol being handled. In the HTMLSpeech group we were thinking about this as a hand-in-hand relationship between W3C and IETF, like WebSockets. Is this still your (and Google's) vision? Thanks

From: Glen Shires [mailto:gshi...@google.com]
Sent: Thursday, December 22, 2011 11:14 AM
To: public-webapps@w3.org; Arthur Barstow
Cc: public-xg-htmlspe...@w3.org; Dan Burnett
Subject: Re: HTML Speech XG Completes, seeks feedback for eventual standardization

We at Google believe that a scripting-only (Javascript) subset of the API defined in the Speech XG Incubator Group Final Report is of appropriate scope for consideration by the WebApps WG. The enclosed scripting-only subset supports the majority of the use-cases and samples in the XG proposal. Specifically, it enables web-pages to generate speech output and to use speech recognition as an input for forms, continuous dictation and control. The Javascript API will allow web pages to control activation and timing and to handle results and alternatives.
RE: HTML Speech XG Completes, seeks feedback for eventual standardization
Hello Glen,
The proposal says that it contains a simplified subset of the JavaScript API. Could you please clarify which elements of the HTMLSpeech recommendation's JavaScript API were omitted? I think this would be the most efficient way for those of us familiar with the XG recommendation to evaluate the new proposal.

I'd also appreciate clarification on how you see the protocol being handled. In the HTMLSpeech group we were thinking about this as a hand-in-hand relationship between W3C and IETF, like WebSockets. Is this still your (and Google's) vision? Thanks

From: Glen Shires [mailto:gshi...@google.com]
Sent: Thursday, December 22, 2011 11:14 AM
To: public-webapps@w3.org; Arthur Barstow
Cc: public-xg-htmlspe...@w3.org; Dan Burnett
Subject: Re: HTML Speech XG Completes, seeks feedback for eventual standardization

We at Google believe that a scripting-only (Javascript) subset of the API defined in the Speech XG Incubator Group Final Report is of appropriate scope for consideration by the WebApps WG. The enclosed scripting-only subset supports the majority of the use-cases and samples in the XG proposal. Specifically, it enables web-pages to generate speech output and to use speech recognition as an input for forms, continuous dictation and control. The Javascript API will allow web pages to control activation and timing and to handle results and alternatives. We welcome your feedback and ask that the Web Applications WG consider accepting this as a work item.

Bjorn Bringert
Satish Sampath
Glen Shires

On Tue, Dec 13, 2011 at 11:39 AM, Glen Shires gshi...@google.com wrote:

We at Google believe that a scripting-only (Javascript) subset of the API defined in the Speech XG Incubator Group Final Report [1] is of appropriate scope for consideration by the WebApps WG. A scripting-only subset supports the majority of the use-cases and samples in the XG proposal.
Specifically, it enables web-pages to generate speech output and to use speech recognition as an input for forms, continuous dictation and control. The Javascript API will allow web pages to control activation and timing and to handle results and alternatives.

As Dan points out above, we envision that different portions of the Incubator Group Final Report are applicable to different working groups in W3C and/or other standards development organizations such as the IETF. This scripting API subset does not preclude other groups from pursuing standardization of relevant HTML markup or underlying transport protocols, and indeed the Incubator Group Final Report defines a potential roadmap so that such additions can be compatible. To make this more concrete, Google will provide to this mailing list a specific proposal extracted from the Incubator Group Final Report that includes only those portions we believe are relevant to WebApps, with links back to the Incubator Report as appropriate.

Bjorn Bringert
Satish Sampath
Glen Shires

[1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/

On Tue, Dec 13, 2011 at 5:32 AM, Dan Burnett dburn...@voxeo.com wrote:

Thanks for the info, Art. To be clear, I personally am *NOT* proposing adding any specs to WebApps, although others might. My email below as a Chair of the group is merely to inform people of this work and ask for feedback. I expect that your information will be useful for others who might wish for some of this work to continue in WebApps.
-- dan

On Dec 13, 2011, at 7:06 AM, Arthur Barstow wrote:

Hi Dan,
WebApps already has a relatively large number of specs in progress (see [PubStatus]) and the group has agreed to add some additional specs (see [CharterChanges]). As such, please provide a relatively specific proposal about the features/specs you and other proponents would like to add to WebApps.
Regarding the level of detail for your proposal, I think a reasonable precedent is something like the Gamepad and Pointer/MouseLock proposals (see [CharterChanges]). (Perhaps this could be achieved by identifying specific sections in the XG's Final Report?)

-Art Barstow

[PubStatus] http://www.w3.org/2008/webapps/wiki/PubStatus#API_Specifications
[CharterChanges] http://www.w3.org/2008/webapps/wiki/CharterChanges#Additions_Agreed

On 12/12/11 5:25 PM, ext Dan Burnett wrote:

Dear WebApps people,
The HTML Speech Incubator Group [1] has recently wrapped up its work on use cases, requirements, and proposals for adding automatic speech recognition (ASR) and text-to-speech (TTS) capabilities to HTML. The work of the group is documented in the group's Final Report. [2] The members of the group intend this work to be input to one or more working groups, in W3C and/or other standards development organizations such as the IETF, as an aid to developing full standards in this space. Whether the W3C work happens in a new Working Group or an existing one, we are interested in collecting feedback on the Incubator Group's
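The scripting-only subset advocated throughout these threads boils down to two capabilities: generating speech output from script, and using recognition as an input for forms. A rough sketch of that usage follows, written in the shape the eventual Web Speech API took; since the proposal under discussion predates that spec, treat the constructor names as illustrative rather than as the proposal's actual IDL. The function degrades to `null` where no speech support exists.

```javascript
// Sketch of the scripting-only subset: TTS output plus speech input
// for a form field. Names follow the later Web Speech API and are
// illustrative of the proposal, not quoted from it.
function speechFeatures(global = globalThis) {
  const Recognition = global.SpeechRecognition || global.webkitSpeechRecognition;
  const synthesis = global.speechSynthesis;
  if (!Recognition || !synthesis) return null; // no browser speech support

  // Text-to-speech: speak a prompt from script.
  const speak = (text) => {
    synthesis.speak(new global.SpeechSynthesisUtterance(text));
  };

  // Speech recognition as form input: fill the field with the top result.
  const dictateInto = (inputEl) => {
    const rec = new Recognition();
    rec.onresult = (event) => {
      inputEl.value = event.results[0][0].transcript;
    };
    rec.start();
  };

  return { speak, dictateInto };
}
```

This is exactly the "activation and timing, results and alternatives" surface the Google proposal describes, with markup bindings and the wire protocol deliberately out of frame.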