Re: [whatwg] Fetch, MSE, and MIX
Ryan,

Proposals like this might allow video-intensive sites to migrate to HTTPS sooner than otherwise and are thus very welcome. This one was originally suggested by Anne van Kesteren, I believe, or at least something very similar. However, this particular proposal suffers (IIUC) from the disadvantage that users are likely to be presented with mixed content warnings. That's not an acceptable user experience for a professional commercial service.

I understand the reasons that mixed content warnings are presented: the properties of the HTTP media requests do not align with the user expectation of privacy and security which is set by the presence of the green padlock or other UI indications of secure transport. A viable interim solution - without such warnings - either needs to avoid setting this expectation or to include additional measures such that the warnings were not necessary. If the latter, we'd need to evaluate whether such measures were worthwhile as an interim step or whether the investment would be better spent on the move to HTTPS proper.

…Mark

On Thu, Feb 19, 2015 at 9:06 PM, Ryan Sleevi sle...@google.com wrote:

Cross-posting, as this touches on the Fetch [1] spec, Media Source Extensions [2], and Mixed Content [3]. This does cross-post WHATWG and W3C; apologies if this is a mortal sin.

TL;DR Proposal first:
- Amend MIX in [4] to add fetch as an optionally-blockable-request-context.
  * This means that fetch() can now return HTTP content from HTTPS pages. The implications of this, however, are described below, if you can handle reading it all.
- Amend MSE in [5] to introduce a new method, appendResponse(Response response), which accepts a Response [6] class.
- In MSE, define a Response Append Loop similar to the Stream Append Loop [7], that calls the consume body algorithm [8] on the internal response [9] of the Response to yield an ArrayBuffer, then executes the buffer append [10] algorithm on the SourceBuffer. (A usage sketch follows at the end of this message.)

MUCH longer justification why:

As it stands, audio/video/source tags today are optionally blockable content, as noted in [4]. Thus, an HTTPS page may set the source to HTTP content and load the content (although typically with user-agent indication). MSE poses itself as a spec to offer much greater control to site authors than audio/video, as noted in its use cases, and as a result has seen rapid adoption among a number of popular video streaming sites. Most notably, the ability to do adaptive streaming with MSE helps provide a better quality, better performing experience for users. Finally, in some user agents, MSE is a pre-requisite for the use of Encrypted Media Extensions [11].

However, there are limitations to using MSE that don't exist with video/audio. The most notable of these is that in order to implement the adaptive streaming capabilities, most sites make use of XMLHttpRequest to request portions of media content, which can then be supplied to the SourceBuffer. Based on the feedback that MSE provides, the script author can then adjust the XHRs they make to use a lower bitrate media source, to drop segments, etc.

When using XHR, the site author loses the ability to mix HTTPS pages with HTTP media, as XHR is (rightfully so) treated as blocked content. The justification for why XHR does this is that it returns the full buffer to the page author. In practice, we saw many sites then taking that buffer and making security decisions on it - whether it be clearly bad things such as eval()ing the content or more subtle things like adjusting UI or links.
All of these undermine all of the security guarantees that HTTPS tries to provide, and thus XHR is blocked. The result is that if an HTTPS site wants to use MSE with XHR, all of the content needs to be served via HTTPS. We've already seen some providers complain that this is prohibitively expensive in their current networks [12], although it may be solvable in time, as demonstrated by other video sharing sites [13].

In a choice between using MSE - which offers a better user experience over video/audio by reducing bandwidth and improving quality - and using HTTPS - which offers better privacy and security controls - sites are likely to choose solutions that reduce their costs rather than protect their users, a reasonable but unfortunate business reality.

I'm hoping to find a way to close that gap - to allow sites to use MSE (and potentially EME) via HTTPS documents, while still sourcing their media content via HTTP. This may seem counter-intuitive, and a step back from the efforts of the Chrome security team, but I think it is actually consistent with our goals and our past comments. In particular, this solution tries to provide a means and incentive for sites to adopt MSE (improving user experience) AND to begin migrating to HTTPS; first with their main document, and then, in time, all of their media content. This won't protect adversaries from knowing what
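[Editorially, a minimal sketch of how a player script might use the proposed flow, assuming the MIX and MSE amendments above were adopted. appendResponse() is the proposed method and does not exist in MSE today; the URL and codec string are illustrative.]

    var video = document.querySelector('video');
    var mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    mediaSource.addEventListener('sourceopen', function () {
      var sourceBuffer =
          mediaSource.addSourceBuffer('video/webm; codecs="vp9,opus"');
      // Under the amended MIX spec, this fetch() of HTTP media from an
      // HTTPS page would be optionally blockable rather than blocked.
      fetch('http://media.example.com/segments/1.webm').then(function (response) {
        // Proposed method: the Response body is consumed internally (the
        // Response Append Loop) and appended to the SourceBuffer, so the
        // raw bytes are never exposed to script.
        sourceBuffer.appendResponse(response);
      });
    });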
Re: [whatwg] Fetch, MSE, and MIX
On Fri, Mar 6, 2015 at 3:00 PM, Ryan Sleevi sle...@google.com wrote:

On Fri, Mar 6, 2015 at 2:43 PM, Mark Watson wats...@netflix.com wrote:

Ryan, Proposals like this might allow video-intensive sites to migrate to HTTPS sooner than otherwise and are thus very welcome. This one was originally suggested by Anne van Kesteren, I believe, or at least something very similar. However, this particular proposal suffers (IIUC) from the disadvantage that users are likely to be presented with mixed content warnings. That's not an acceptable user experience for a professional commercial service.

Well, it's an accurate presentation of the security state of said commercial service. That is, it is actively disclosing your activities to anyone on the network, for any number of purposes - profiling, tracking, analysis, etc. I wish we could say these were purely the result of paranoia or theoretical concerns, but we have ample evidence that they are real, practical, and concerning. Given how many professional commercial services' offerings (at least in the context of MSE) are often in business competition with the transit providers, I do find it somewhat surprising that this is seen as a desirable state - that is, where the commercial services disclose their users' viewing habits, preferences, and profiles to those who would most benefit from the competitive insight. But I digress...

I didn't say the mixed content warnings were not accurate and I didn't say that disclosing viewing habits was desirable.

I understand the reasons that mixed content warnings are presented: the properties of the HTTP media requests do not align with the user expectation of privacy and security which is set by the presence of the green padlock or other UI indications of secure transport. A viable interim solution - without such warnings - either needs to avoid setting this expectation or to include additional measures such that the warnings were not necessary. If the latter, we'd need to evaluate whether such measures were worthwhile as an interim step or whether the investment would be better spent on the move to HTTPS proper.

Well, it's somewhere between unwise and impossible to meaningfully address the privacy concerns, even if you were to attempt to address the security concerns. That is, you're still disclosing activity to the network observer, and that fundamentally undermines the confidentiality goal of HTTPS. As it stands, the current design of MSE (and, therefore, EME, as some EME implementations require MSE) incentivizes media sites towards HTTP as the path of least resistance / least UI. However, in doing so, it also leaves users open to serious privacy and security risks, both at the local level (EME) and at the overall platform level (cookies, local storage, etc.).

I would dispute that EME per se necessarily opens users to serious privacy and security risks. If done wrong, sure, but this is true of anything. I'm hopeful that UA vendors will provide EME solutions that are relatively benign in this respect and I know a number who are trying very hard to do so.

It's also worth considering how proposals like [1][2] will affect the overall UI state, such that HTTP would appear, to the user, worse than mixed content (as the active injection of arbitrary code is far worse than the tracking disclosure). Or to consider how deprecation plans for powerful features, such as those in [3][4], might see the usage of features (such as EME) over HTTP be presented with access permissions or negative UI.
In this world in which the platform reasonably and rightfully is moving away from HTTP - both for powerful features (as per [3]/[4]), but also as the "everything is fine, nothing to worry about" state (as per [1]/[2]) - the question is whether this provides a meaningful interim step between the "actively insecure" and "meaningfully secured from network attacks" states.

All very worthy of consideration, as you say. Again, if browsers do provide a viable interim step, that could get sites improved security sooner than they otherwise would.

…Mark

[1] http://www.chromium.org/Home/chromium-security/marking-http-as-non-secure
[2] https://lists.w3.org/Archives/Public/public-webappsec/2014Dec/0062.html
[3] https://lists.w3.org/Archives/Public/public-webappsec/2015Feb/0431.html
[4] https://groups.google.com/a/chromium.org/d/topic/blink-dev/2LXKVWYkOus/discussion
Re: [whatwg] Expose XMLHttpRequest [Fetch?] priority
All, FYI ... see below about priorities in XMLHttpRequest. Might be useful for Cadmium (it seems some browsers support something already). In JS-ASE on NRDP, we are expressing priority through the model of download tracks. Each request is prioritized relative to other requests on the same download track (the priority order is the same order as the requests are issued), but is not prioritized at all vs requests on other tracks. This aligns with the NRDP implementation. An ideal way to specify priority for streaming would be to have a large number of priority values and give each request the start PTS as its priority (or zero for headers).

... Mark

-- Forwarded message --
From: Chad Austin caus...@gmail.com
Date: Wed, Oct 1, 2014 at 10:54 AM
Subject: [whatwg] Expose XMLHttpRequest [Fetch?] priority
To: wha...@whatwg.org

Hi all, I posted the following message to WebApps, but Anne van Kesteren suggested that I instead post to WHATWG, and generalize my request to anything that supports Fetch. When reading below, feel free to interpret XMLHttpRequest in the broadest sense. The proposal follows:

***

I would like to see a priority field added to XMLHttpRequest. Mike Belshe's proposal here is a great start: http://www.mail-archive.com/public-webapps@w3.org/msg08218.html

*Motivation*

Browsers already prioritize network requests. By giving XMLHttpRequest access to the same machinery, the page or application can reduce overall latency and make better use of available bandwidth. I wrote about our specific use case (efficiently streaming hundreds of 3D assets into WebGL) in detail at http://chadaustin.me/2014/08/web-platform-limitations-xmlhttprequest-priority/

Gecko appears to support a general 32-bit priority: http://lxr.mozilla.org/mozilla-central/source/xpcom/threads/nsISupportsPriority.idl and http://lxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.cpp#45

Chrome appears to be limited to five priorities: https://code.google.com/p/chromium/codesearch#chromium/src/net/base/request_priority.h?sq=package:chromium&type=cs&rcl=1411964872 but seems to have a fairly general priority queue implementation, so increasing the number of priorities is likely straightforward: https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/loader/resource_scheduler.cc?sq=package:chromium&type=cs&rcl=1411964872&l=206

SPDY exposes 3 bits of priority per stream.

*Proposal*

Add a numeric priority property to XMLHttpRequest. It is a 3-bit integer from 0 to 7. Default to 3. 0 is most important.

Why integers and not strings, as others have proposed? Because priority arithmetic is convenient. For example, in our use case, we might say "The top bit is set by whether an asset is high-resolution or low-resolution. Low-resolution assets would be loaded first. The bottom two bits are used to group request priorities by object. The 3D scene might be the most important resource, followed by my avatar, followed by closer objects, followed by farther objects." Note that, with a very simple use case, we've just consumed all three bits. There's some vague argument that having fewer priorities makes implementing prioritization easier, but as we've seen, the browsers just have a priority queue anyway.

Allow priorities to change after send() is called. The browser may ignore this change. It could also ignore the priority property entirely.

I propose XMLHttpRequest priority not be artificially limited to a range of priorities relative to other resources the browser might initiate.
That is, the API should expose the full set of priorities the browser supports. If my application wants to prioritize an XHR over some browser-initiated request, it should be allowed to do so.

The more control over priority available, the better a customer experience can be built. For example, at the logical extreme, fine-grained priority levels and mutable priority values would allow dynamically streaming and reprioritizing texture mip levels as objects approach the camera. If there's enough precision, the application could set the priority of an object to its distance from the camera. Or, in a non-WebGL scenario, an image load's priority could be set to the distance from the current viewport.

I believe this proposal is very easy to implement: just plumb the priority value through to the prioritizing network layer browsers already implement. What will it take to get this added to the spec?

-- Chad Austin http://chadaustin.me
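[Editorially, a sketch of the proposal in use, following the bit-layout example above. The priority property is the proposed extension and was never standardized; asset names are illustrative.]

    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/assets/scene/rock-hires.mesh');
    xhr.responseType = 'arraybuffer';

    var HIGH_RES   = 4;  // top bit set: high-resolution, so loaded later
    var FAR_OBJECT = 3;  // bottom two bits: farther objects come last
    xhr.priority = HIGH_RES | FAR_OBJECT;  // 7; 0 is most important, default 3
    xhr.send();

    // Priorities may be changed after send(); the browser may ignore this.
    xhr.priority = HIGH_RES | 1;  // the object moved closer to the camera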
Re: [whatwg] Expose XMLHttpRequest [Fetch?] priority
For streaming video, where data is requested block-by-block, it would be nice to be able to set priority proportional to the time until the requested video block is needed for rendering, or at least for relative priorities between requests to reflect this. I think this means that either priorities need to be mutable, so I can increase the priority as time passes and the need for the block becomes more urgent, or I need a large priority space, so that, for example, I can just give each block a fixed priority value equal to its absolute position in the playback sequence. (A sketch of this follows after the quoted proposal below.)

...Mark

On Wed, Oct 1, 2014 at 10:54 AM, Chad Austin caus...@gmail.com wrote:

Hi all, I posted the following message to WebApps, but Anne van Kesteren suggested that I instead post to WHATWG, and generalize my request to anything that supports Fetch. When reading below, feel free to interpret XMLHttpRequest in the broadest sense. The proposal follows:

***

I would like to see a priority field added to XMLHttpRequest. Mike Belshe's proposal here is a great start: http://www.mail-archive.com/public-webapps@w3.org/msg08218.html

*Motivation*

Browsers already prioritize network requests. By giving XMLHttpRequest access to the same machinery, the page or application can reduce overall latency and make better use of available bandwidth. I wrote about our specific use case (efficiently streaming hundreds of 3D assets into WebGL) in detail at http://chadaustin.me/2014/08/web-platform-limitations-xmlhttprequest-priority/

Gecko appears to support a general 32-bit priority: http://lxr.mozilla.org/mozilla-central/source/xpcom/threads/nsISupportsPriority.idl and http://lxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.cpp#45

Chrome appears to be limited to five priorities: https://code.google.com/p/chromium/codesearch#chromium/src/net/base/request_priority.h?sq=package:chromium&type=cs&rcl=1411964872 but seems to have a fairly general priority queue implementation, so increasing the number of priorities is likely straightforward: https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/loader/resource_scheduler.cc?sq=package:chromium&type=cs&rcl=1411964872&l=206

SPDY exposes 3 bits of priority per stream.

*Proposal*

Add a numeric priority property to XMLHttpRequest. It is a 3-bit integer from 0 to 7. Default to 3. 0 is most important.

Why integers and not strings, as others have proposed? Because priority arithmetic is convenient. For example, in our use case, we might say "The top bit is set by whether an asset is high-resolution or low-resolution. Low-resolution assets would be loaded first. The bottom two bits are used to group request priorities by object. The 3D scene might be the most important resource, followed by my avatar, followed by closer objects, followed by farther objects." Note that, with a very simple use case, we've just consumed all three bits. There's some vague argument that having fewer priorities makes implementing prioritization easier, but as we've seen, the browsers just have a priority queue anyway.

Allow priorities to change after send() is called. The browser may ignore this change. It could also ignore the priority property entirely.

I propose XMLHttpRequest priority not be artificially limited to a range of priorities relative to other resources the browser might initiate. That is, the API should expose the full set of priorities the browser supports. If my application wants to prioritize an XHR over some browser-initiated request, it should be allowed to do so.
The more control over priority available, the better a customer experience can be built. For example, at the logical extreme, fine-grained priority levels and mutable priority values would allow dynamically streaming and reprioritizing texture mip levels as objects approach the camera. If there's enough precision, the application could set the priority of an object to its distance from the camera. Or, in a non-WebGL scenario, an image load's priority could be set to the distance from the current viewport.

I believe this proposal is very easy to implement: just plumb the priority value through to the prioritizing network layer browsers already implement. What will it take to get this added to the spec?

-- Chad Austin http://chadaustin.me
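[Editorially, a sketch of the block-by-block idea at the top of this message, assuming the proposed priority property together with a large priority space; function names and URLs are illustrative.]

    // Give each media block a fixed priority equal to its start
    // presentation timestamp, so earlier blocks always win; headers get 0.
    function requestBlock(url, startPtsSeconds) {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', url);
      xhr.responseType = 'arraybuffer';
      xhr.priority = Math.round(startPtsSeconds * 1000);  // hypothetical property
      xhr.send();
      return xhr;
    }

    requestBlock('/video/init.mp4', 0);       // header: most urgent
    requestBlock('/video/seg42.mp4', 168.0);  // later block: lower priority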
[whatwg] Media queries for multichannel audio ?
Is there anything like this ? I'd like a media element to be able to select the multi-channel version of some audio only when the device will output multiple channels, and select a stereo version otherwise (rather than downmixing). This would save network bandwidth as well as providing better quality (since custom-mixed stereo audio is likely better than an end-device down-mix). It seems like this would be handled in the resource selection algorithm by checking whether the media attribute of the source matches the environment, but I can't find how to specify audio characteristics in a media query.

…Mark
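[Editorially, the kind of markup this would enable, assuming a hypothetical min-audio-channels media feature; no such feature exists in Media Queries, and the file names are illustrative.]

    <audio controls>
      <!-- Hypothetical media feature: pick 5.1 only on multichannel output. -->
      <source src="audio-5.1.mp4" media="(min-audio-channels: 6)">
      <source src="audio-stereo.mp4">
    </audio>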
Re: [whatwg] Adding a btoa overload that takes Uint8Array
Whilst on the topic of base64, has anyone considered adding support for base64url ?

…Mark

On Mar 4, 2013, at 10:29 AM, Kenneth Russell wrote:

On Mon, Mar 4, 2013 at 10:04 AM, Joshua Bell jsb...@chromium.org wrote:

On Mon, Mar 4, 2013 at 9:09 AM, Boris Zbarsky bzbar...@mit.edu wrote:

The problem I'm trying to solve is sending Unicode text to consumers who need base64-encoded input. Right now the only sane way to do it (and I quote "sane" for obvious reasons) is something like the example at https://developer.mozilla.org/en-US/docs/DOM/window.btoa#Unicode_Strings

It seems like it would be better if the output of a TextEncoder could be passed directly to btoa. But for that we need an overload of btoa that takes a Uint8Array.

FYI, I believe the last iteration on this topic ended with this message: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-June/036372.html i.e. consensus that base64 should stay out of the Encoding API, but that it would be nice to have some form of base64 / Typed Array conversion API. But there were no concrete proposals beyond my strawman in that post. So: agreed, have at it!

Yes, adding an overload of btoa taking Uint8Array sounds good.
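[Editorially, the conversion being discussed, written against today's APIs as a sketch (the btoa(Uint8Array) overload itself does not exist), plus the base64url variant asked about above.]

    // Unicode string -> UTF-8 bytes -> base64, via TextEncoder and btoa.
    function utf8ToBase64(str) {
      var bytes = new TextEncoder().encode(str);  // Uint8Array of UTF-8
      var binary = '';
      for (var i = 0; i < bytes.length; i++) {
        binary += String.fromCharCode(bytes[i]);  // the workaround btoa needs
      }
      return btoa(binary);
    }

    // base64url (RFC 4648, section 5): swap the alphabet, drop the padding.
    function utf8ToBase64Url(str) {
      return utf8ToBase64(str)
          .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
    }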
Re: [whatwg] Declarative unload data
Hi Boris,

On Aug 22, 2012, at 5:14 PM, Boris Zbarsky wrote:

On 8/22/12 4:53 PM, Mark Watson wrote:

Also, we've considered heartbeat type solutions, which, whilst better than nothing, are vulnerable to an attack in which the heartbeat messages are blocked.

I'd like to understand this better. Would such an attack not also work on XHR?

It would, but the effect would be different. Blocked heartbeats would cause the server to think that streaming had stopped, when in fact it was continuing: the service underestimates how much streaming there is. Blocked 'stop' messages would cause the server to think that streaming was continuing, when in fact it had stopped: the service overestimates how much streaming there is. It so happens that for our business model, underestimating is much worse than overestimating. For a different business model, it might be the opposite.

…Mark

(I realize there are other issues with a heartbeat ping; just wanted to make sure I understand this particular issue properly.) -Boris
[whatwg] Synchronous operations in onclose handlers
Hi everyone,

I heard that there was some discussion of banning the use of synchronous operations within document onclose handlers. Whilst it is obviously bad to hold up the closing of a document - especially indefinitely - I wondered what the solution was for applications that need to perform some final communication with an application server - say using XHR - when the document is closed ? Is it possible that events could continue to be handled for some short time after the onclose completes ? Something else ? I understand that there are cases where this is not possible - such as sudden loss of power - but that doesn't mean it isn't useful to be able to perform such operations in the normal, common, case.

…Mark
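[Editorially, the pattern in question as it has to be written with current browser implementations; a sketch in which the endpoint and payload are illustrative, and the synchronous request blocks document teardown.]

    window.addEventListener('unload', function () {
      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/api/session/stop', false);  // false = synchronous
      xhr.setRequestHeader('Content-Type', 'application/json');
      xhr.send(JSON.stringify({ event: 'stop', time: Date.now() }));
    });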
Re: [whatwg] Declarative unload data
All, I'd like to add a use-case to this thread:

We have a video streaming service available to paying subscribers. Users navigate to our page and choose a video they would like to watch. A Javascript application on the page contacts an application server to authenticate and authorize the user. The application server provides the script with information needed to stream the video and then the video is streamed using the video element. The video content is stored on standard web servers, so video streaming can continue without further interaction with the application server.

For various business reasons we need to know, at any given time and as far as possible, which subscribers are streaming. So we need to know when the user stops streaming the video. That's the use case. It's easy to solve in the case that the user simply stops the video and remains on our page. The problem is when the user closes the page or browser during streaming. Clearly, we cannot know 100% when users stop streaming, due to events like browser crashes or sudden loss of power. It is sufficient for us to know most of the time. It would be insufficient, however, if we could not cover the case of users closing windows or the browser.

Also, we would like to receive confirmation at the client that the server has received the final message before deleting state on the client. This is so that, if this confirmation is not received, we can resend the final message next time we have a chance. (A sketch of this follows at the end of this thread.) Also, we've considered heartbeat type solutions, which, whilst better than nothing, are vulnerable to an attack in which the heartbeat messages are blocked.

We have been thinking of solutions along the lines of using XHR in an onclose handler. With current browser implementations we would need to use XHR in synchronous mode. We would also need any operations needed to prepare the request to be synchronous (for example the WebCrypto operations we plan to use for security). It would simplify things if we were able to use the normal async operations - that is, if processing could continue after onclose for sufficient time to complete the operations we need.

I'd be very happy to hear what people on this list think of the problem and how we might solve it.

Thanks,
Mark Watson

On May 7, 2012, at 11:25 PM, Jonas Sicking wrote:

On Mon, May 7, 2012 at 12:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:

On Mon, May 7, 2012 at 9:05 PM, Jonas Sicking jo...@sicking.cc wrote:

On Mon, May 7, 2012 at 8:59 AM, Boris Zbarsky bzbar...@mit.edu wrote:

On 5/7/12 11:53 AM, Tab Atkins Jr. wrote:

Yes, definitely (unless you set .withCredentials on it or something, like the XHR attribute).

Hold on. If you _do_ set withCredentials, you should be required to pass the credentials in or something. Under no circumstances would prompting for credentials for a request associated with an already-unloaded page be OK from my point of view.

There seems to be some confusion here regarding how withCredentials works. First of all, withCredentials is a CORS thing. CORS requests *never* pop up an authentication dialog. (There is also the question of whether we want to support CORS here; I suspect we do). But I totally agree with Boris that we can't ever pop up security dialogs for a site that the user has left.

I definitely agree that we never pop up an auth dialog for an unloadHandler request. That's just silly. If I'm understanding XHR's withCredentials flag, it just sends the *existing* ambient credentials, to apply against HTTP auth (along with cookies and such).
It doesn't prompt you for anything if you don't already have ambient credentials for a given site, right?

Correct.

/ Jonas
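[Editorially, a sketch of the confirm-or-resend behaviour described in the use case above, assuming synchronous XHR in the close handler remains available; the storage key and endpoint are illustrative.]

    function sendStop(message) {
      // Persist first; only delete once the server confirms receipt.
      localStorage.setItem('pendingStop', JSON.stringify(message));
      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/api/session/stop', false);  // synchronous, in onclose
      xhr.send(JSON.stringify(message));
      if (xhr.status === 200) localStorage.removeItem('pendingStop');
    }

    // Next time the application loads, resend anything unconfirmed:
    var pending = localStorage.getItem('pendingStop');
    if (pending) {
      var resend = new XMLHttpRequest();
      resend.open('POST', '/api/session/stop');  // async is fine here
      resend.onload = function () {
        if (resend.status === 200) localStorage.removeItem('pendingStop');
      };
      resend.send(pending);
    }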
Re: [whatwg] Proposal for a MediaSource API that allows sending media data to a HTMLMediaElement
On Aug 12, 2011, at 10:01 AM, Aaron Colwell wrote:

Hi Mark, comments inline...

On Thu, Aug 11, 2011 at 9:46 AM, Mark Watson wats...@netflix.com wrote:

I think it would be good if the API recognized the fact that the media data may be coming from several different original files/streams (e.g. different bitrates) as the player adapts to network or other conditions.

I agree. I intend to document this when I spec out the format of the byte stream that is passed into this API. Initially I'm focusing on WebM, which requires this type of functionality if the Vorbis initialization data ever needs to change during playback. My intuition says that Ogg and MP4 will require similar solutions.

The different files may have different initialization information (Info and Tracks in WebM, Movie Box in mp4, etc.), which could be provided either in the first append call for each stream or with a separate API call. But subsequently you need to know which initialization information is relevant for each appended block. An integer streamId in the append call would be sufficient - the absolute value has no meaning - it would just associate data from the same stream across calls.

Since I'm using WebM for the byte stream I don't need to add explicit streamIds to the API or data. StreamIDs are already in the byte stream. Ogg bitstream serial numbers and MP4 track numbers should serve the same purpose.

I may have inadvertently overloaded "stream id". And I'm assuming that the different bitrates essentially come from different media files. If you use the track id in mp4 (or its equivalent in WebM) then you require that there is a level of coordination in the creation of the different bitrate files: they must all use distinct track ids. To add a new bitrate you need to know what track ids were used in the old ones and pick a distinct one. When people get it wrong you have a difficult-to-detect failure mode. The alternatives are: (a) to require that all streams have the same or compatible initialization information, or (b) to pass the initialization information every time you change streams. (a) has the disadvantage of constraining encoding, and making adding new streams more dependent on the details of how the existing streams were encoded/packaged. (b) is ok, except that it is nice for the player to know this data is from the same stream you were playing a while ago - it can re-use some previously established state - rather than every stream change being 'out of the blue'.

I'm leaning toward (b) right now. Any time a change in stream parameters is needed, new INFO and TRACKS elements will be appended before the media data from the new source. This is similar to how Ogg chaining works. I don't think we need unique IDs for marking this state. The media engine can look at the new codec config data and see if it matches anything it has seen before. If so then it can simply reuse whatever resources it sees fit. Another thing to note is that just because we append this data every time a stream switch occurs, it doesn't mean we have to transfer that data across the network each time. JavaScript can cache this data and simply append it when necessary. (This is sketched in code after this message.)

That's fine for me. It needs to be clear in the API that this is the expected mode of operation. We can word this in a way that is independent of media format.
A separate comment is that practically we have found it very useful for the media player to know the maximum resolution, frame rate and codec level/profile that will be used, which may be different from the resolution and codec/level/profile of the first stream.

I agree that this info is useful, but it isn't clear to me that this API needs to support that. Existing APIs like canPlayType() (http://www.w3.org/TR/html5/video.html#dom-navigator-canplaytype) could be used to determine whether specific codec parameters are supported. Other DOM APIs could be used to determine max screen size. This could all be used to prune the candidate streams sent to the MediaSource API.

True, but I wasn't thinking so much of determining whether playback is supported, but of warning the media pipeline of what might be coming so that it can dimension various resources appropriately. This may just be a matter of feeding the header for the highest resolution/profile stream first, even if you don't feed any media data for that stream. It's possible some players will not support switching to a resolution higher than that established at the start of playback (at least we have found that to be the case with some embedded media pipelines today).

...Mark

Aaron
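[Editorially, a sketch of option (b) with the JavaScript-side caching described above, written against a hypothetical append call in the spirit of Aaron's draft; the sourceAppend name, cache shape, and helper are assumptions.]

    var initCache = {};  // stream URL -> its INFO/TRACKS (initialization) bytes

    function switchToStream(video, stream) {
      // Re-append the new stream's initialization data before its media
      // data; the engine matches it against config it has seen before and
      // reuses resources as it sees fit.
      video.sourceAppend(initCache[stream.url]);      // hypothetical API
      video.sourceAppend(stream.nextMediaSegment());  // hypothetical helper
    }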
Re: [whatwg] Proposal for a MediaSource API that allows sending media data to a HTMLMediaElement
Hi Aaron,

I think it would be good if the API recognized the fact that the media data may be coming from several different original files/streams (e.g. different bitrates) as the player adapts to network or other conditions. The different files may have different initialization information (Info and Tracks in WebM, Movie Box in mp4, etc.), which could be provided either in the first append call for each stream or with a separate API call. But subsequently you need to know which initialization information is relevant for each appended block. An integer streamId in the append call would be sufficient - the absolute value has no meaning - it would just associate data from the same stream across calls.

The alternatives are: (a) to require that all streams have the same or compatible initialization information, or (b) to pass the initialization information every time you change streams. (a) has the disadvantage of constraining encoding, and making adding new streams more dependent on the details of how the existing streams were encoded/packaged. (b) is ok, except that it is nice for the player to know this data is from the same stream you were playing a while ago - it can re-use some previously established state - rather than every stream change being 'out of the blue'.

A separate comment is that practically we have found it very useful for the media player to know the maximum resolution, frame rate and codec level/profile that will be used, which may be different from the resolution and codec/level/profile of the first stream.

...Mark

On Jul 11, 2011, at 11:42 AM, Aaron Colwell wrote:

Hi,

Based on comments in the File API Streaming Blobs thread (http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-January/029973.html) and my Extending HTML 5 video for adaptive streaming thread (http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/032277.html), I decided on taking a stab at writing a MediaSource API spec (http://html5-mediasource-api.googlecode.com/svn/trunk/draft-spec/mediasource-draft-spec.html) for streaming data to a media tag. Please take a look at the spec and provide some feedback. I've tried to start with the simplest thing that would work and hope to expand from there if need be. For now, I'm intentionally not trying to solve the generic streaming file case because I believe there might be media specific requirements around handling seeking, especially if we intend to support non-packetized media streams like WAV. If the feedback is generally positive on this approach, I'll start working on patches for WebKit and Chrome so people can experiment with an actual implementation.

Thanks,
Aaron
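[Editorially, for comparison, a sketch of the integer streamId idea from the message above; this is a hypothetical signature, and the alternative the thread ultimately set aside in favour of re-appending initialization data.]

    // The absolute streamId value has no meaning; it only associates
    // appends from the same original file/bitrate across calls.
    video.sourceAppend(initSegmentA, /* streamId */ 1);  // hypothetical API
    video.sourceAppend(mediaSegmentA1, 1);
    video.sourceAppend(initSegmentB, 2);   // switching to another bitrate
    video.sourceAppend(mediaSegmentB7, 2);
    video.sourceAppend(mediaSegmentA2, 1); // back: init info already known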
Re: [whatwg] Video feedback
On Jun 9, 2011, at 4:32 PM, Eric Carlson wrote:

On Jun 9, 2011, at 12:02 AM, Silvia Pfeiffer wrote:

On Thu, Jun 9, 2011 at 4:34 PM, Simon Pieters sim...@opera.com wrote:

On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well.

OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions.

We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream?

It's not about what the spec supports, but what real-world streams provide. I don't think it makes sense to put an event on every single type of metadata that can change. Most of the time, when you have a stream change, many variables will change together, so a single event is a lot less events to raise. It's an event that signifies that the media framework has reset the video/audio decoding pipeline and loaded a whole bunch of new stuff. You should imagine it as a concatenation of different media resources. And yes, they can have different track constitution and different audio sampling rate (which the audio API will care about) etc. etc.

In addition, it is possible for a stream to lose or gain an audio track. In this case the dimensions won't change but a script may want to react to the change in audioTracks.

The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ?

I agree with Silvia, a more generic metadata changed event makes more sense.

Yes, and it should support the case in which text tracks are added/removed too. Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient.

...Mark

eric
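[Editorially, a sketch of how a script might consume the generic event being proposed; the metadatachanged name is hypothetical, and the track-list properties follow the spec text under discussion.]

    video.addEventListener('metadatachanged', function () {
      // The pipeline was reset / a new chained resource began:
      // re-check any cached assumptions about the stream.
      console.log(video.duration, video.videoWidth, video.videoHeight);
      console.log(video.audioTracks.length, video.videoTracks.length,
                  video.textTracks.length);
    });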
Re: [whatwg] Video feedback
On Jun 20, 2011, at 10:42 AM, Silvia Pfeiffer wrote:

On Mon, Jun 20, 2011 at 6:29 PM, Mark Watson wats...@netflix.com wrote:

On Jun 9, 2011, at 4:32 PM, Eric Carlson wrote:

On Jun 9, 2011, at 12:02 AM, Silvia Pfeiffer wrote:

On Thu, Jun 9, 2011 at 4:34 PM, Simon Pieters sim...@opera.com wrote:

On Thu, 09 Jun 2011 03:47:49 +0200, Silvia Pfeiffer silviapfeiff...@gmail.com wrote:

For commercial video providers, the tracks in a live stream change all the time; this is not limited to audio and video tracks but would include text tracks as well.

OK, all this indicates to me that we probably want a metadatachanged event to indicate there has been a change and that JS may need to check some of its assumptions.

We already have durationchange. Duration is metadata. If we want to support changes to width/height, and the script is interested in when that happens, maybe there should be a dimensionchange event (but what's the use case for changing width/height mid-stream?). Does the spec support changes to text tracks mid-stream?

It's not about what the spec supports, but what real-world streams provide. I don't think it makes sense to put an event on every single type of metadata that can change. Most of the time, when you have a stream change, many variables will change together, so a single event is a lot less events to raise. It's an event that signifies that the media framework has reset the video/audio decoding pipeline and loaded a whole bunch of new stuff. You should imagine it as a concatenation of different media resources. And yes, they can have different track constitution and different audio sampling rate (which the audio API will care about) etc. etc.

In addition, it is possible for a stream to lose or gain an audio track. In this case the dimensions won't change but a script may want to react to the change in audioTracks.

The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ?

I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track. Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. Are we talking about the same thing ?

There is no TrackList array and TrackList is only used for audio/video, not text, so I don't understand the comment about cues. I'm talking about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which is the base class for MultipleTrackList and ExclusiveTrackList used to represent all the audio and video tracks (respectively). One instance of the object represents all the tracks, so I would assume that a change in the number of tracks is a change to this object.

I agree with Silvia, a more generic metadata changed event makes more sense.

Yes, and it should support the case in which text tracks are added/removed too.

Yes, it needs to be an event on the MediaElement.
Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient.

I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource. All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? I think of the different adaptive versions on a per-track basis (i.e. the alternatives are *within* each track), not a bunch of alternatives each of which contains several tracks.

Both are possible, of course. It's certainly possible (indeed common) for different bitrate video encodings to have different resolutions - there are video encoding reasons to do this. Of course the aspect ratio should not change and nor should the dimensions on the screen (both would be a little peculiar for the user). Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not the same as the resolution (for a start, they are in CSS pixels, which
Re: [whatwg] Video feedback
On Jun 20, 2011, at 11:52 AM, Silvia Pfeiffer wrote:

On Mon, Jun 20, 2011 at 7:31 PM, Mark Watson wats...@netflix.com wrote:

The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ?

I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track.

Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. Are we talking about the same thing ?

There is no TrackList array and TrackList is only used for audio/video, not text, so I don't understand the comment about cues. I'm talking about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which is the base class for MultipleTrackList and ExclusiveTrackList used to represent all the audio and video tracks (respectively). One instance of the object represents all the tracks, so I would assume that a change in the number of tracks is a change to this object.

Ah yes, you're right: I got confused. It says "Whenever the selected track is changed, the user agent must queue a task to fire a simple event named change at the MultipleTrackList object." This means it fires when the selectedIndex is changed, i.e. the user chooses a different track for rendering. I still don't think it relates to changes in the composition of tracks of a resource. That should be something different and should probably be on the MediaElement and not on the track list, to also cover changes in text tracks.

Fair enough.

Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient.

I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource. All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? I think of the different adaptive versions on a per-track basis (i.e. the alternatives are *within* each track), not a bunch of alternatives each of which contains several tracks.

Both are possible, of course. It's certainly possible (indeed common) for different bitrate video encodings to have different resolutions - there are video encoding reasons to do this. Of course the aspect ratio should not change and nor should the dimensions on the screen (both would be a little peculiar for the user).
Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not the same as the resolution (for a start, they are in CSS pixels, which are square), but I think it quite likely that if the resolution of the video changes then the videoWidth and videoHeight might change. I'd be interested to hear how existing implementations relate resolution to videoWidth and videoHeight.

Well, if videoWidth and videoHeight change and no dimensions on the video are provided through CSS, then surely the video will change size and the display will shrink. That would be a terrible user experience. For that reason I would suggest that such a change not be made in alternative adaptive streams.

That seems backwards to me! I would say "For that reason I would suggest that dimensions are provided through CSS or through the width and height attributes." Alternatively, we change the specification of the video element to accommodate this aspect of adaptive streaming (for example, the videoWidth and videoHeight could be defined to be based on the highest resolution bitrate being considered). There are good video encoding reasons for different bitrates to be encoded at different resolutions which are far more important than any reasons not to do either of the above.

Different video dimensions should be provided through the source element and @media attribute, but within an adaptive stream, the alternatives should be consistent because the target device won't change. I guess this is a discussion for another thread... :-)

Possibly ;-) The device knows much
Re: [whatwg] Video feedback
On Jun 20, 2011, at 5:28 PM, Silvia Pfeiffer wrote:

On Tue, Jun 21, 2011 at 12:07 AM, Mark Watson wats...@netflix.com wrote:

On Jun 20, 2011, at 11:52 AM, Silvia Pfeiffer wrote:

On Mon, Jun 20, 2011 at 7:31 PM, Mark Watson wats...@netflix.com wrote:

The TrackList object has an onchanged event, which I assumed would fire when any of the information in the TrackList changes (e.g. tracks added or removed). But actually the spec doesn't state when this event fires (as far as I could tell - unless it is implied by some general definition of events called onchanged). Should there be some clarification here ?

I understood that to relate to a change of cues only, since it is on the tracklist. I.e. it's an aggregate event from the oncuechange event of a cue inside the track. I didn't think it would relate to a change of existence of that track.

Note that the event is attached to the TrackList, not the TrackList[], so it cannot be raised when a track is added or removed, only when something inside the TrackList changes. Are we talking about the same thing ?

There is no TrackList array and TrackList is only used for audio/video, not text, so I don't understand the comment about cues. I'm talking about http://www.whatwg.org/specs/web-apps/current-work/multipage/the-iframe-element.html#tracklist which is the base class for MultipleTrackList and ExclusiveTrackList used to represent all the audio and video tracks (respectively). One instance of the object represents all the tracks, so I would assume that a change in the number of tracks is a change to this object.

Ah yes, you're right: I got confused. It says "Whenever the selected track is changed, the user agent must queue a task to fire a simple event named change at the MultipleTrackList object." This means it fires when the selectedIndex is changed, i.e. the user chooses a different track for rendering. I still don't think it relates to changes in the composition of tracks of a resource. That should be something different and should probably be on the MediaElement and not on the track list, to also cover changes in text tracks.

Fair enough.

Also, as Eric (C) pointed out, one of the things which can change is which of several available versions of the content is being rendered (for adaptive bitrate cases). This doesn't necessarily change any of the metadata currently exposed on the video element, but nevertheless it's information that the application may need. It would be nice to expose some kind of identifier for the currently rendered stream and have an event when this changes. I think that a stream-format-supplied identifier would be sufficient.

I don't know about the adaptive streaming situation. I think that is more about statistics/metrics rather than about change of resource. All the alternatives in an adaptive streaming resource should provide the same number of tracks and the same video dimensions, just at different bitrate/quality, no? I think of the different adaptive versions on a per-track basis (i.e. the alternatives are *within* each track), not a bunch of alternatives each of which contains several tracks.

Both are possible, of course. It's certainly possible (indeed common) for different bitrate video encodings to have different resolutions - there are video encoding reasons to do this. Of course the aspect ratio should not change and nor should the dimensions on the screen (both would be a little peculiar for the user).
Now, the videoWidth and videoHeight attributes of HTMLVideoElement are not the same as the resolution (for a start, they are in CSS pixels, which are square), but I think it quite likely that if the resolution of the video changes then the videoWidth and videoHeight might change. I'd be interested to hear how existing implementations relate resolution to videoWidth and videoHeight.

Well, if videoWidth and videoHeight change and no dimensions on the video are provided through CSS, then surely the video will change size and the display will shrink. That would be a terrible user experience. For that reason I would suggest that such a change not be made in alternative adaptive streams.

That seems backwards to me! I would say "For that reason I would suggest that dimensions are provided through CSS or through the width and height attributes." Alternatively, we change the specification of the video element to accommodate this aspect of adaptive streaming (for example, the videoWidth and videoHeight could be defined to be based on the highest resolution bitrate being considered). There are good video encoding reasons for different bitrates to be encoded at different resolutions which are far more important than any reasons not to do either of the above.

Different video dimensions should be provided through the source element and @media attribute, but within an adaptive stream, the alternatives
Re: [whatwg] How to handle multitrack media resources in HTML
Sent from my iPhone

On Apr 11, 2011, at 8:55 AM, Eric Carlson eric.carl...@apple.com wrote:

On Apr 10, 2011, at 12:36 PM, Mark Watson wrote:

In the case of in-band tracks it may still be the case that they are retrieved independently over the network. This could happen two ways:
- some file formats contain headers which enable precise navigation of the file, for example using HTTP byte ranges, so that the tracks could be retrieved independently. mp4 files would be an example. I don't know that anyone does this, though.

QuickTime has supported tracks with external media samples in .mov files for more than 15 years. This type of file is most commonly used during editing, but such files are occasionally found on the net.

I was also thinking of a client which downloads the MOOV box and then uses the tables there to construct byte range requests for specific tracks.

- in the case of adaptive streaming based on a manifest, the different tracks may be in different files, even though they appear as in-band tracks from an HTML perspective.

In these cases it *might* make sense to expose separate buffer and network states for the different in-band tracks in just the same way as out-of-band tracks.

I strongly disagree. Having different tracks APIs for different container formats will be extremely confusing for developers, and I don't think it will add anything. A UA that chooses to support non-self-contained media files should account for all samples when reporting readyState and networkState.

Fair enough. I did say 'might' :-)

eric
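[Editorially, a sketch of the byte-range retrieval described above; the URL, offsets, and the parsing step are illustrative - a real client would first parse the MOOV box to find each track's sample offsets.]

    // Step 1: fetch the movie header (MOOV) to learn the track layout.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'http://example.com/movie.mp4');
    xhr.responseType = 'arraybuffer';
    xhr.setRequestHeader('Range', 'bytes=0-131071');  // assumed to cover MOOV
    xhr.onload = function () {
      // Step 2 (not shown): parse the sample tables, then issue further
      // Range requests for just the chosen track's byte ranges.
    };
    xhr.send();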
Re: [whatwg] How to handle multitrack media resources in HTML
On Apr 7, 2011, at 11:54 PM, Ian Hickson wrote:

On Thu, 10 Feb 2011, Silvia Pfeiffer wrote:

One particular issue that hasn't had much discussion here yet is the issue of how to deal with multitrack media resources or media resources that have associated synchronized audio and video resources. I'm concretely referring to such things as audio descriptions, sign language video, and dubbed audio tracks. We require an API that can expose such extra tracks to the user and to JavaScript. This should be independent of whether the tracks are actually inside the media resource or are given as separate resources.

I think there's a big difference between multiple tracks inside one resource and multiple tracks spread amongst multiple resources: in the former case, one would need a single set of network state APIs (load algorithm, ready state, network state, dimensions, buffering state, etc), whereas in the second case we'd need N sets of these APIs, one for each media resource. Given that the current mechanism for exposing the load state of a media resource is a media element (video, audio), I think it makes sense to reuse these elements for loading each media resource even in a multitrack scenario. Thus I do not necessarily agree that exposing extra tracks should be done in a way that is independent of whether the tracks are in-band or out-of-band.

In the case of in-band tracks it may still be the case that they are retrieved independently over the network. This could happen two ways:
- some file formats contain headers which enable precise navigation of the file, for example using HTTP byte ranges, so that the tracks could be retrieved independently. mp4 files would be an example. I don't know that anyone does this, though.
- in the case of adaptive streaming based on a manifest, the different tracks may be in different files, even though they appear as in-band tracks from an HTML perspective.

In these cases it *might* make sense to expose separate buffer and network states for the different in-band tracks in just the same way as out-of-band tracks. In fact the distinction between in-band and out-of-band tracks is mainly how you discover them: out-of-band the author is assumed to know about by some means of their own, in-band can be discovered by loading the metadata part of a single initial resource.

...Mark
Re: [whatwg] Media elements statistics
All, I added some material to the wiki page based on our experience here at Netflix and based on the metrics defined in MPEG DASH for adaptive streaming. I'd love to hear what people think.

Statistics about presentation/rendering seem to be covered, but what should also be considered are network performance statistics, which become increasingly difficult to collect from the server when sessions are making use of multiple servers, possibly across multiple CDNs. Another aspect important for performance management is error reporting. Some thoughts on that are on the page.

...Mark

On Mar 31, 2011, at 7:07 PM, Robert O'Callahan wrote:

On Fri, Apr 1, 2011 at 1:33 PM, Chris Pearce ch...@pearce.org.nz wrote:

On 1/04/2011 12:22 p.m., Steve Lacey wrote:

Chris - in the mozilla stats, I agree on the need for a frame count of frames that actually make it to the screen, but am interested in why we need both presented and painted? Wouldn't just a simple 'presented' (i.e. presented to the user) suffice?

We distinguish between painted and presented so we have a measure of the latency in our rendering pipeline. It's more for our benefit as browser developers than for web developers.

Yeah, just to be clear, we don't necessarily think that everything in our stats API should be standardized. We should wait and see what authors actually use.

Rob
-- Now the Bereans were of more noble character than the Thessalonians, for they received the message with great eagerness and examined the Scriptures every day to see if what Paul said was true. [Acts 17:11]
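[Editorially, to make the painted/presented distinction concrete: a sketch against Gecko's experimental counters of the era, which were vendor-prefixed and never standardized - treat the property names as illustrative of that API.]

    // Frames decoded vs presented vs painted, per the Mozilla stats
    // discussed above; "presented - painted" approximates rendering
    // pipeline latency in frames.
    setInterval(function () {
      console.log('decoded:',   video.mozDecodedFrames,
                  'presented:', video.mozPresentedFrames,
                  'painted:',   video.mozPaintedFrames);
    }, 1000);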
[whatwg] Indicating and selecting tracks
Hi everyone,

I have been looking at how the video element might work in an adaptive streaming context where the available media are specified with some kind of manifest file (e.g. MPEG DASH Media Presentation Description) rather than in HTML. In this context there may be choices available as to what to present, many but not all related to accessibility:
- multiple audio languages
- text tracks in multiple languages
- audio description of video
- video with open captions (in various languages)
- video with sign language
- audio with directors commentary
- etc.

It seems natural that for text tracks, loading the manifest could cause the video element to be populated with associated track elements, allowing the application to discover the choices and activate/deactivate the tracks. But this covers only text tracks. I know discussions are underway on what to do for other media types, but my question is whether it would be better to have a consistent solution for selection amongst the available media that applies to all media types ?

Thanks,
Mark Watson
Netflix
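[Editorially, a sketch of the text-track side of this, assuming the manifest load has populated the element's textTracks list; the language choice is illustrative and the mode values follow the current spec.]

    var tracks = video.textTracks;
    for (var i = 0; i < tracks.length; i++) {
      var track = tracks[i];
      // Show the user's preferred language; disable everything else.
      track.mode = (track.language === 'en') ? 'showing' : 'disabled';
    }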