On Tue, Feb 28, 2012 at 12:11 AM, Simon Pieters <sim...@opera.com> wrote:
> > I think WebSocket should do the same, for the same reason.
> >
> > > Have you filed a bug?
> >
> > (No, not until this conversation moves along a bit further.)
>
> On Tue, Feb 28, 2012 at 8:26 AM, Jonas Sicking <jo...@sicking.cc> wrote:
> > I agree that it "scrambles" the data. But no more than the HTML
> > parser error recovery does. And if an unexpected exception is
> > thrown then the result is likely dataloss which is not obviously
> > better than scrambling part of the data.
>
> I'd say it's weaker than "scrambles", actually, at least with
> human-readable text. Replacing one character with U+FFFD usually
> results in an isolated glitch that a reader can recover from. (I do
> this regularly when reading the HTML spec, which uses characters not
> widely supported, in particular "Steps in synchronous sections are
> marked with ?.")
>
> Also, even if you're attentive to handling these errors, most of the
> time you don't want to. In my experience, it's very uncommon to want
> to explicitly handle very rare errors like "the user pasted in an
> unpaired surrogate". There's rarely anything useful you can do,
> except to walk through the string and change the unpaired surrogates
> to U+FFFD, so you can move on. I'd rather just get U+FFFD to begin
> with.

OK, I've updated the Editor's Draft to reflect this. Essentially, I take
Anne's advice about first converting the DOMString to a sequence of
Unicode characters using the algorithm defined in WebIDL (namely this
one: http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode). This
actually seems to take care of unmatched surrogates from UTF-16 when you
use a UTF-8 decoding on the Unicode characters following the algorithmic
conversion, and so we may have what we need here.

This is the 29th February Editor's Draft (ensure you shift-reload if
necessary):

http://dev.w3.org/2006/webapi/FileAPI/

I'd appreciate a review.

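For reviewers, here is a rough sketch of how I read the conversion described above: walk the UTF-16 code units of the DOMString, combine valid surrogate pairs, and replace any unpaired surrogate with U+FFFD, then write the result out as UTF-8. The function names toUnicodeScalarValues and encodeUtf8 are made up for illustration and are not taken from WebIDL or the File API draft; the linked WebIDL algorithm remains the normative text.

```typescript
// Illustrative only: approximates the WebIDL "obtain Unicode" steps.
// Returns Unicode scalar values; unpaired surrogates become U+FFFD.
function toUnicodeScalarValues(s: string): number[] {
  const out: number[] = [];
  const n = s.length;
  for (let i = 0; i < n; i++) {
    const c = s.charCodeAt(i);
    if (c < 0xd800 || c > 0xdfff) {
      out.push(c); // not a surrogate: pass through unchanged
    } else if (c >= 0xdc00) {
      out.push(0xfffd); // low surrogate with no preceding high surrogate
    } else if (i === n - 1) {
      out.push(0xfffd); // high surrogate at the very end of the string
    } else {
      const d = s.charCodeAt(i + 1);
      if (d >= 0xdc00 && d <= 0xdfff) {
        // valid pair: combine into one supplementary code point
        out.push(((c & 0x3ff) << 10) + (d & 0x3ff) + 0x10000);
        i++; // consume the low surrogate as well
      } else {
        out.push(0xfffd); // high surrogate not followed by a low surrogate
      }
    }
  }
  return out;
}

// Illustrative only: encodes scalar values (no surrogates) as UTF-8 bytes.
function encodeUtf8(scalars: number[]): number[] {
  const bytes: number[] = [];
  for (const cp of scalars) {
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}
```

With this sketch, a string such as "a\uD800b" comes out as the scalar values U+0061, U+FFFD, U+0062, so the UTF-8 step never sees a lone surrogate and nothing needs to throw.
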
If this passes muster, we may be one step further along the way to
deprecating BlobBuilder, which only stipulated writing out as UTF-8 when
the DOMString was appended to the Blob.

-- A*