On 1/11/2012 4:22 PM, Boris Zbarsky wrote:
On 1/11/12 6:03 PM, Charles Pritchard wrote:
Is there any instance in practice where DOMString as exposed to the
scripting environment is not implemented as a unicode string?
I don't know what you mean by that.
The point is, it's trivial to construct JS strings that contain
arbitrary sequences of 16-bit units (using fromCharCode or \u
escapes). Nothing anywhere in JS or the DOM per se enforces that
strings are valid UTF-16 (which is the way that an actual Unicode
string would be encoded as a JS string).
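For instance (a minimal illustration; the exact values don't matter), either of these gives you a string containing an unpaired surrogate, which is not well-formed UTF-16, and nothing complains:

  var a = String.fromCharCode(0xD800); // lone high surrogate
  var b = "\uDC00";                    // lone low surrogate
  a.length;                            // 1
  a.charCodeAt(0);                     // 0xD800 (55296)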
My [wrong] understanding was that DOMString referred to valid unicode.
WebIDL:
"The DOMString type corresponds to the set of all possible sequences of
16 bit unsigned integer code units. Such sequences are commonly
interpreted as UTF-16 encoded strings [RFC2781] although this is not
required... Nothing in this specification requires a DOMString value to
be a valid UTF-16 string."
http://www.w3.org/TR/WebIDL/#idl-DOMString
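As a quick sketch of what that means in practice (what comes back out is implementation-dependent, since nothing requires valid UTF-16), a lone surrogate can be handed straight to a DOMString attribute:

  document.title = "\uD800";                  // not valid UTF-16, accepted anyway
  document.title.charCodeAt(0).toString(16);  // typically still "d800"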
DOM3:
"The DOMString type is used to store [Unicode] characters as a sequence
of 16-bit units using UTF-16 as defined in [Unicode] and Amendment 1 of
[ISO/IEC 10646]." There are some normalization notes, but otherwise,
it's close enough to saying it stores Unicode, but it can handle all
16bit combinations.
http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-C74D1578
For "historic reasons" WindowBase64 throws an error if input is not
within Unicode range.
http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob
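For example, btoa() only accepts code units in the 0x00-0xFF range and throws on anything above that:

  btoa("hello");      // "aGVsbG8="
  try {
    btoa("\u0100");   // code unit > 0xFF
  } catch (e) {
    // INVALID_CHARACTER_ERR / InvalidCharacterError
  }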
I realize that internally, DOMString may be implemented as an array of
16-bit integers plus a length;
Not just internally. The JS spec and the DOM spec both explicitly say
that this is what strings are: an array of 16-bit integers.
WebIDL and DOM define "DOMString", of course. JS defines "The String
Type" in 8.4. They are intended to be the same.
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
"The String type is the set of all finite ordered sequences of zero or
more 16-bit unsigned integer values .... When a String contains actual
textual data, each element is considered to be a single UTF-16 code
unit. Whether or not this is the actual storage format of a String, the
characters within a String are numbered by their initial code unit
element position as though they were represented using UTF-16."
Browsers do the same thing with WindowBase64: though its argument is
specified as DOMString, in practice (as the notes say) it's treated as
Unicode.
http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#atob
If you look at the actual processing model, you take the input array
of 16-bit integers, throw if any is not in the set { 0x2B, 0x2F }
union [0x30, 0x39] union [0x41, 0x5A] union [0x61, 0x7A] (padding '='
is handled separately), and then treat the rest as ASCII data (which
at that point it is).
It defines this in terms of "Unicode" but that's just because any JS
string that satisfies the above constraints can be considered a
"Unicode" string if one wishes.
Web Storage, also, only works with unicode.
I'm not familiar with the relevant part of Web Storage. Can you cite
the relevant part please?
The character code conversion gets weird. If you'd explain this in the
proper terms, I'd appreciate it.
Load a binary resource via the old charset hack.
Save the resulting string into localStorage. Some conversion issue
shows up along the way; I may not be using the right vocabulary.
I know the list has seen the issue before, and I'll bet someone here can
explain it succinctly.
Example:
// Image files are easiest to try this with. From the article at:
// https://developer.mozilla.org/En/XMLHttpRequest/Using_XMLHttpRequest#Receiving_binary_data_in_older_browsers
function load_binary_resource(url) {
  var req = new XMLHttpRequest();
  req.open('GET', url, false);
  // XHR binary charset opt by Marcus Granado 2006 [http://mgran.blogspot.com]
  req.overrideMimeType('text/plain; charset=x-user-defined');
  req.send(null);
  if (req.status != 200) return '';
  return req.responseText;
}

var x = load_binary_resource('imageurl.png');
localStorage.fail = x;
localStorage.fail == x; // will return false.
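FWIW, the usual way to get at the bytes from that hack (a sketch of the approach described in the MDN article above, using a hypothetical helper; the exact high-byte mapping, e.g. Gecko's 0xF700-0xF7FF range, is implementation-specific) is to mask each code unit down to its low byte rather than storing the raw string:

  function bytes_from_binary_string(s) {
    var bytes = [];
    for (var i = 0; i < s.length; i++) {
      bytes.push(s.charCodeAt(i) & 0xFF); // keep only the low byte of each code unit
    }
    return bytes;
  }

That sidesteps whatever conversion localStorage applies to the string on the way through, which is presumably why the direct comparison above comes back false.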