Most content only accesses the characters of a string like this:

    for (var i = 0; i < str.length; i++) { str[i]; }

(Note: length is a property in ECMAScript, not a method.) While a naive implementation of this over UTF-8-encoded strings would be O(n^2), caching the result of the previous lookup makes such a loop run in a reasonably fast O(n). Some kind of iterator feels like it would be more efficient, but I don't think iterators would "feel right" in ECMAScript.

Unmatched surrogates can be encoded in UTF-8 (although they may have to be removed before the string is passed to the browser DOM code), so it may be possible to simply always encode strings in UTF-8. That would allow much simpler sharing of strings between code that wants UTF-8 support and code that uses the old model, at the expense of more complex behaviour wherever UTF-16 surrogates are referenced. Issues only arise in code that treats a string as an array of 16-bit integers, and I don't think we should be particularly bothered by the performance of code which misuses strings in this fashion (though clearly such code should still work without opting in to the new string handling).

I think this is a nicer and more flexible model than making string representations depend on which heap they came from: all issues related to encoding can be contained in the String object implementation.

While this is being discussed: for any new string handling, I think invalid strings (according to the rules in Unicode) should cause some kind of exception on creation.

--
Andrew Oakley
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
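To make the O(n^2)-versus-O(n) point concrete, here is a minimal sketch (hypothetical, not taken from any engine; `Utf8String`, `seqLength`, and the cache fields are invented names) of a UTF-8-backed string whose `charAt` remembers the byte offset of the previous lookup, so a sequential left-to-right loop advances O(1) per character instead of rescanning from the start each time:

```javascript
// Sketch only: a UTF-8-backed string that caches the position of the
// last lookup. Assumes well-formed UTF-8 input and indexes by code
// point (a simplification; today's JS indexes by UTF-16 code unit).
class Utf8String {
  constructor(bytes) {
    this.bytes = bytes;   // Uint8Array of UTF-8 data
    this.lastIndex = 0;   // character index of the cached position
    this.lastOffset = 0;  // byte offset of the cached position
  }
  // Length in bytes of the sequence introduced by lead byte `lead`.
  static seqLength(lead) {
    if (lead < 0x80) return 1;
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    return 4;
  }
  charAt(i) {
    // Only a backwards seek forces a rescan from the beginning;
    // the common sequential loop never takes this branch.
    if (i < this.lastIndex) { this.lastIndex = 0; this.lastOffset = 0; }
    while (this.lastIndex < i) {
      this.lastOffset += Utf8String.seqLength(this.bytes[this.lastOffset]);
      this.lastIndex++;
    }
    // Decode the single code point at the cached offset.
    const o = this.lastOffset;
    const n = Utf8String.seqLength(this.bytes[o]);
    let cp = this.bytes[o] & (n === 1 ? 0x7F : 0xFF >> (n + 1));
    for (let k = 1; k < n; k++) cp = (cp << 6) | (this.bytes[o + k] & 0x3F);
    return String.fromCodePoint(cp);
  }
}
```

For example, iterating `new Utf8String(new TextEncoder().encode("héllo"))` with the loop above touches each byte once in total, rather than once per index.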
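On the unmatched-surrogates point, a hedged sketch of what such a "generalized UTF-8" encoder could look like (`toWtf8` is an invented name; the scheme is essentially what later came to be called WTF-8): valid surrogate pairs are combined into a single four-byte code point, while a lone surrogate is simply emitted as an ordinary three-byte sequence rather than rejected or replaced.

```javascript
// Sketch: encode a JS string's UTF-16 code units as generalized UTF-8,
// preserving unmatched surrogates instead of rejecting them.
function toWtf8(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate is a valid pair:
    // combine them into one supplementary code point.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < str.length) {
      const lo = str.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        i++;
      }
    }
    // Standard UTF-8 byte layout; a lone surrogate falls through the
    // three-byte case, which strict UTF-8 would forbid.
    if (cp < 0x80) out.push(cp);
    else if (cp < 0x800) out.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    else if (cp < 0x10000) out.push(0xE0 | (cp >> 12),
                                    0x80 | ((cp >> 6) & 0x3F),
                                    0x80 | (cp & 0x3F));
    else out.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
  }
  return Uint8Array.from(out);
}
```

This is exactly the behaviour that would have to be undone (lone surrogates stripped or replaced) before handing the string to DOM code expecting well-formed UTF-8.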