On Tue, May 17, 2011 at 11:09 AM, Shawn Steele <shawn.ste...@microsoft.com> wrote:
> I would much prefer changing "UCS-2" to "UTF-16", thus formalizing that
> surrogate pairs are permitted. That'd be very difficult to break any
> existing code and would still allow representation of everything reasonable
> in Unicode.
>
> That would enable Unicode, and allow extending string literals and regular
> expressions for convenience with the U+10FFFF style notation (which would be
> equivalent to the surrogate pair). The character code manipulation
> functions could be similarly augmented without breaking anything (and maybe
> not needing different names?)
>
> You might want to qualify the UTF-16 as allowing, but strongly
> discouraging, lone surrogates for those people who didn't realize their
> binary data wasn't a string.
>
> The sole disadvantage would be that iterating through a string would
> require consideration of surrogates, same as today. The same caution is
> also necessary to avoid splitting Ä (U+0041 U+0308) into its component A
> and ̈ parts. I wouldn't be opposed to some sort of helper functions or
> classes that aided in walking strings, preferably with options to walk the
> graphemes (or whatever), not just the surrogate pairs. FWIW: we have such a
> helper for surrogates in .Net and "nobody uses them". The most common
> feedback is that it's not that helpful because it doesn't deal with the
> graphemes.
>

Hmm... I proposed break iterators for 'character/grapheme', word, line and
sentence as part of the i18n API, but that was "shot down" (at least for
version 0.5). Are you open to adding them now? Once this discussion is
settled and the proposal to support the full Unicode range is in place, we
can revisit the issue.

Jungshik

> - Shawn
>
> shawn.ste...@microsoft.com
> Senior Software Design Engineer
> Microsoft Windows
>
> _______________________________________________
> es-discuss mailing list
> es-discuss@mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
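To make the iteration point above concrete, here is a minimal sketch of walking a string by code point rather than by 16-bit code unit. The helper name `codePoints` is hypothetical and is not part of any proposal in this thread; it only assumes the existing `charCodeAt` behaviour. Lone surrogates are passed through unchanged, in line with the "allowed but discouraged" suggestion.

```javascript
// Hypothetical helper (illustration only): returns the code points of a
// string, combining each valid surrogate pair into one value.
function codePoints(str) {
  var result = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms one code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        result.push((unit - 0xD800) * 0x400 + (next - 0xDC00) + 0x10000);
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // BMP character or lone surrogate: keep the code unit as-is.
    result.push(unit);
  }
  return result;
}

var pair = "\uD834\uDD1E";  // U+1D11E as a surrogate pair
var decomposed = "A\u0308"; // U+0041 followed by U+0308

console.log(pair.length);                   // 2 code units
console.log(codePoints(pair).length);       // 1 code point
console.log(codePoints(decomposed).length); // 2 code points, one grapheme
```

The last line shows why surrogate-aware walking alone does not address the grapheme case: A followed by U+0308 is still two code points, so a helper like this would split it just as code-unit iteration does.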