Re: The Case Against Autodecode

tsbockman via Digitalmars-d Thu, 02 Jun 2016 23:12:07 -0700

On Thursday, 2 June 2016 at 21:00:17 UTC, tsbockman wrote:

However, this document is very old - from Unicode 3.0 and the year 2000:
While there are no surrogate characters in Unicode 3.0 (outside of private use characters), future versions of Unicode will contain them...
Perhaps level 1 has since been redefined?


I found the latest (unofficial) draft version:
    http://www.unicode.org/reports/tr18/tr18-18.html

Relevant changes:

* Level 1 is to be redefined as working on code points, not code units:

A fundamental requirement is that Unicode text be interpreted semantically by code point, not code units.

* Level 2 (graphemes) is explicitly described as a "default level":

This is still a default level - independent of country or language - but provides much better support for end-user expectations than the raw level 1...

* All mention of level 2 being slow has been removed. The only reason given for making it toggle-able is for compatibility with level 1 algorithms:

Level 2 support matches much more what user expectations are for sequences of Unicode characters. It is still locale-independent and easily implementable. However, for compatibility with Level 1, it is useful to have some sort of syntax that will turn Level 2 support on and off.
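
To make the three granularities concrete, here is a rough sketch (in Python rather than D, purely for illustration) that counts the same string as code units, as code points (level 1), and as approximate graphemes (level 2). The helper names and the sample string are my own, not from the draft. The grapheme grouping is a deliberate simplification that only attaches combining marks to the preceding base character; it is not the full segmentation algorithm and will get things like ZWJ emoji sequences and Hangul jamo wrong.

```python
import unicodedata


def utf8_code_units(s):
    """Number of 8-bit code units in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def utf16_code_units(s):
    """Number of 16-bit code units in the UTF-16 encoding of s."""
    return len(s.encode("utf-16-le")) // 2


def code_points(s):
    """Number of code points (a Python str is indexed by code point)."""
    return len(s)


def approx_graphemes(s):
    """Approximate grapheme clusters: a base code point plus any
    following code points with a non-zero canonical combining class.
    This is a simplification, not full grapheme segmentation."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# 'e' + COMBINING ACUTE ACCENT (U+0301), then MUSICAL SYMBOL G CLEF
# (U+1D11E), which lies outside the BMP and needs a surrogate pair
# in UTF-16.
sample = "e\u0301\U0001D11E"

print(utf8_code_units(sample))        # 7
print(utf16_code_units(sample))       # 4
print(code_points(sample))            # 3
print(len(approx_graphemes(sample)))  # 2
```

The user sees two "characters" here, which is what level 2 matches; level 1 sees three code points, and a code-unit algorithm sees four or seven items depending on the encoding.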
