On Thursday, 2 June 2016 at 21:00:17 UTC, tsbockman wrote:
However, this document is very old - from Unicode 3.0 and the
year 2000:
"While there are no surrogate characters in Unicode 3.0
(outside of private use characters), future versions of
Unicode will contain them..."
Perhaps level 1 has since been redefined?
I found the latest (unofficial) draft version:
http://www.unicode.org/reports/tr18/tr18-18.html
Relevant changes:
* Level 1 is to be redefined as working on code points, not code
units:
"A fundamental requirement is that Unicode text be interpreted
semantically by code point, not code units."
* Level 2 (graphemes) is explicitly described as a "default
level":
"This is still a default level—independent of country or
language—but provides much better support for end-user
expectations than the raw level 1..."
* All mention of level 2 being slow has been removed. The only
reason given for making it toggle-able is for compatibility with
level 1 algorithms:
"Level 2 support matches much more what user expectations are
for sequences of Unicode characters. It is still
locale-independent and easily implementable. However, for
compatibility with Level 1, it is useful to have some sort of
syntax that will turn Level 2 support on and off."
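
To make the code unit / code point / grapheme distinction concrete, here is a rough Python sketch. It is my own illustration, not something taken from TR18. The helper names `utf16_code_units` and `approx_graphemes` are hypothetical, and the grapheme grouping is a deliberate simplification: it only attaches combining marks to the preceding code point, which is far less than the full segmentation rules in UAX #29.

```python
import re
import unicodedata


def utf16_code_units(text):
    """Count the UTF-16 code units needed to encode text."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte order mark.
    return len(text.encode("utf-16-le")) // 2


def approx_graphemes(text):
    """Crude grapheme grouping (an assumption for illustration only).

    Attaches each combining mark to the preceding code point.
    This is NOT the full UAX #29 algorithm.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# 'e' + COMBINING ACUTE ACCENT + MUSICAL SYMBOL G CLEF (outside the BMP)
sample = "e\u0301\U0001D11E"

code_units = utf16_code_units(sample)    # the G clef needs a surrogate pair
code_points = len(sample)                # Python str is indexed by code point
graphemes = len(approx_graphemes(sample))

# Python's built-in re module matches by code point: '.' consumes the
# whole G clef, but not the two-code-point sequence 'e' + accent.
dot_matches_clef = re.fullmatch(".", "\U0001D11E") is not None
dot_matches_accented_e = re.fullmatch(".", "e\u0301") is not None
```

The same three-character string yields a different count at each level, and the two `re.fullmatch` checks show what code-point-level matching does and does not cover: a non-BMP character is one unit, while a base letter plus combining mark is not.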