On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote:
> On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote:
>> 1) It does not say that level 2 should be opt-in; it says that
>> level 2 should be toggleable. Nowhere does it say which of
>> level 1 and 2 should be the default.
>> 2) It says that working with graphemes is slower than UTF-16
>> code UNITS (level 1), but says nothing about streaming
>> decoding of code POINTS (what we have).
>> 3) That document is from 2000, and its claims about
>> performance are surely extremely outdated, anyway. Computers
>> and the Unicode standard have both changed much since then.
> 1) Right, because a special toggleable syntax is definitely not
> "opt-in".
It is not "opt-in" unless it is toggled off by default. The only
reason the document doesn't mention toggling in the level 1
section is that the section is written on the assumption that many
programs will *only* support level 1.
> 2) Several people in this thread noted that working on
> graphemes is way slower (which makes sense, because it's yet
> another processing step you must perform after decoding -
> therefore more work - therefore slower) than working on code
> points. And working on code points is way slower than working
> on code units (the actual level 1).
> 3) Not an argument - doing more work makes code slower.
What do you think I'm arguing for? It's not graphemes-by-default.
What I actually want to see: permanently deprecate the
auto-decoding range primitives, and force the user to explicitly
specify whichever of `by!dchar`, `byCodePoint`, or `byGrapheme`
their specific algorithm actually needs. Removing the implicit
conversions between `char`, `wchar`, and `dchar` would also be
nice, but I don't think it's strictly necessary.
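
To make the distinction concrete, here's a minimal sketch of that
explicit style. I'm assuming current Phobos names here: `byCodeUnit`
and `byDchar` live in `std.utf`, `byGrapheme` in `std.uni`, and
`byDchar` is the code-point-level range I'd expect `by!dchar` to
correspond to:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // "noël", with the 'ë' spelled as 'e' + U+0308 (combining diaeresis)
    string s = "noe\u0308l";

    // Level 1: raw UTF-8 code units, no decoding at all.
    writeln(s.byCodeUnit.walkLength); // 6
    // Decoded code points - what auto-decoding silently gives you today.
    writeln(s.byDchar.walkLength);    // 5
    // User-perceived characters (graphemes) - the most work of the three.
    writeln(s.byGrapheme.walkLength); // 4
}
```

Each level does strictly more work than the one before it, which is
exactly the cost ordering described above - and with explicit range
adapters, the caller chooses which cost to pay.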
That would be a standards-compliant solution (one of several
possible). What we have now is non-standard, at least going by
the old version Walter linked.
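
For reference, the current auto-decoding behavior is visible right
in the type system. A minimal sketch, assuming the Phobos we have
now:

```d
import std.range.primitives : ElementType;

// A string is stored as UTF-8 code units...
static assert(is(typeof("abc"[0]) == immutable(char)));
// ...but the range primitives silently decode it to code points,
// so every generic algorithm pays the decoding cost by default.
static assert(is(ElementType!string == dchar));
```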