On 02-Aug-12 12:44, Walter Bright wrote:
On 8/2/2012 1:38 AM, Jonathan M Davis wrote:
On Thursday, August 02, 2012 01:14:30 Walter Bright wrote:
On 8/2/2012 12:43 AM, Jonathan M Davis wrote:
It is for ranges in general. In the general case, a range of UTF-8 or
UTF-16 makes no sense whatsoever. Having range-based functions which
understand the encodings and optimize accordingly can be very beneficial
(which happens with strings but can't happen with general ranges without
the concept of a variable-length encoded range like we have with forward
range or random access range), but to actually have a range of UTF-8 or
UTF-16 just wouldn't work. Range-based functions operate on elements, and
doing stuff like filter or map or reduce on code units doesn't make any
sense at all.

Yes, it can work.

How?

Keep a 6 character buffer in your consumer. If you read a char with the
high bit set, start filling that buffer and then decode it.

4 bytes is enough.

Since Unicode 5(?), the range of code points has been defined as 0..0x10FFFF, specifically so that it can be encoded in 4 bytes of UTF-8.
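
A minimal sketch of such a consumer, in Python purely for illustration (the function name decode_utf8_stream is hypothetical, not anything from Phobos). It assumes the input is an iterable of UTF-8 code units given as integers 0..255, buffers at most 4 of them, and leaves the final validation to Python's built-in decoder:

```python
def decode_utf8_stream(code_units):
    """Yield code points from an iterable of UTF-8 code units (ints 0..255)."""
    buf = bytearray()   # never holds more than 4 bytes
    need = 0            # total length of the sequence being buffered, 0 if none

    for b in code_units:
        if need == 0:
            if b < 0x80:
                # High bit clear: plain ASCII, no buffering needed.
                yield b
                continue
            # High bit set: the lead byte tells us how many bytes to collect.
            if 0xC2 <= b <= 0xDF:
                need = 2
            elif 0xE0 <= b <= 0xEF:
                need = 3
            elif 0xF0 <= b <= 0xF4:
                need = 4
            else:
                # 0x80..0xC1 and 0xF5..0xFF can never start a sequence.
                raise ValueError(f"invalid lead byte 0x{b:02X}")

        buf.append(b)
        if len(buf) == need:
            # The built-in decoder rejects bad continuation bytes,
            # overlong forms, surrogates and anything above 0x10FFFF.
            yield ord(bytes(buf).decode("utf-8"))
            buf.clear()
            need = 0

    if buf:
        raise ValueError("truncated sequence at end of input")
```

Feeding it the bytes of a string one at a time gives back the code points, and encoding the largest code point, chr(0x10FFFF), takes exactly 4 bytes, so the buffer never needs to grow past that.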


P.S. Looks like I'm too late for this party ;)


--
Dmitry Olshansky
