On 8/2/2012 1:26 PM, Jonathan M Davis wrote:
On Thursday, August 02, 2012 01:44:18 Walter Bright wrote:
On 8/2/2012 1:38 AM, Jonathan M Davis wrote:
On Thursday, August 02, 2012 01:14:30 Walter Bright wrote:
On 8/2/2012 12:43 AM, Jonathan M Davis wrote:
It is for ranges in general. In the general case, a range of UTF-8 or
UTF-16 makes no sense whatsoever. Having range-based functions which
understand the encodings and optimize accordingly can be very beneficial
(this happens with strings, but it can't happen with general ranges
without the concept of a variable-length-encoded range, analogous to
forward range or random access range), but actually having a range of
UTF-8 or UTF-16 just wouldn't work. Range-based functions operate on
elements, and doing stuff like filter or map or reduce on code units
doesn't make any sense at all.

Yes, it can work.

How?

Keep a 6-character buffer in your consumer. If you read a char with the high
bit set, start filling that buffer, then decode it once it's complete.

And how on earth is that going to work as a range?

1. read a char from the range
2. if the char is the start of a multibyte character (high bit set), put it in the buffer
3. keep reading from the range until you've got the whole of the multibyte character
4. convert that 6- (or 4-) char buffer into a dchar

Remember, it's the consumer doing the decoding, not the input range.

I agree that we should be making string operations more efficient by taking code
units into account, but I completely disagree that we can do that generically.

The requirement I listed was that the input range present UTF-8 characters, not any random character type.
