On 8/2/2012 1:26 PM, Jonathan M Davis wrote:
On Thursday, August 02, 2012 01:44:18 Walter Bright wrote:
On 8/2/2012 1:38 AM, Jonathan M Davis wrote:
On Thursday, August 02, 2012 01:14:30 Walter Bright wrote:
On 8/2/2012 12:43 AM, Jonathan M Davis wrote:
It is for ranges in general. In the general case, a range of UTF-8 or
UTF-16 makes no sense whatsoever. Having range-based functions which
understand the encodings and optimize accordingly can be very beneficial
(this happens with strings, but it can't happen with general ranges
without the concept of a variable-length-encoded range, analogous to
forward range or random access range), but actually having a range of
UTF-8 or UTF-16 just wouldn't work. Range-based functions operate on
elements, and doing stuff like filter or map or reduce on code units
doesn't make any sense at all.

Yes, it can work.

How?

Keep a 6-character buffer in your consumer. If you read a char with the high
bit set, start filling that buffer, then decode it once it's complete.

And how on earth is that going to work as a range?

1. read a char from the range
2. if the char is the start of a multibyte character (high bit set), put it in the buffer
3. keep reading from the range until you've got the whole of the multibyte character
4. convert that 6- (or 4-) char buffer into a dchar

Remember, it's the consumer doing the decoding, not the input range.

I agree that we should be making string operations more efficient by taking code
units into account, but I completely disagree that we can do that generically.

The requirement I listed was that the input range present UTF-8 characters, not any random character type.
