On 8/2/2012 1:38 AM, Jonathan M Davis wrote:
On Thursday, August 02, 2012 01:14:30 Walter Bright wrote:
On 8/2/2012 12:43 AM, Jonathan M Davis wrote:
It is for ranges in general. In the general case, a range of UTF-8 or
UTF-16 makes no sense whatsoever. Having range-based functions which
understand the encodings and optimize accordingly can be very beneficial
(this happens with strings but can't happen with general ranges without
the concept of a variable-length-encoded range alongside forward
range and random-access range), but actually having a range of UTF-8 or
UTF-16 just wouldn't work. Range-based functions operate on elements, and
doing stuff like filter or map or reduce on code units makes no
sense at all.

Yes, it can work.

How?

Keep a 6-character buffer in your consumer. If you read a char with the high bit set, start filling that buffer and then decode it.
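The buffering approach Walter describes can be sketched as follows. This is a minimal illustration in Python rather than D, so the names are mine, not anything from std.string; note also that while a 6-byte buffer covers the original UTF-8 definition, RFC 3629 caps sequences at 4 bytes, so 4 suffices today:

```python
def expected_length(lead: int) -> int:
    """Number of code units in the sequence introduced by this lead byte."""
    if lead & 0b1000_0000 == 0:
        return 1                      # ASCII: high bit clear
    if lead & 0b1110_0000 == 0b1100_0000:
        return 2
    if lead & 0b1111_0000 == 0b1110_0000:
        return 3
    if lead & 0b1111_1000 == 0b1111_0000:
        return 4
    raise ValueError(f"invalid lead byte: {lead:#x}")

def decode_stream(code_units):
    """Yield decoded characters from an iterable of UTF-8 code units.

    A code unit with the high bit set starts (or continues) filling a
    small buffer; the buffer is decoded once the full sequence arrives.
    """
    buf = bytearray()
    need = 0
    for unit in code_units:
        if need == 0:
            need = expected_length(unit)
        buf.append(unit)
        if len(buf) == need:
            yield bytes(buf).decode("utf-8")
            buf.clear()
            need = 0
    if buf:
        raise ValueError("truncated sequence at end of input")

# "héllo" is 6 code units but 5 characters:
print(list(decode_stream("héllo".encode("utf-8"))))   # ['h', 'é', 'l', 'l', 'o']
```

The consumer carries only a fixed-size buffer of pending code units, which is what lets it sit on top of a plain range of bytes.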


Do you really think that it makes sense for a function like map or filter to
operate on individual code units? Because that's what would end up happening
with a range of code units. Your average range-based function only makes
sense with _characters_, not code units. Functions which can operate on ranges
of code units without screwing up the encoding are a rarity.
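Jonathan's point about filter can be made concrete. A sketch in Python (not D): a predicate applied per code unit sees the bytes of a multi-byte character as separate elements, so it can drop or split them independently of one another:

```python
text = "naïve"
units = text.encode("utf-8")        # b'na\xc3\xafve': 6 code units, 5 characters

# A filter over code units that keeps only values below 0x80 sees the
# two halves of 'ï' as separate elements and drops them independently:
ascii_only = bytes(u for u in units if u < 0x80)
print(ascii_only.decode("utf-8"))   # 'nave' -- the character silently vanished

# Worse, a filter that drops only some bytes of a sequence leaves the
# result undecodable (here, a stray continuation byte remains):
broken = bytes(u for u in units if u != 0xC3)
try:
    broken.decode("utf-8")
except UnicodeDecodeError:
    print("encoding corrupted")

# Filtering decoded characters, by contrast, operates on real elements:
print("".join(c for c in text if c.isalpha()))   # 'naïve'
```

The same hazard applies to map and reduce: any per-element transformation of code units can produce byte sequences that are no longer well-formed UTF-8.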

Rare or not, they are certainly possible, and the early versions of std.string did just that (although they weren't using ranges, the same techniques apply).
