Re: std.d.lexer requirements

Dmitry Olshansky Thu, 02 Aug 2012 08:50:38 -0700

On 02-Aug-12 12:44, Walter Bright wrote:

On 8/2/2012 1:38 AM, Jonathan M Davis wrote:

On Thursday, August 02, 2012 01:14:30 Walter Bright wrote:

On 8/2/2012 12:43 AM, Jonathan M Davis wrote:

It is for ranges in general. In the general case, a range of UTF-8 or
UTF-16 makes no sense whatsoever. Having range-based functions which
understand the encodings and optimize accordingly can be very beneficial
(which happens with strings but can't happen with general ranges without
the concept of a variably-length encoded range like we have with forward
range or random access range), but to actually have a range of UTF-8 or
UTF-16 just wouldn't work. Range-based functions operate on elements, and
doing stuff like filter or map or reduce on code units doesn't make any
sense at all.


Yes, it can work.


How?


Keep a 6 character buffer in your consumer. If you read a char with the
high bit set, start filling that buffer and then decode it.

4 bytes is enough.

Since Unicode 5(?) the range of codepoints was defined to be 0...0x10FFFF specifically so that it could be encoded in 4 bytes of UTF-8.
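
To make the buffering idea concrete, here is a rough sketch of such a consumer. It is written in Python only for brevity, and it is not code from Phobos or from any proposed std.d.lexer; the name decode_code_units is made up for illustration. It assumes the input is any iterable of UTF-8 code units given as integers 0..255, and it leans on Python's built-in UTF-8 codec to validate each completed sequence.

```python
def decode_code_units(units):
    """Yield code points from an iterable of UTF-8 code units (ints 0..255).

    Hypothetical helper for illustration: the pending buffer never holds
    more than 4 code units, which is the point made above.
    """
    buf = []   # code units of the sequence currently being collected
    need = 0   # total length of that sequence, taken from its lead byte
    for b in units:
        if not buf:
            if b < 0x80:
                # High bit clear: plain ASCII, no buffering needed.
                yield b
                continue
            # High bit set: the lead byte tells us how many units to collect.
            if 0xC2 <= b <= 0xDF:
                need = 2
            elif 0xE0 <= b <= 0xEF:
                need = 3
            elif 0xF0 <= b <= 0xF4:
                need = 4   # longest sequence, covers up to 0x10FFFF
            else:
                raise ValueError("invalid UTF-8 lead byte: 0x%02X" % b)
            buf.append(b)
        else:
            if b & 0xC0 != 0x80:
                raise ValueError("expected continuation byte, got 0x%02X" % b)
            buf.append(b)
        if len(buf) == need:
            # The built-in codec rejects overlong forms and surrogates.
            yield ord(bytes(buf).decode("utf-8"))
            buf.clear()
    if buf:
        raise ValueError("input ended in the middle of a sequence")
```

For example, list(decode_code_units(b"a\xc3\xa9")) gives [97, 233]. The consumer sees one code unit at a time and only ever holds the units of the sequence in progress, so a fixed 4-element buffer would do just as well as the list used here.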



P.S. Looks like I'm too late for this party ;)


--
Dmitry Olshansky
