Re: std.d.lexer requirements

Walter Bright Thu, 02 Aug 2012 11:05:23 -0700

On 8/2/2012 8:46 AM, Dmitry Olshansky wrote:

Keep a 6 character buffer in your consumer. If you read a char with the
high bit set, start filling that buffer and then decode it.

4 bytes is enough.

Since Unicode 5(?) the range of codepoints was defined to be 0...0x10FFFF
specifically so that it could be encoded in 4 bytes of UTF-8.

Yeah, but I thought 6 bytes would future-proof it! (Inevitably, the Unicode committee will add more.)
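
For readers following along, here is a rough sketch in C of the buffering consumer being discussed. It is not code from the lexer or from Phobos: the names `Utf8Consumer`, `utf8_seq_len` and `utf8_feed` are made up for illustration, and it assumes the 4-byte limit (code points up to 0x10FFFF) mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total sequence length implied by a UTF-8 lead byte,
   or 0 if the byte cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                   /* ASCII, high bit clear */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  /* F4 is the last lead byte needed for 0x10FFFF */
    return 0;  /* continuation byte, overlong lead (C0, C1), or F5..FF */
}

/* Hypothetical consumer state: the small buffer from the discussion,
   sized at 4 bytes on the assumption that nothing longer is legal. */
typedef struct {
    unsigned char buf[4];
    size_t have;  /* bytes buffered so far */
    size_t need;  /* total bytes expected for the current sequence */
} Utf8Consumer;

/* Feed one byte. Returns 1 and stores the code point in *out when a
   character is complete, 0 when more bytes are needed, -1 on malformed
   input (the partial sequence is then discarded). */
static int utf8_feed(Utf8Consumer *c, unsigned char b, uint32_t *out)
{
    if (c->have == 0) {
        size_t n = utf8_seq_len(b);
        if (n == 0) return -1;
        if (n == 1) { *out = b; return 1; }  /* fast path: no buffering */
        c->buf[0] = b;                        /* high bit set: start filling */
        c->have = 1;
        c->need = n;
        return 0;
    }

    if ((b & 0xC0) != 0x80) {  /* continuation bytes must look like 10xxxxxx */
        c->have = 0;
        return -1;
    }
    c->buf[c->have++] = b;
    if (c->have < c->need) return 0;

    /* Sequence complete: decode it and reset the buffer. */
    static const unsigned char lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    static const uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t n = c->need;
    uint32_t cp = c->buf[0] & lead_mask[n];
    for (size_t i = 1; i < n; i++)
        cp = (cp << 6) | (uint32_t)(c->buf[i] & 0x3F);
    c->have = 0;

    /* Reject overlong forms, surrogates, and anything past 0x10FFFF. */
    if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;
    *out = cp;
    return 1;
}
```

Feeding the bytes F4 8F BF BF one at a time returns 0 three times and then 1 with the code point 0x10FFFF, which is the largest value the buffer ever has to hold. A 6-byte buffer would only be needed if lead bytes F8..FD became legal again, as they were in the original UTF-8 definition before RFC 3629 restricted it.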


P.S. Looks like I'm too late for this party ;)


It affects you strongly, too, so I'm glad to see you join in.
