On 8/2/2012 8:46 AM, Dmitry Olshansky wrote:
>> Keep a 6 character buffer in your consumer. If you read a char with the
>> high bit set, start filling that buffer and then decode it.
> 4 bytes is enough.
> Since Unicode 5(?) the range of codepoints was defined to be 0...0x10FFFF
> specifically so that it could be encoded in 4 bytes of UTF-8.
Yeah, but I thought 6 bytes would future proof it! (Inevitably, the Unicode
committee will add more.)
> P.S. Looks like I'm too late for this party ;)
It affects you strongly, too, so I'm glad to see you join in.
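
For anyone following along, here is a rough sketch of the buffering idea discussed above, with the 4-byte limit. It is written in Python rather than D just to keep it short, and it assumes the consumer is handed one byte at a time. The names `Utf8Consumer`, `put`, and `decode_stream` are made up for this illustration and do not come from any library.

```python
class Utf8Consumer:
    """Incrementally decode UTF-8 fed one byte at a time (illustrative sketch)."""

    def __init__(self):
        self.buf = bytearray()  # never holds more than 4 bytes
        self.need = 0           # total length of the sequence being collected

    def put(self, byte):
        """Feed one byte; return a 1-char str, or None if more bytes are needed."""
        if not self.buf:
            if byte < 0x80:
                # High bit clear: plain ASCII, no buffering required.
                return chr(byte)
            # High bit set: the lead byte tells us how long the sequence is.
            if 0xC2 <= byte <= 0xDF:
                self.need = 2
            elif 0xE0 <= byte <= 0xEF:
                self.need = 3
            elif 0xF0 <= byte <= 0xF4:
                self.need = 4
            else:
                raise ValueError("invalid UTF-8 lead byte: 0x%02X" % byte)
            self.buf.append(byte)
            return None
        if byte & 0xC0 != 0x80:
            # Every byte after the lead byte must look like 10xxxxxx.
            self.buf.clear()
            raise ValueError("invalid UTF-8 continuation byte: 0x%02X" % byte)
        self.buf.append(byte)
        if len(self.buf) < self.need:
            return None
        seq = bytes(self.buf)
        self.buf.clear()
        # Strict decoding also rejects overlong forms, surrogates,
        # and anything above U+10FFFF.
        return seq.decode("utf-8")


def decode_stream(data):
    """Run every byte of `data` through a consumer and join the results."""
    consumer = Utf8Consumer()
    out = []
    for b in data:
        ch = consumer.put(b)
        if ch is not None:
            out.append(ch)
    return "".join(out)
```

The largest code point, U+10FFFF, encodes as F4 8F BF BF, so the buffer never needs a fifth slot. This sketch does not report a sequence that is cut off at the end of the input; a real consumer would want a way to flag that.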