Re: UTF-8 decode broken for supplementary characters?

Robert Muir Wed, 01 Sep 2010 05:47:28 -0700

On Wed, Sep 1, 2010 at 5:43 AM, Deven You <[email protected]> wrote:


> I have run the test on Linux and got the same error. It seems to be due to
> our UTF-8 decoder. I will do more debugging to narrow down the root cause.
> Is anyone familiar with UTF-8? I hope I can get some help.
>
>
Looks like the problem is in UTF_8's decodeLoop where it does:

cArr[outIndex++] = (char) jchar;

and similar in the non-array case where it does:

out.put((char) jchar);

In this case, jchar holds the correct value of my code point (0x1d11e), but
the cast to 'char' truncates it to its low 16 bits. Instead, a supplementary
code point like this needs to be split into a surrogate pair.
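
To illustrate, here is a minimal sketch of the split, not a patch against the
actual decodeLoop. The class SurrogateSplit and the helper putCodePoint are
hypothetical names made up for this example; only the java.lang.Character and
java.nio.CharBuffer calls are standard API. A real decoder would also have to
check that the output has room for two chars and report overflow otherwise.

```java
import java.nio.CharBuffer;

public class SurrogateSplit {
    // Hypothetical helper: writes one decoded code point to the output,
    // using two chars (a surrogate pair) when the value is above 0xFFFF.
    static void putCodePoint(CharBuffer out, int codePoint) {
        if (Character.isSupplementaryCodePoint(codePoint)) {
            // toChars returns { high surrogate, low surrogate } here
            out.put(Character.toChars(codePoint));
        } else {
            // BMP code points fit in a single char, so the cast is safe
            out.put((char) codePoint);
        }
    }

    public static void main(String[] args) {
        int jchar = 0x1d11e;

        // What the plain cast does: keeps only the low 16 bits
        char truncated = (char) jchar;
        System.out.printf("truncated: 0x%04X%n", (int) truncated);

        // What is needed instead: a high and a low surrogate
        CharBuffer out = CharBuffer.allocate(2);
        putCodePoint(out, jchar);
        out.flip();
        System.out.printf("surrogates: 0x%04X 0x%04X%n",
                (int) out.get(0), (int) out.get(1));
    }
}
```

For 0x1d11e the cast yields 0xD11E, while the surrogate pair is 0xD834 0xDD1E.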

-- 
Robert Muir
[email protected]
