Re: Which CoderResult for malformed surrogate pairs ?

Martin Buchholz Wed, 10 Sep 2008 12:55:48 -0700

There is another reason, aside from our Beloved Compatibility,
to prefer returning length == 1.  It is likely that the calling code will
delete the malformed chars and present the rest to a human.
The second char *might* be valid, so why hide it?
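
A rough sketch of the kind of calling code meant here, assuming a UTF-8
encoder and a hypothetical helper name (encodeSkippingErrors, not part of
the JDK). It drops exactly as many chars as each error result reports via
CoderResult.length() and keeps encoding the rest:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class SkipMalformed {
    // Hypothetical helper: encode to UTF-8, deleting the chars that each
    // malformed/unmappable result covers, then decode back for display.
    static String encodeSkippingErrors(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer in = CharBuffer.wrap(s);
        // UTF-8 needs at most 3 bytes per char, so this cannot overflow.
        ByteBuffer out = ByteBuffer.allocate(3 * s.length() + 4);
        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            if (cr.isUnderflow()) {
                break; // all input consumed
            }
            if (cr.isOverflow()) {
                throw new IllegalStateException("output buffer too small");
            }
            // Error result: the input position is at the start of the bad
            // input, so skip exactly the reported number of chars.
            in.position(in.position() + cr.length());
        }
        enc.flush(out);
        out.flip();
        return StandardCharsets.UTF_8.decode(out).toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeSkippingErrors("x\uD800Ay"));
    }
}
```

With a reported length of 1, only the lone high surrogate is dropped and
the 'A' after it survives. With a reported length of 2, the same loop would
drop the 'A' as well.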


Martin

On Wed, Sep 10, 2008 at 08:22, Ulf Zibis <[EMAIL PROTECTED]> wrote:
> Hi Martin,
>
> thanks for the quick first answer.
>
> You are right, both chars could be corrupt.
> IMO, returning CoderResult.malformedForLength(2) would be more
> informative, and the developer could decide for himself whether to
> take CoderResult.length() into account.
> Why have this distinction by length if nobody makes use of it?
> Nothing else causes CharsetEncoder to report a length other than 1.
>
> So do you think it would be against the spec to return
> CoderResult.malformedForLength(2) in such cases, even if
> CoderResult.malformedForLength(1) isn't a bug?
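
For reference, a small sketch of how a caller can read the length either
way. It uses only the public CoderResult API; the class and method names
(MalformedLength, describe, lengthViaException) are made up for
illustration:

```java
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CoderResult;
import java.nio.charset.MalformedInputException;

class MalformedLength {
    // Read the length directly; length() is only valid on error results,
    // hence the isMalformed() guard.
    static String describe(CoderResult cr) {
        if (cr.isMalformed()) {
            return "malformed, length " + cr.length();
        }
        return cr.toString();
    }

    // Callers that prefer exceptions get the same number from
    // MalformedInputException.getInputLength().
    static int lengthViaException(CoderResult cr) {
        try {
            cr.throwException();
        } catch (MalformedInputException e) {
            return e.getInputLength();
        } catch (CharacterCodingException e) {
            return -1; // some other coding error
        }
        return -1; // not reached: throwException() always throws
    }

    public static void main(String[] args) {
        System.out.println(describe(CoderResult.malformedForLength(1)));
        System.out.println(describe(CoderResult.malformedForLength(2)));
        System.out.println(lengthViaException(CoderResult.malformedForLength(2)));
    }
}
```

Whichever length the encoder picks, it reaches the caller unchanged through
both paths, so the choice between 1 and 2 is visible to any code that looks.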
>
> BTW:
> The chance of erroneously receiving a high surrogate in the range
> \uD800..\uDBFF is 1.56 %.
> The chance of receiving a char outside the range \uDC00..\uDFFF
> after a correct high surrogate is 98.44 %.
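
These figures can be checked with a few lines of arithmetic, assuming
(my assumption, not stated above) that a corrupted char is equally likely
to be any of the 65536 16-bit values. Class and method names are made up:

```java
class SurrogateOdds {
    static final int ALL_CHARS = 0x10000; // 65536 possible 16-bit values

    // Share of values that fall in the high-surrogate range D800..DBFF.
    static double highSurrogatePercent() {
        int high = 0xDBFF - 0xD800 + 1; // 1024 values
        return 100.0 * high / ALL_CHARS;
    }

    // Share of values that fall outside the low-surrogate range DC00..DFFF.
    static double notLowSurrogatePercent() {
        int low = 0xDFFF - 0xDC00 + 1; // 1024 values
        return 100.0 * (ALL_CHARS - low) / ALL_CHARS;
    }

    public static void main(String[] args) {
        System.out.println(highSurrogatePercent());
        System.out.println(notLowSurrogatePercent());
    }
}
```

Under that assumption the two shares work out to 1024/65536 = 1.5625 % and
64512/65536 = 98.4375 %.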
>
> -Ulf
>
>
> On 09.09.2008 23:58, Martin Buchholz wrote:
>>
>> I think when encountering a single high surrogate,
>> it is correct to return a length of either 1 or 2.
>> A thought experiment: a cosmic ray that mangled exactly one char
>> could have caused this situation if the original sequence was
>> of length either 1 or 2, depending on which char was mangled.
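
The thought experiment can be spelled out as a tiny sketch (the mangle
helper is hypothetical). Two different originals, each with exactly one
char damaged, end up as the same malformed input:

```java
class CosmicRay {
    // Hypothetical helper: replace exactly one char of the input.
    static String mangle(String s, int index, char replacement) {
        char[] cs = s.toCharArray();
        cs[index] = replacement;
        return new String(cs);
    }

    public static void main(String[] args) {
        // Original was two ordinary chars; the first one got hit.
        // Only 1 char is actually bad.
        String fromPlain = mangle("AB", 0, '\uD800');

        // Original was a valid surrogate pair; the low surrogate got hit.
        // The whole 2-char unit is now bad.
        String fromPair = mangle("\uD800\uDC00", 1, 'B');

        // Both give a high surrogate followed by 'B', so the encoder
        // cannot tell which case it is looking at.
        System.out.println(fromPlain.equals(fromPair));
    }
}
```

Since the encoder sees identical input in both cases, neither length can be
called wrong on the evidence available to it.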
>>
>> Not a Defect.
>>
>> Martin
>>
>>
>
>
