There is another reason, aside from our Beloved Compatibility, to prefer returning length == 1. It is likely that the calling code will delete the malformed chars and present the rest to a human. The second char *might* be valid, so why hide it?
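The caller Martin describes can be sketched in a few lines. This is a hedged illustration, not code from the thread. It assumes the standard UTF-8 encoder, and the class and method names (`DropMalformed`, `encodeDroppingMalformed`) are made up. On each malformed-input result it skips `cr.length()` chars and keeps encoding, so the reported length decides whether the char after a lone high surrogate survives.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DropMalformed {

    // Hypothetical caller: encodes s as UTF-8, skips every span the encoder
    // reports as malformed, and returns the text that survives.
    static String encodeDroppingMalformed(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        CharBuffer in = CharBuffer.wrap(s);
        // 3 bytes per UTF-16 char is enough: BMP chars need at most 3,
        // and a surrogate pair (2 chars) needs 4.
        ByteBuffer out = ByteBuffer.allocate(3 * s.length());
        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            if (cr.isMalformed()) {
                // The input position is left at the start of the bad span;
                // skip exactly the number of chars the encoder blamed.
                in.position(in.position() + cr.length());
            } else if (cr.isUnderflow()) {
                break; // all remaining input consumed
            } else {
                throw new IllegalStateException(cr.toString());
            }
        }
        enc.flush(out);
        out.flip();
        return StandardCharsets.UTF_8.decode(out).toString();
    }

    public static void main(String[] args) {
        // 'a', a lone high surrogate, then 'b': is 'b' kept or discarded?
        System.out.println(encodeDroppingMalformed("a\uD800b"));
    }
}
```

Suppose the encoder blames only the surrogate (length 1, which the thread indicates is the existing behavior). Then this prints `ab`. With a length of 2, the possibly valid `b` would be skipped along with the surrogate, and only `a` would remain.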
Martin

On Wed, Sep 10, 2008 at 08:22, Ulf Zibis <[EMAIL PROTECTED]> wrote:
> Hi Martin,
>
> thanks for the quick first answer.
>
> You are right, both chars could be corrupt.
> IMO, returning CoderResult.malformedForLength(2) would be more
> informative, and the SW developer could decide for himself whether
> to take CoderResult.length() into account.
> Why have this differentiation by length if nobody makes use of it?
> There is no other cause that would entail a length other than 1 from
> CharsetEncoder.
>
> So do you think it would be against the spec to return
> CoderResult.malformedForLength(2) in such cases, even if
> CoderResult.malformedForLength(1) isn't a bug?
>
> BTW:
> The chance to erroneously receive a high surrogate in range
> \uD800..\uDBFF is 1.56 %.
> The chance to erroneously receive a char out of range \uDC00..\uDFFF
> after a correct high surrogate is 98.44 %.
>
> -Ulf
>
>
> On 09.09.2008 23:58, Martin Buchholz wrote:
>>
>> I think when encountering a single high surrogate,
>> it is correct to return a length of either 1 or 2.
>> A thought experiment: a cosmic ray that mangled exactly one char
>> could have caused this situation if the original sequence was
>> of length either 1 or 2, depending on which char was mangled.
>>
>> Not a Defect.
>>
>> Martin
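Ulf's surrogate percentages follow from the size of the surrogate blocks. This assumes, as his wording implies, that a corrupted char is equally likely to be any of the 2^16 UTF-16 code units. Each surrogate block (high: D800..DBFF, low: DC00..DFFF) holds 1024 code units:

```latex
\begin{aligned}
P(\text{corrupted char} \in [\mathrm{D800}, \mathrm{DBFF}])    &= \frac{1024}{2^{16}} = \frac{1}{64} \approx 1.56\,\% \\
P(\text{corrupted char} \notin [\mathrm{DC00}, \mathrm{DFFF}]) &= 1 - \frac{1024}{2^{16}} = \frac{63}{64} \approx 98.44\,\%
\end{aligned}
```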