On 09/29/2011 02:16 PM, Ulf Zibis wrote:
Please use spaces around ternary operators: lines 155, 216.

For brevity, you could use sr instead of srcRemaining, consistent with sa, sp, sl.

 420         // returns -1 if there is malformed byte(s) and the
better:
 420         // returns -1 if there is/are malformed byte(s) and the

 466                             sp -=3;
There should be a space:  sp -= 3;

The webrev has been updated accordingly.


 280                     if (Character.isSurrogate(c))
 281                         return malformedForLength(src, sp, dst, dp, 3);
Shouldn't we return cr.length() = 1 to allow the remaining 2 bytes to be interpreted again?


Actually, I don't know the answer. My reading of D93a/D93b suggests that we might interpret it as a whole: the bytes are in the well-formed byte-pattern range listed in Table 3.7 and are "ill-formed" simply because they encode a surrogate value rather than a scalar value, so I would interpret all 3 bytes as a maximal subpart. Given that D93a/b describes "best practices for using U+FFFD", either way should be fine. We do have Unicode experts on the list, so maybe they can share their opinion: what is the "desired"/recommended behavior in this case, from the Standard's point of view?
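To make the two options concrete, here is a minimal, hypothetical sketch (the class and both methods are illustrative names, not JDK code) of how many bytes each reading would report as malformed for a given lead byte. maximalSubpartLength assumes the well-formed byte ranges of Table 3.7 (Unicode 6.0), where a lead byte ED only accepts a second byte in 80..9F; bitPatternLength only checks the generic 110xxxxx / 1110xxxx / 11110xxx lead byte followed by 10xxxxxx continuation bytes.

```java
/**
 * Hypothetical illustration, not the JDK's UTF_8 decoder: two ways of sizing
 * the malformed input that starts at a UTF-8 lead byte.
 */
final class MaximalSubpart {

    /**
     * Length of the well-formed sequence, or of the longest prefix of one
     * (the maximal subpart), starting at b[off], assuming the byte ranges of
     * Table 3.7. Always returns at least 1 so a decoder makes progress.
     */
    static int maximalSubpartLength(byte[] b, int off) {
        int b1 = b[off] & 0xFF;
        int len;                   // sequence length implied by the lead byte
        int lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (b1 < 0x80) {
            return 1;                              // ASCII
        } else if (b1 >= 0xC2 && b1 <= 0xDF) {
            len = 2;
        } else if (b1 >= 0xE0 && b1 <= 0xEF) {
            len = 3;
            if (b1 == 0xE0) lo = 0xA0;             // exclude overlong forms
            else if (b1 == 0xED) hi = 0x9F;        // exclude surrogates D800..DFFF
        } else if (b1 >= 0xF0 && b1 <= 0xF4) {
            len = 4;
            if (b1 == 0xF0) lo = 0x90;             // exclude overlong forms
            else if (b1 == 0xF4) hi = 0x8F;        // exclude values above U+10FFFF
        } else {
            return 1;  // C0, C1, F5..FF and stray continuation bytes
        }
        int n = 1;
        for (int i = 1; i < len && off + i < b.length; i++) {
            int bi = b[off + i] & 0xFF;
            boolean ok = (i == 1) ? (bi >= lo && bi <= hi) : (bi >= 0x80 && bi <= 0xBF);
            if (!ok) break;
            n++;
        }
        return n;
    }

    /** Alternative reading: size the unit by the generic bit pattern only. */
    static int bitPatternLength(byte[] b, int off) {
        int b1 = b[off] & 0xFF;
        int len = ((b1 >> 5) == 0x06) ? 2 : ((b1 >> 4) == 0x0E) ? 3 : ((b1 >> 3) == 0x1E) ? 4 : 1;
        int n = 1;
        while (n < len && off + n < b.length && (b[off + n] & 0xC0) == 0x80) n++;
        return n;
    }

    public static void main(String[] args) {
        byte[] surrogate = {(byte) 0xED, (byte) 0xA0, (byte) 0x80}; // U+D800 in 3-byte form
        System.out.println(maximalSubpartLength(surrogate, 0)); // 1
        System.out.println(bitPatternLength(surrogate, 0));     // 3
    }
}
```

For the surrogate sequence ED A0 80 the first method returns 1 (only ED is consumed, so A0 and 80 are examined again), while the second returns 3 (the whole sequence is treated as one malformed unit).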


On 29.09.2011 05:27, Xueming Shen wrote:
Hi,

On 9/28/2011 3:44 PM, Ulf Zibis wrote:
5. IMHO, the CESU-8 charset should be hosted in extended-charsets; otherwise, it should be added to java.nio.charset.StandardCharsets.


We have lots of charsets provided via the "standard charset provider" (in rt.jar) but not listed in StandardCharsets.
Yes, but the rationale for adding CESU-8 to StandardCharsets was the supposed demand to treat all Unicode charsets equivalently.

Otherwise, there is no obstacle to hosting CESU-8 in extended-charsets.
IMHO, CESU-8 addresses corner-case compatibility issues, not "standard" requirements.

Putting CESU-8 into the "standard charset provider" (which is only an implementation detail) does not mean it is a "standard" requirement; it just means it is bundled into rt.jar. The reason I put it there is to keep it together with UTF-8, on the assumption that you might need it around when using the updated UTF-8, which no longer handles those 3/6-byte surrogates.
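For reference, a minimal sketch of the difference between the two encodings, assuming the CESU-8 definition in which each UTF-16 code unit, including each half of a surrogate pair, is encoded on its own as 1 to 3 bytes. Cesu8Sketch and its methods are hypothetical helpers written for illustration; they are not the CESU_8 implementation in the webrev, and the encoder does not validate unpaired surrogates.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of CESU-8 versus UTF-8 for supplementary characters. */
final class Cesu8Sketch {

    /** Encodes each UTF-16 char independently, so a surrogate pair becomes 3 + 3 bytes. */
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);                               // 0xxxxxxx
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));                 // 110xxxxx
                out.write(0x80 | (c & 0x3F));               // 10xxxxxx
            } else {                                        // includes D800..DFFF
                out.write(0xE0 | (c >> 12));                // 1110xxxx
                out.write(0x80 | ((c >> 6) & 0x3F));        // 10xxxxxx
                out.write(0x80 | (c & 0x3F));               // 10xxxxxx
            }
        }
        return out.toByteArray();
    }

    /** True if a fresh UTF-8 decoder (default action REPORT) rejects the bytes. */
    static boolean rejectedByUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    /** Formats bytes as space-separated uppercase hex, e.g. "ED A0 81". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x10400));          // surrogate pair D801 DC00
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // F0 90 90 80
        System.out.println(hex(encode(s)));                          // ED A0 81 ED B0 80
        System.out.println(rejectedByUtf8(encode(s)));
    }
}
```

U+10400 comes out as the 4-byte F0 90 90 80 in UTF-8 but as the 6-byte ED A0 81 ED B0 80 in CESU-8; on a JDK whose UTF-8 decoder rejects surrogate byte sequences, as the updated decoder does, rejectedByUtf8 reports true for the latter.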

-Sherman

