Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

Xueming Shen Thu, 29 Sep 2011 15:26:50 -0700

On 09/29/2011 02:16 PM, Ulf Zibis wrote:

Please put spaces around the ternary operators: lines 155, 216


For short you could use sr instead of srcRemaining, consistent with sa, sp, sl.

 420         // returns -1 if there is malformed byte(s) and the
better:
 420         // returns -1 if there is/are malformed byte(s) and the

 466                             sp -=3;
There should be a space:  sp -= 3;


Webrev has been updated accordingly.

 280                     if (Character.isSurrogate(c))
 281                         return malformedForLength(src, sp, dst, dp, 3);
Shouldn't we return cr.length() = 1 to allow the remaining 2 bytes to be interpreted again?

Actually, I don't know the answer. My reading of D93a/D93b suggests that we might interpret it as a whole: the bytes are actually in the well-formed byte pattern range listed in Table 3.7, and are "ill-formed" simply because they encode a surrogate value, not a scalar value. So I would interpret the whole 3 bytes as a maximal subpart. Given that D93a/b are "best practices for using U+FFFD", either way should be fine. We do have Unicode experts on the list, so maybe they can share their opinion on what the "desired"/recommended behavior is in this case, from the Standard's point of view?
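A minimal probe for the case in question, assuming a JDK whose UTF-8 decoder already rejects encoded surrogates: it feeds U+D800 in its 3-byte form (ED A0 80) to a decoder in REPORT mode and prints the malformed length that decoder reports. The class name SurrogateProbe is hypothetical, and the printed length is whatever the installed JDK chose (1 or 3), which is exactly the open question above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class SurrogateProbe {
    // Runs a single decode step and returns its CoderResult unchanged,
    // so the caller can inspect isMalformed() and length().
    static CoderResult decodeOnce(byte[] input) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(input);
        CharBuffer out = CharBuffer.allocate(16);
        return dec.decode(in, out, true);
    }

    public static void main(String[] args) {
        // U+D800 written as a 3-byte sequence (the CESU-8 style encoding).
        byte[] surrogate = { (byte) 0xED, (byte) 0xA0, (byte) 0x80 };
        CoderResult cr = decodeOnce(surrogate);
        // length() is only defined for error results, hence the guard.
        System.out.println("malformed=" + cr.isMalformed()
                + (cr.isMalformed() ? " length=" + cr.length() : ""));
    }
}
```

A length of 3 means the decoder treats the whole sequence as one maximal subpart; a length of 1 means the trailing 2 bytes would be examined again.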

On 29.09.2011 05:27, Xueming Shen wrote:
Hi,

On 9/28/2011 3:44 PM, Ulf Zibis wrote:
5. IMHO, charset CESU-8 should be hosted in extended-charsets; otherwise it should be added to java.nio.charset.StandardCharsets.
We have lots of charsets provided via the "standard charset provider" (in rt.jar) but not listed in StandardCharsets.
Yes, but the reason to add CESU-8 to StandardCharsets was the supposed demand to treat all Unicode charsets equivalently.
Otherwise there is no obstacle to host CESU-8 in extended-charsets.
IMHO, CESU-8 addresses corner-case compatibility issues, not "standard" requirements.

Putting CESU-8 into the "standard charset provider" (which is only an implementation detail) does not mean it is a "standard" requirement; it just means it is bundled into rt.jar. The reason I put it there is to keep it together with UTF-8, on the assumption that you might need it around when using the updated UTF-8, which no longer handles those 3/6-byte surrogates.
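To illustrate the "3/6-byte surrogates" point, here is a small sketch that encodes the supplementary character U+10400 with both charsets, assuming a JDK whose standard provider includes CESU-8. The class and helper names (CesuVsUtf8, hex) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class CesuVsUtf8 {
    // U+10400, held in a Java String as the surrogate pair \uD801\uDC00.
    static final String SUPPLEMENTARY = "\uD801\uDC00";

    // Formats bytes as space-separated upper-case hex, e.g. "F0 90".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // CESU-8 is not a StandardCharsets constant, so look it up by name.
        Charset cesu8 = Charset.forName("CESU-8");

        byte[] utf8 = SUPPLEMENTARY.getBytes(StandardCharsets.UTF_8);
        byte[] cesu = SUPPLEMENTARY.getBytes(cesu8);

        System.out.println("UTF-8:  " + hex(utf8)); // F0 90 90 80
        System.out.println("CESU-8: " + hex(cesu)); // ED A0 81 ED B0 80

        // Decoding with CESU-8 restores the original string.
        System.out.println(new String(cesu, cesu8).equals(SUPPLEMENTARY)); // true
    }
}
```

UTF-8 yields the 4-byte form, while CESU-8 encodes each surrogate separately as 3 bytes (6 in total). The updated UTF-8 decoder reports that 6-byte form as malformed, so data written that way has to be read with CESU-8.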

-Sherman
