Re: JDK 11 RFR of JDK-8196995: java.lang.Character should not state UTF-16 encoding is used for strings

Xueming Shen Thu, 08 Feb 2018 11:14:27 -0800

On 2/8/18, 10:59 AM, joe darcy wrote:

Hello,
On 2/8/2018 3:53 AM, Alan Bateman wrote:
On 07/02/2018 22:12, joe darcy wrote:
Hello,
Text in java.lang.Character states a UTF-16 character encoding is used for java.lang.String. While this was true for many years, it is not necessarily true and not true in practice as of JDK 9 due to the improvements from JEP 254: Compact Strings.
The statement about the encoding should be corrected.
Please review the patch below which does this. (I've formatted the patch so that the change in text is made clear; I'll re-flow the paragraph before pushing.)
I'm not sure that this is worth changing. You could replace "classes" with "API" and add a note to say that an implementation may use a more optimized representation, but I don't think it's really needed.
In response to this feedback and others, how about:

     [...] The Java
  * platform uses the UTF-16 representation in {@code char} arrays and
- * in the {@code String} and {@code StringBuffer} classes. In
+ * presents a UTF-16 model in the string-related API.
IMO anyway, I think saying "uses a UTF-16 representation for String" is at best misleading with the current implementation, since 8 != 16 for the compressed representation that is used for all Latin-1 strings.

Well, an encoding/charset is the concept of a mapping between a character and a corresponding code point value. We are still using the UTF-16 encoding scheme to represent a character in the JVM. How that UTF-16 code point value is represented/stored in the String class is an implementation detail: 16 bits for a "char", and in the String class 1 byte for "latin1" (still in the Unicode charset) and 2 bytes for the rest.
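
To illustrate the point that the API-level model stays UTF-16 regardless of storage, here is a minimal sketch. The class name Utf16ModelDemo and the helper codeUnits are made up for this example; only the String methods it calls are part of the JDK API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK or of the patch under review.
final class Utf16ModelDemo {

    // Collects the UTF-16 code units that String.charAt exposes.
    static int[] codeUnits(String s) {
        int[] units = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            units[i] = s.charAt(i);
        }
        return units;
    }

    public static void main(String[] args) {
        // All four characters fit in Latin-1, so a compact-strings JDK may
        // store this string one byte per character internally.
        String latin1 = "caf\u00e9";
        // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16.
        String supplementary = "a\uD83D\uDE00";

        // The API reports UTF-16 code units in both cases.
        System.out.println(latin1.length());                  // 4
        System.out.println(supplementary.length());           // 3
        System.out.println(
            supplementary.codePointCount(0, supplementary.length())); // 2

        // An explicit UTF-16 encoding always takes two bytes per code unit,
        // whatever the internal storage happens to be.
        System.out.println(
            latin1.getBytes(StandardCharsets.UTF_16BE).length); // 8
    }
}
```

Nothing in this output reveals whether the implementation stored the Latin-1 string in one byte or two bytes per character.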

As I said in my previous email, the mention of 8859-1 in the JEP might cause the confusion. At an early stage of the project we were really experimenting with using different "encodings", including UTF-8. But the project ended up staying with UTF-16, with a "customized/compressed" storage mechanism to store the UTF-16 code point value.
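
A rough sketch of what such a compressed storage mechanism can look like follows. The class CompactText and its members are hypothetical and simplified for illustration; this is not the actual java.lang.String source, and the big-endian byte order in the two-byte case is an assumption of the sketch.

```java
// Hypothetical illustration of compact storage; not the JDK implementation.
final class CompactText {
    static final byte LATIN1 = 0; // one byte per UTF-16 code unit
    static final byte UTF16 = 1;  // two bytes per UTF-16 code unit

    private final byte[] value;
    private final byte coder;

    CompactText(String s) {
        // Check whether every code unit fits in a single byte.
        boolean fitsLatin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                fitsLatin1 = false;
                break;
            }
        }
        if (fitsLatin1) {
            coder = LATIN1;
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
        } else {
            coder = UTF16;
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8); // high byte first (assumed)
                value[2 * i + 1] = (byte) c;    // low byte
            }
        }
    }

    // Length in UTF-16 code units, independent of the storage form.
    int length() {
        return coder == LATIN1 ? value.length : value.length / 2;
    }

    // Always returns a UTF-16 code unit, widening from one byte if needed.
    char charAt(int i) {
        if (coder == LATIN1) {
            return (char) (value[i] & 0xFF);
        }
        return (char) (((value[2 * i] & 0xFF) << 8) | (value[2 * i + 1] & 0xFF));
    }

    // Number of bytes actually used for storage.
    int storageBytes() {
        return value.length;
    }
}
```

In this sketch the values returned by length() and charAt() are the same UTF-16 code units either way; only storageBytes() differs between the two forms.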

-Sherman
