I'm not sure about this.  The LATIN-1 optimization is an implementation
detail.  The important thing is the API, and it remains char-oriented
(although we've added code point APIs over the years).  Strictly speaking,
you can store any sequence of 16-bit integers in a char[], String, or
StringBuffer - no encoding is enforced.  What is really UTF-16 is the way
such sequences are expected to be interpreted by *other* APIs, e.g.
System.out.println (and methods like codePointAt).
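A quick sketch of the distinction (the class name Utf16Demo is my own; the behavior shown is standard java.lang API): the String below stores raw 16-bit units, and only calls like codePointAt interpret them as UTF-16.

```java
// A String holds arbitrary 16-bit char units; UTF-16 meaning is imposed
// only when an API interprets the sequence as code points.
public class Utf16Demo {
    public static void main(String[] args) {
        // U+1F600 stored as a surrogate pair: two char units, one code point.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());        // 2 (char units)
        System.out.println(emoji.codePointAt(0));  // 128512 (0x1F600)

        // An unpaired surrogate is a perfectly legal char value; nothing
        // rejects it at storage time. codePointAt just returns it as-is.
        String lone = "\uD800";
        System.out.println(lone.codePointAt(0));   // 55296 (0xD800)
    }
}
```

So the "encoding" lives in the interpreting APIs, not in the container classes themselves.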

On Wed, Feb 7, 2018 at 2:12 PM, joe darcy <joe.da...@oracle.com> wrote:

> Hello,
>
> Text in java.lang.Character states that a UTF-16 character encoding is used
> for java.lang.String. While this was true for many years, it is not
> necessarily true, and not true in practice as of JDK 9, due to the
> improvements from JEP 254: Compact Strings.
>
> The statement about the encoding should be corrected.
>
> Please review the patch below which does this. (I've formatted the patch
> so that the changed text is made clear; I'll re-flow the paragraph before
> pushing.)
>
> Thanks,
>
> -Joe
>
> diff -r 0b1138ce244f src/java.base/share/classes/java/lang/Character.java
> --- a/src/java.base/share/classes/java/lang/Character.java    Tue Feb 06
> 10:17:31 2018 -0800
> +++ b/src/java.base/share/classes/java/lang/Character.java    Wed Feb 07
> 11:38:06 2018 -0800
> @@ -75,7 +75,7 @@
>   * <a id="supplementary">Characters</a> whose code points are greater
>   * than U+FFFF are called <em>supplementary character</em>s.  The Java
>   * platform uses the UTF-16 representation in {@code char} arrays and
> - * in the {@code String} and {@code StringBuffer} classes. In
> + * may use it elsewhere. In
>   * this representation, supplementary characters are represented as a pair
>   * of {@code char} values, the first from the <em>high-surrogates</em>
>   * range, (&#92;uD800-&#92;uDBFF), the second from the
>
>