On 5/21/2019 2:10 PM, Jim Laskey wrote:
Updated version http://cr.openjdk.java.net/~jlaskey/8223775/webrev-02

This webrev substantially updates the API spec, which is really a topic for amber-spec-experts (keep reading to see why). Cross-posting between -dev and -spec-experts lists is not good, so maybe we can wrap this up here without prolonged discussion.

API spec looks good, but it was a surprise to learn that stripIndent performs normalization of line terminators:

"@return string with margins removed and line terminators normalized"

The processing steps in the JEP (and the JLS) are clear that normalization happens before incidental white space removal. I realize that stripIndent performs separation and joining in such a way as to produce a string that looks like it was normalized prior to stripIndent, so the @return isn't wrong, but it's still confusing to make a big deal of normalization-first only for stripIndent to suggest normalization-last.
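
To make the "looks like it was normalized" point concrete, here is a minimal sketch. It assumes a JDK where String::stripIndent is available without preview flags (15 or later). The class name, the normalize helper, and the sample content are made up for illustration and are not part of the webrev.

```java
class StripIndentNormalization {
    // Normalization as the JEP describes it: CRLF and CR both become LF.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        // Hypothetical content with mixed line terminators (CRLF, then CR).
        String content = "    hello\r\n      world\r    done";

        // stripIndent alone: split into lines, strip, rejoin with LF.
        String stripOnly = content.stripIndent();

        // Normalize first, then strip, as the JEP's processing steps read.
        String normalizeThenStrip = normalize(content).stripIndent();

        // Both orders give the same string, with LF as the only terminator.
        System.out.println(stripOnly.equals(normalizeThenStrip)); // true
        System.out.println(stripOnly.indexOf('\r') < 0);          // true
    }
}
```

The result is the same either way, which is why the @return is not wrong; the sketch only shows that a caller cannot tell from the output which order was used.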

I think we should leave the JEP alone, since it interleaves behavior with motivation and examples in order to aid the reader, but we should align the JLS with the API:

-----
The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of applying the following transformations to the content, in order:

1. _Incidental white space_ is removed and line terminators are _normalized_, as if by execution of String::stripIndent on the characters in the content. [The emphasized terms are a hint to the API spec to define the term, which is not currently the case for the second term.]

2. Escape sequences are interpreted, as in a string literal.
-----
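
The two transformations above can be sketched with the library methods, assuming a JDK (15 or later) that has both String::stripIndent and String::translateEscapes. The class and method names and the sample content are hypothetical; the compiler is not required to call these methods, only to behave "as if" it did.

```java
class TextBlockSteps {
    // Step 1 (incidental white space and line terminators) followed by
    // step 2 (escape sequences), applied to the raw content of a text block.
    static String interpret(String content) {
        return content.stripIndent().translateEscapes();
    }

    public static void main(String[] args) {
        // Hypothetical raw content: the first line ends with the two
        // characters backslash and 'r', followed by a physical CRLF.
        String content = "    one\\r\r\n    two";

        String result = interpret(content);

        // The physical CRLF was normalized to LF in step 1; the escape
        // sequence became a real CR only in step 2, so it survives.
        System.out.println(result.equals("one\r\ntwo")); // true
    }
}
```

Because escapes are interpreted last, a CR written as an escape sequence is not touched by normalization, whereas a CR that is physically present in the source file is.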

String::indent also says "normalizes line termination characters" without defining it. Separately, String::stripIndent is not at all like the strip, stripLeading, and stripTrailing methods which sound related to it -- they would pointlessly strip the first row of white space dots from a multi-line string and leave the other rows.
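
A small sketch of the difference, assuming JDK 15 or later for stripIndent (strip itself has been there since 11). The class name and the sample string are illustrative only.

```java
class StripVersusStripIndent {
    // Hypothetical multi-line string with four spaces of indentation per line.
    static final String SAMPLE = "    first\n    second\n    third";

    public static void main(String[] args) {
        // strip() only looks at the two ends of the whole string, so just
        // the white space before "first" goes away.
        String stripped = SAMPLE.strip();
        System.out.println(stripped.equals("first\n    second\n    third")); // true

        // stripIndent() works line by line and removes the common indentation.
        String unindented = SAMPLE.stripIndent();
        System.out.println(unindented.equals("first\nsecond\nthird")); // true
    }
}
```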

Taking all this together, I think it's time to upgrade the class-level spec of String: to advertise the new methods added in 11+, and to show text blocks, and to get some terms defined for the benefit of multiple methods. I realize this wasn't on your radar, but it's inevitable -- the same thing happened for the class-level spec of Class when nestmates were introduced. So, here goes:

-----
The String class represents character strings. ~~All~~ String literals **and text blocks** in Java programs ~~, such as "abc",~~ are implemented as instances of this class.

The strings represented by this class are constant; their values cannot be changed after they are created. (For mutable strings, see StringBuffer and StringBuilder.) Because instances of `String` are immutable, they can be shared. For example: ...

[The example with a char[] is quite subtle for a beginner, but I'm skipping over it right now.]

The class String includes methods for examining individual characters of the sequence, for comparing strings, for searching strings, for extracting substrings, and for creating a copy of a string with all characters translated to uppercase or to lowercase. Case mapping is based on the Unicode Standard version specified by the Character class.

Here are some examples of how strings can be used:

         System.out.println("abc");
         String cde = "cde";
         String c = "abc".substring(2,3);
         String d = cde.substring(1, 2);

Unless otherwise noted, methods for comparing Strings do not take locale into account. The Collator class provides methods for finer-grain, locale-sensitive String comparison.

Unless otherwise noted, passing a null argument to a constructor or method in this class will cause a NullPointerException to be thrown. [This doesn't fit anywhere. j.l.Character doesn't bother with it, even though its methods throw NPEs too. Maybe time to drop. We have lots more important stuff to say.]

### String concatenation

The Java language provides special support for the string concatenation operator (`+`), and for conversion of other objects to strings. For additional information on string concatenation and conversion, see The Java™ Language Specification. [Somehow this manages to skip the valueOf methods, which in conjunction with things like Integer::parseInt are worthy of a paragraph by themselves. Future work.]

Here are some examples of string concatenation:

     String cde = "cde";
     System.out.println("abc" + cde);

[These examples are dull, and don't describe their output, and need to show text blocks. Future work.]

### String processing

The strings represented by this class may span multiple lines by including _line terminators_ among their characters. A line terminator is one of the following: a line feed character LF (U+000A), a carriage return character CR (U+000D), or a carriage return followed immediately by a line feed CRLF (U+000D U+000A). [Don't want to show escape sequences like \n yet.]

A string has _normalized_ line terminators if LF is the only line terminator which appears in the string. Many methods of this class _normalize_ the strings they return by ensuring that CR and CRLF are translated to LF.

The class String also includes methods for manipulating non-alphanumeric characters in strings, such as converting escape sequences into non-graphic characters, and stripping white space. [This paragraph is a jumping off point for describing stripIndent's special relationship with text blocks.]

### Unicode

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.

The String class provides methods for dealing with Unicode code points (i.e., characters), in addition to those for dealing with Unicode code units (i.e., char values).
-----
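
For what it's worth, the two definitions in the String processing section above (a string that "has normalized line terminators" and a method that "normalizes") can be sketched in a few lines. The class and method names are hypothetical and are not proposed API; this only checks that the definitions are precise enough to implement.

```java
class LineTerminators {
    // A string has normalized line terminators if LF is the only line
    // terminator that appears in it, i.e. it contains no CR at all.
    static boolean isNormalized(String s) {
        return s.indexOf('\r') < 0;
    }

    // Translate CRLF and CR to LF. CRLF must be handled first so that it
    // becomes one LF rather than two.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        System.out.println(isNormalized(mixed));            // false
        System.out.println(isNormalized(normalize(mixed))); // true
        System.out.println(normalize(mixed).equals("a\nb\nc\nd")); // true
    }
}
```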

Alex
