Re: RFR - JDK-8223775 String::stripIndent (Preview)

Alex Buckley Tue, 21 May 2019 16:22:01 -0700

On 5/21/2019 2:10 PM, Jim Laskey wrote:

Updated version http://cr.openjdk.java.net/~jlaskey/8223775/webrev-02

This webrev substantially updates the API spec, which is really a topic for amber-spec-experts (keep reading to see why). Cross-posting between -dev and -spec-experts lists is not good, so maybe we can wrap this up here without prolonged discussion.

API spec looks good, but it was a surprise to learn that stripIndent performs normalization of line terminators:


"@return string with margins removed and line terminators normalized"

The processing steps in the JEP (and the JLS) are clear that normalization happens before incidental white space removal. I realize that stripIndent performs separation and joining in such a way as to produce a string that looks like it was normalized prior to stripIndent, so the @return isn't wrong, but it's still confusing to make a big deal of normalization-first only for stripIndent to suggest normalization-last.
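To make the "looks like it was normalized first" point concrete, here is a small sketch. It assumes JDK 15 or later, where String::stripIndent is a standard method rather than the preview API under review here; `normalizeThenStrip` is a hypothetical helper that applies the JEP's order explicitly, and the two results agree.

```java
// Sketch only: assumes JDK 15 or later, where String::stripIndent is a
// standard method rather than the preview API discussed in this thread.
final class StripIndentNormalization {

    // Hypothetical helper: normalize line terminators first, then strip
    // indentation, following the order the JEP and JLS describe.
    static String normalizeThenStrip(String s) {
        String normalized = s.replace("\r\n", "\n").replace('\r', '\n');
        return normalized.stripIndent();
    }

    public static void main(String[] args) {
        String crlf = "  a\r\n  b";
        // stripIndent splits at CR, LF and CRLF, removes the common two-space
        // margin, and rejoins the lines with LF.
        System.out.println(crlf.stripIndent().equals("a\nb"));                  // true
        System.out.println(crlf.stripIndent().equals(normalizeThenStrip(crlf))); // true
    }
}
```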

I think we should leave the JEP alone, since it interleaves behavior with motivation and examples in order to aid the reader, but we should align the JLS with the API:


-----

The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of applying the following transformations to the content, in order:

1. _Incidental white space_ is removed and line terminators are _normalized_, as if by execution of String::stripIndent on the characters in the content. [The emphasized terms are a hint to the API spec to define the term, which is not currently the case for the second term.]


2. Escape sequences are interpreted, as in a string literal.
-----
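A rough way to see why the order of the two steps matters is to approximate them with the corresponding String methods. This is only a sketch: it assumes JDK 15 or later (where String::stripIndent and String::translateEscapes are standard), and `textBlockValue` is a hypothetical helper, not a description of how javac actually processes text blocks.

```java
// Sketch only: assumes JDK 15 or later. textBlockValue is a hypothetical
// approximation of the two JLS steps, not javac's implementation.
final class TextBlockSteps {

    // Step 1: remove incidental white space and normalize line terminators.
    // Step 2: interpret escape sequences.
    static String textBlockValue(String content) {
        return content.stripIndent().translateEscapes();
    }

    public static void main(String[] args) {
        // Content containing a backslash-r escape sequence (two characters)
        // and a real CRLF line terminator.
        String content = "    a\\rb\r\n    c";

        // Specified order: the CRLF becomes LF, and the escaped \r survives
        // as a CR character inside the first line.
        System.out.println(textBlockValue(content).equals("a\rb\nc")); // true

        // Reversed order: the escape turns into a CR first, which then acts as a
        // line terminator, splits the line, and defeats the indentation stripping.
        String reversed = content.translateEscapes().stripIndent();
        System.out.println(reversed.equals("    a\nb\n    c")); // true
    }
}
```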

String::indent also says "normalizes line termination characters" without defining it. Separately, String::stripIndent is not at all like the strip, stripLeading, and stripTrailing methods which sound related to it -- they would pointlessly strip the first row of white space dots from a multi-line string and leave the other rows.
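A minimal sketch of both observations, assuming JDK 15 or later (String::strip arrived in 11, String::indent in 12, String::stripIndent in 15): strip only touches the ends of the whole string, while stripIndent and indent work row by row and normalize line terminators as a side effect.

```java
// Sketch only: assumes JDK 15 or later.
final class StripVersusStripIndent {

    public static void main(String[] args) {
        String twoRows = "  a\r\n  b";

        // strip() treats the string as one sequence: it removes the spaces before
        // "a" but leaves the second row's indentation (and the CRLF) alone.
        System.out.println(twoRows.strip().equals("a\r\n  b"));   // true

        // stripIndent() removes the common margin from every row and rejoins
        // the rows with LF.
        System.out.println(twoRows.stripIndent().equals("a\nb")); // true

        // indent(0) leaves each row's content alone but still normalizes:
        // every line, including the last, ends in LF.
        System.out.println(twoRows.indent(0).equals("  a\n  b\n")); // true
    }
}
```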

Taking all this together, I think it's time to upgrade the class-level spec of String: to advertise the new methods added in 11+, and to show text blocks, and to get some terms defined for the benefit of multiple methods. I realize this wasn't on your radar, but it's inevitable -- the same thing happened for the class-level spec of Class when nestmates were introduced. So, here goes:


-----

The String class represents character strings. ~~All~~ String literals **and text blocks** in Java programs ~~, such as "abc",~~ are implemented as instances of this class.

The strings represented by this class are constant; their values cannot be changed after they are created. (For mutable strings, see StringBuffer and StringBuilder.) Because instances of `String` are immutable, they can be shared. For example: ...

[The example with a char[] is quite subtle for a beginner, but I'm skipping over it right now.]
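As a sketch of the immutability and sharing described above, and of text blocks being ordinary String instances, under the assumption of JDK 15 or later (so text blocks compile without --enable-preview). The `==` comparison is there only to show that constant strings are interned and shared; it is not a recommended way to compare strings.

```java
import java.util.Locale;

// Sketch only: assumes JDK 15 or later for text blocks.
final class SharedStrings {

    // The text block's content is "abc"; the closing delimiter sits on the same
    // line as the content, so there is no trailing line terminator.
    static String abcBlock() {
        return """
            abc""";
    }

    public static void main(String[] args) {
        String s = "abc";
        // toUpperCase returns a new String; s itself is unchanged.
        String upper = s.toUpperCase(Locale.ROOT);
        System.out.println(s + " " + upper); // abc ABC

        // A text block is a String constant like any literal, and constant strings
        // are interned, so both expressions refer to one shared instance.
        System.out.println(abcBlock().equals(s)); // true
        System.out.println(abcBlock() == s);      // true
    }
}
```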

The class String includes methods for examining individual characters of the sequence, for comparing strings, for searching strings, for extracting substrings, and for creating a copy of a string with all characters translated to uppercase or to lowercase. Case mapping is based on the Unicode Standard version specified by the Character class.
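One line per category of method listed above, as a quick sketch. Passing Locale.ROOT to toUpperCase is a choice made here to keep the case mapping independent of the default locale; it is not something the paragraph requires.

```java
import java.util.Locale;

// Sketch only: one call per category of String method.
final class StringOperations {

    public static void main(String[] args) {
        String s = "Hello, World";
        System.out.println(s.charAt(7));                // W             (examining a character)
        System.out.println(s.compareTo("Hello") > 0);   // true          (comparing: same prefix, s is longer)
        System.out.println(s.indexOf("World"));         // 7             (searching)
        System.out.println(s.substring(0, 5));          // Hello         (extracting)
        System.out.println(s.toUpperCase(Locale.ROOT)); // HELLO, WORLD  (case mapping)
    }
}
```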


Here are some examples of how strings can be used:

         System.out.println("abc");
         String cde = "cde";
         String c = "abc".substring(2,3);
         String d = cde.substring(1, 2);

Unless otherwise noted, methods for comparing Strings do not take locale into account. The Collator class provides methods for finer-grain, locale-sensitive String comparison.
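A small sketch of the difference, assuming an English Collator is available in the runtime (it is in standard JDK builds): compareTo orders by UTF-16 code unit values, so uppercase letters sort before all lowercase ones, while a Collator orders alphabetically.

```java
import java.text.Collator;
import java.util.Locale;

// Sketch only: contrasts code-unit comparison with locale-sensitive collation.
final class LocaleComparison {

    public static void main(String[] args) {
        // compareTo compares char values: 'B' (U+0042) comes before 'a' (U+0061).
        System.out.println("a".compareTo("B") > 0); // true

        // An English Collator compares letters alphabetically, so "a" comes first.
        Collator english = Collator.getInstance(Locale.ENGLISH);
        System.out.println(english.compare("a", "B") < 0); // true
    }
}
```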

Unless otherwise noted, passing a null argument to a constructor or method in this class will cause a NullPointerException to be thrown. [This doesn't fit anywhere. j.l.Character doesn't bother with it, even though its methods throw NPEs too. Maybe time to drop. We have lots more important stuff to say.]


### String concatenation

The Java language provides special support for the string concatenation operator (`+`), and for conversion of other objects to strings. For additional information on string concatenation and conversion, see The Java™ Language Specification. [Somehow this manages to skip the valueOf methods, which in conjunction with things like Integer::parseInt are worthy of a paragraph by themselves. Future work.]
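A sketch of the conversions the paragraph (and the bracketed note) mentions: implicit conversion by `+`, explicit conversion with String.valueOf, and Integer.parseInt going back the other way. The null case reflects the JLS rule that a null reference converts to "null" during concatenation.

```java
// Sketch only: string conversion in both directions.
final class StringConversion {

    public static void main(String[] args) {
        int n = 42;
        // The + operator converts its non-String operand to a String.
        String viaConcat = "n = " + n;
        // String.valueOf performs the same conversion explicitly.
        String viaValueOf = String.valueOf(n);
        // Integer.parseInt converts from String back to int.
        int parsed = Integer.parseInt(viaValueOf);
        // A null reference is converted to the four characters "null".
        Object nothing = null;
        String viaNull = "value: " + nothing;

        System.out.println(viaConcat);  // n = 42
        System.out.println(viaValueOf); // 42
        System.out.println(parsed + 1); // 43
        System.out.println(viaNull);    // value: null
    }
}
```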


Here are some examples of string concatenation:

     String cde = "cde";
     System.out.println("abc" + cde);

[These examples are dull, and don't describe their output, and need to show text blocks. Future work.]
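One possible shape for livelier examples, sketched under the assumption of JDK 15 or later: the same concatenation with its output stated, plus a text block operand. `greeting` is a hypothetical helper introduced only for this illustration.

```java
// Sketch only: assumes JDK 15 or later for text blocks.
final class ConcatenationExamples {

    // Hypothetical helper: the text block's closing delimiter is on its own line,
    // so its value is "Hello,\n" and the tail lands on a second line.
    static String greeting(String tail) {
        return """
            Hello,
            """ + tail;
    }

    public static void main(String[] args) {
        String cde = "cde";
        System.out.println("abc" + cde);   // prints abccde
        System.out.println(greeting(cde)); // prints Hello, then cde on the next line
    }
}
```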


### String processing

The strings represented by this class may span multiple lines by including _line terminators_ among their characters. A line terminator is one of the following: a line feed character LF (U+000A), a carriage return character CR (U+000D), or a carriage return followed immediately by a line feed CRLF (U+000D U+000A). [Don't want to show escape sequences like \n yet.]
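String::lines (JDK 11 and later) splits at exactly these three terminators, so it makes a convenient sketch of the definition. Note that CRLF counts as one terminator, whereas LF followed by CR is two terminators and therefore produces an empty line between them.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: uses String::lines to illustrate the line terminator definition.
final class LineTerminators {

    static List<String> split(String s) {
        return s.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // LF, CR and CRLF each end a line; CRLF is a single terminator.
        System.out.println(split("a\nb\rc\r\nd")); // [a, b, c, d]
        // LF then CR is two terminators, so an empty line appears between them.
        System.out.println(split("a\n\rb"));       // [a, , b]
    }
}
```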

A string has _normalized_ line terminators if LF is the only line terminator which appears in the string. Many methods of this class _normalize_ the strings they return by ensuring that CR and CRLF are translated to LF.
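The definition translates directly into code. Both helpers below are hypothetical, written for illustration and not proposed String API: since every CR is either a lone CR or the first half of a CRLF, a string is normalized exactly when it contains no CR, and CRLF has to be replaced before lone CRs or each CRLF would become two LFs.

```java
// Sketch only: isNormalized and normalize are hypothetical helpers.
final class LineTerminatorNormalization {

    // True if LF is the only line terminator: any CR, alone or in a CRLF,
    // is itself a line terminator other than LF.
    static boolean isNormalized(String s) {
        return s.indexOf('\r') < 0;
    }

    // Translate CRLF and CR to LF. CRLF is replaced first; replacing lone CRs
    // first would turn each CRLF into two LFs.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String mixed = "a\r\nb\rc\nd";
        System.out.println(isNormalized(mixed));                   // false
        System.out.println(normalize(mixed).equals("a\nb\nc\nd")); // true
        System.out.println(isNormalized(normalize(mixed)));        // true
    }
}
```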

The class String also includes methods for manipulating non-alphanumeric characters in strings, such as converting escape sequences into non-graphic characters, and stripping white space. [This paragraph is a jumping off point for describing stripIndent's special relationship with text blocks.]
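A sketch of the two kinds of method the paragraph mentions, assuming JDK 15 or later so that String::translateEscapes is a standard method (stripLeading and stripTrailing have been available since JDK 11).

```java
// Sketch only: assumes JDK 15 or later for String::translateEscapes.
final class EscapesAndWhiteSpace {

    public static void main(String[] args) {
        // The source literal below holds a backslash followed by 't'.
        String raw = "col1\\tcol2";
        // translateEscapes turns that two-character sequence into a TAB.
        System.out.println(raw.translateEscapes().equals("col1\tcol2")); // true

        String padded = "  text  ";
        System.out.println("[" + padded.stripLeading() + "]");  // [text  ]
        System.out.println("[" + padded.stripTrailing() + "]"); // [  text]
        System.out.println("[" + padded.strip() + "]");         // [text]
    }
}
```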


### Unicode

A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.

The String class provides methods for dealing with Unicode code points (i.e., characters), in addition to those for dealing with Unicode code units (i.e., char values).
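A sketch of code units versus code points, using U+1F600 (an arbitrary supplementary character chosen for illustration) written as its surrogate pair so the source stays ASCII.

```java
// Sketch only: code units versus code points for a supplementary character.
final class CodePoints {

    public static void main(String[] args) {
        // U+1F600 is outside the BMP, so UTF-16 encodes it as a surrogate pair.
        String s = "a\uD83D\uDE00b";

        System.out.println(s.length());                             // 4: char code units
        System.out.println(s.codePointCount(0, s.length()));        // 3: code points
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(s.codePointAt(1) == 0x1F600);            // true
        // 'b' is the third code point but sits at code unit index 3, because
        // the supplementary character occupies positions 1 and 2.
        System.out.println(s.offsetByCodePoints(0, 2));             // 3
        System.out.println(s.indexOf('b'));                         // 3
    }
}
```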

-----

Alex
