On 5/21/2019 7:56 AM, Jim Laskey wrote:
Please do a code review of the new String::stripIndent instance method.

My interest is in the API spec because the JLS will make normative reference to it.


"Removes horizontal white space margins from the essential body of a
Text Block originated string, while preserving relative indentation."

Lots of concepts here: margins, "essential body", a "Text Block originated string" (not just any string?). Looks like the removal happens in-place. While "essential" is a concept from the JEP, I think the sibling concept, "incidental", is the one to lead with. Alluding to multiple lines is also good. Recommend:

"Returns a string whose value is this string, with incidental white space removed from the beginning and end of every line."

That is, 'stripIndent' isn't quite as generic as its name suggests. It implements a particular policy about "incidental" white space that we will now explain...


New paragraph: "Incidental white space is often present in a text block to align the content with the opening delimiter. For example, in the following code, dots represent incidental white space:

String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............""";


Another new para, to link this high-falutin' incidental concept to the workmanlike name of this method: "This method treats the incidental white space as indentation to be stripped, producing a string that preserves the relative indentation of the content. Using | to visualize the start of each line of the string:

|<html>
|    <body>
|        <p>Hello, world</p>
|    </body>
|</html>


Now, the algo. "This string is first conceptually separated into lines as if by {@link String#lines()}." -- phrasing: "First, this string is conceptually ..."


"Then, the <i>minimum indentation</i> (min) is determined as follows. For each non-blank line (as defined by {@link String#isBlank()}), the
leading {@link Character#isWhitespace(int) white space} characters are
counted. The leading {@link Character#isWhitespace(int) white space} characters on the last line are are also counted even if blank.
The <i>min</i> value is the smallest of these counts."

No more "common white space prefix" like in the JEP? OK, fair enough, we're now fully rotated to "indentation" rather than "white space".


"For each non-blank line, <i>min</i> leading white space characters are
removed. Each white space character is treated as a single character. In
particular, the tab character {@code "\t"} (U+0009) is considered a
single character; it is not expanded.

The trailing white space of each line is also removed."

The trailing white space characters are just as "treated as a single character" as the leading white space characters. Recommend: "For each non-blank line, <i>min</i> leading white space characters are removed, and any trailing white space characters are removed."


Recommend handling escape sequences in a separate para. Showing "\t", a two character sequence, and then alluding to \u0009, a single character, is confusing.


"Each line is suffixed with a line feed character {@code "\n"} (U+000A),
except for the last line. The last line is suffixed with a line feed
character {@code "\n"} (U+000A) only if the original last line was blank."

The phrasing "except for the last line" is odd because it suggests the last line never gets a suffix. Except, oh, it might. And "suffixed" is an unusual word. If you're going to allude to the original string's last line, then isn't there a way to lean on it more? e.g. "A line feed character is appended to every line if the corresponding line in the original string was terminated with a line feed character." (Not saying that is correct or desirable, but is easier to visualize.)


"Finally, the lines are concatenated into a single string and returned."

Recommend: "Finally, the lines are concatenated into a string, which is the result."

Alex

Reply via email to