On Sat, 30 Jul 2022 00:36:06 GMT, Jonathan Gibbons <[email protected]> wrote:
> Please review a medium-size cleanup related to handling newlines in
> `Content`/`HtmlTree`/`Text[Builder]` trees.
> This is the 3rd and last in the recent series of cleanup tasks in the code to
> generate HTML.
>
> Until recently, the `Text`/`TextBuilder` objects were "ready-for-writing",
> meaning that characters like `<`, `&`, etc were encoded as entities, and
> newlines were represented by the platform line separator. This caused
> difficulties when counting the size of generated text, since these characters
> had to be detected and accounted for. A recent change addressed entities,
> but stayed clear of the newline issue, which is now addressed here.
>
> A related performance problem is that the method `Utils.normalizeNewlines`
> always created new strings, even when it was not necessary.
>
> With this change, newlines in `Text`, `TextBuilder` and other subtypes of
> `Content` containing text are always represented by `\n`, and are converted
> to the platform line separator as needed.
>
> I note the following assumptions:
>
> * the input may contain any flavor of newlines in the user-provided
> documentation comments
> * the output should aways use the platform line separator (`\n` on Linux and
> macOS, `\r\n` on Windows etc.)
>
> Together, these imply that some amount of the input will need to be scanned
> to normalize newlines at some point in the process. While it would be nice to
> not have to normalize newlines, it's a case or "pay now, or pay later".
>
> Notable parts of this change:
>
> * `Utils.normalizeNewlines` is moved to be a static method in `Text` and
> rewritten to not create new strings unless needed. This allows a number of
> questionable imports of `toolkit.utils.DocletConstants` to be removed from
> the `html.markup` package. While it would be possible to call this method
> automatically on all strings passed to `Text`, `TextBuilder` etc, in many
> cases it is not necessary: put another way, it only needs to be called on
> strings that have come from the user, either in documentation comments or in
> command-line options.
> * `Content.write`, and the implementations thereof, are given a new parameter
> which gives the newline sequence to use. This is set to `\n` for `toString`
> and the platform separator when writing to a file.
> * `DocletConstants.NL` goes away and is superseded by:
> * `Text.NL`: always `\n`, to be used when adding text to a `Content` tree
> (this replaces most uses of `DocletConstants.NL`)
> * `DocFile.PLATFORM_LINE_SEPARATOR`: the platform line separator, to be
> used when writing to files
> * The various `charCount` methods no longer have to worry about use of the
> platform line separator
> * Assertions are used to verify that there are no `\r` characters added into
> `Text`/`TextBuilder`/`RawHtml`/`Content`
>
> Other cleanup:
> * `DocletConstants.DEFAULT_TAB_STOP_LENGTH` is moved to `BaseOptions` which
> is the only class that uses it. This leaves just two remaining constants in
> `DocletConstants` which should probably be moved elsewhere as well, and the
> `DocletConstants` class deleted. That is a step too far for the work here.
> * Minor IDE-suggestions, for lambdas, enhanced switch, adding braces, etc
>
> One test was affected by the `DocletConstants.NL` change, but apart from
> that, this is a pure-cleanup change, with no (intentional) externally visible
> effect.
This looks quite good. Apart from the inline comments, I wondered about the
following:
- Would the methods and constant you moved to `Text` be better placed in
`Content` as they are used in other Content instances?
- There are quite a few uses of the `"\n"` string literal in a content or
pre-content context where the `NL` constant couuld be used.
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/markup/HtmlDocument.java
line 80:
> 78: writer.write(docType.text);
> 79: writer.write(newline);
> 80: docContent.write(writer, newline,true);
Missing space after comma
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/markup/RawHtml.java
line 161:
> 159: }
> 160: count++;
> 161: break;
The RawHtml constructor contains `assert Text.checkNewlines(rawHtml)` so I'm
not sure this is actually needed.
src/jdk.javadoc/share/classes/jdk/javadoc/internal/doclets/formats/html/markup/Text.java
line 51:
> 49: */
> 50: public static Text of(CharSequence content) {
> 51: assert checkNewlines(content);
Redundant assert as private constructor also contains it.
-------------
PR: https://git.openjdk.org/jdk/pull/9691