On Sat, 30 Jul 2022 00:36:06 GMT, Jonathan Gibbons <[email protected]> wrote:
> Please review a medium-size cleanup related to handling newlines in > `Content`/`HtmlTree`/`Text[Builder]` trees. > This is the 3rd and last in the recent series of cleanup tasks in the code to > generate HTML. > > Until recently, the `Text`/`TextBuilder` objects were "ready-for-writing", > meaning that characters like `<`, `&`, etc were encoded as entities, and > newlines were represented by the platform line separator. This caused > difficulties when counting the size of generated text, since these characters > had to be detected and accounted for. A recent change addressed entities, > but stayed clear of the newline issue, which is now addressed here. > > A related performance problem is that the method `Utils.normalizeNewlines` > always created new strings, even when it was not necessary. > > With this change, newlines in `Text`, `TextBuilder` and other subtypes of > `Content` containing text are always represented by `\n`, and are converted > to the platform line separator as needed. > > I note the following assumptions: > > * the input may contain any flavor of newlines in the user-provided > documentation comments > * the output should aways use the platform line separator (`\n` on Linux and > macOS, `\r\n` on Windows etc.) > > Together, these imply that some amount of the input will need to be scanned > to normalize newlines at some point in the process. While it would be nice to > not have to normalize newlines, it's a case or "pay now, or pay later". > > Notable parts of this change: > > * `Utils.normalizeNewlines` is moved to be a static method in `Text` and > rewritten to not create new strings unless needed. This allows a number of > questionable imports of `toolkit.utils.DocletConstants` to be removed from > the `html.markup` package. While it would be possible to call this method > automatically on all strings passed to `Text`, `TextBuilder` etc, in many > cases it is not necessary: put another way, it only needs to be called on > strings that have come from the user, either in documentation comments or in > command-line options. > * `Content.write`, and the implementations thereof, are given a new parameter > which gives the newline sequence to use. This is set to `\n` for `toString` > and the platform separator when writing to a file. > * `DocletConstants.NL` goes away and is superseded by: > * `Text.NL`: always `\n`, to be used when adding text to a `Content` tree > (this replaces most uses of `DocletConstants.NL`) > * `DocFile.PLATFORM_LINE_SEPARATOR`: the platform line separator, to be > used when writing to files > * The various `charCount` methods no longer have to worry about use of the > platform line separator > * Assertions are used to verify that there are no `\r` characters added into > `Text`/`TextBuilder`/`RawHtml`/`Content` > > Other cleanup: > * `DocletConstants.DEFAULT_TAB_STOP_LENGTH` is moved to `BaseOptions` which > is the only class that uses it. This leaves just two remaining constants in > `DocletConstants` which should probably be moved elsewhere as well, and the > `DocletConstants` class deleted. That is a step too far for the work here. > * Minor IDE-suggestions, for lambdas, enhanced switch, adding braces, etc > > One test was affected by the `DocletConstants.NL` change, but apart from > that, this is a pure-cleanup change, with no (intentional) externally visible > effect. This pull request has now been integrated. Changeset: 1b756bfa Author: Jonathan Gibbons <[email protected]> URL: https://git.openjdk.org/jdk/commit/1b756bfa3a1be2004af3ea12e86199ca0604c841 Stats: 246 lines in 26 files changed: 77 ins; 76 del; 93 mod 8236048: Cleanup use of Utils.normalizeNewlines Reviewed-by: hannesw ------------- PR: https://git.openjdk.org/jdk/pull/9691
