RFR: 8299080: Wrong default value of snippet lang attribute

Pavel Rappo Mon, 01 Jul 2024 06:40:08 -0700

Please review this bugfix to the way the language of a snippet is determined 
and processed.

The language of a snippet affects the form of snippet markup and enables
external syntax highlighting, such as that provided by prism.js. The language
of a snippet is
[determined](https://docs.oracle.com/en/java/javase/22/docs/specs/javadoc/doc-comment-spec.html#snippet)
as follows:

> A snippet may specify a `lang` attribute, which identifies the kind of
> content in the snippet. For an inline snippet, the default value is `java`.
> For an external snippet, the default value is derived from the extension of
> the name of the file containing the snippet's content.

There are two issues that this PR fixes. The first issue is a specification
issue. The spec says nothing about the language of a hybrid snippet, which has
features of both inline and external snippets. It makes sense to specify
that in the absence of the `lang` attribute, the language of a hybrid snippet
is derived from the file extension. Put differently, when determining the
language, a hybrid snippet behaves like an external snippet, not like an inline
snippet.
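For illustration, the three kinds of snippet look roughly like this in doc comments. The class name, the method names and the file name `config.properties` are hypothetical; only the `{@snippet}` tag syntax is taken from the spec.

```java
// Illustrative only: class, method and file names are made up for this sketch.
final class SnippetKinds {

    /**
     * Inline snippet: content follows the colon; default language is java.
     * {@snippet :
     * var greeting = "hello";
     * }
     */
    static void inline() {}

    /**
     * External snippet: content lives in a file; default language is
     * derived from the file extension (here, properties).
     * {@snippet file="config.properties"}
     */
    static void external() {}

    /**
     * Hybrid snippet: has both a file attribute and inline content.
     * The current spec does not say what its default language is;
     * this PR proposes deriving it from the file extension.
     * {@snippet file="config.properties" :
     * greeting=hello
     * }
     */
    static void hybrid() {}
}
```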

The second issue is an implementation issue. If the `lang` attribute or the
file extension is `java` or `properties`, then the form of markup corresponds
to that language and the HTML construct modelling the snippet is attributed
with `class=language-java` or `class=language-properties` respectively. This is
expected. However, if the `lang` attribute or the file extension is neither of
those, or the `lang` attribute is left at its default, then the form of markup
is assumed to be that of `java`, but the HTML construct modelling the snippet
is not attributed, which means that the language is not passed through to
third-party syntax highlighters.

Stepping out of this PR for a moment, there is clearly a conflation between the
language of a snippet and the form of snippet markup. Those are linked and
controlled by a single knob. That and the design whereby every snippet in an
unsupported language can use markup for the Java language was purposeful: it
was considered simple and practical.

This PR proposes that the language of a snippet is determined and processed as
follows:

1. If the `lang` attribute is present, then its value is the language; if that
   value is empty, then the language is undefined
2. Otherwise,
   1. If the snippet is inline, then the language is `java`
   2. Otherwise (i.e. the snippet is external or hybrid), the language is
      determined as follows:
      1. If the `class` attribute is present, then the language is `java`
      2. Otherwise, the value of the `lang` attribute is assumed equal to the
         extension of the file specified in the `file` attribute; if the file
         has no extension or the extension cannot be determined, the language
         is undefined
3. If the language is `java` or `properties`, then snippet markup is processed
   accordingly
4. Otherwise, snippet markup is processed as if the language were `java`
5. If the language is defined as `<val>`, then HTML is attributed with
   `class=language-<val>`; if the language is undefined, no such attribute is
   present
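
The steps above can be sketched in code as follows. This is not the code in the PR: `SnippetLanguage`, `Attrs` and all method names are hypothetical, a `null` field stands for an absent attribute, and treating a name such as `.gitignore` as having no extension is an assumption of this sketch.

```java
import java.util.Optional;

// Hypothetical model of the proposed rules; not the actual javadoc implementation.
final class SnippetLanguage {

    /** Attributes relevant to language selection; null means "attribute absent". */
    record Attrs(String lang, String file, String clazz) {}

    /** Steps 1 and 2: returns the language, or empty if it is undefined. */
    static Optional<String> language(Attrs a) {
        if (a.lang() != null) {
            // Step 1: an explicit lang wins; an empty value means "undefined".
            return a.lang().isEmpty() ? Optional.empty() : Optional.of(a.lang());
        }
        // A snippet with neither file nor class is inline (step 2.1);
        // an external or hybrid snippet with class is also java (step 2.2.1).
        if (a.file() == null) {
            return Optional.of("java");
        }
        // Step 2.2.2: external or hybrid snippet with file.
        return extension(a.file());
    }

    /** Extension of the last path component, if any (assumed rule, see lead-in). */
    static Optional<String> extension(String file) {
        int slash = Math.max(file.lastIndexOf('/'), file.lastIndexOf('\\'));
        String name = file.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        if (dot <= 0 || dot == name.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(name.substring(dot + 1));
    }

    /** Steps 3 and 4: which markup form to use when processing the snippet. */
    static String markupForm(Optional<String> language) {
        return language.filter(l -> l.equals("properties")).orElse("java");
    }

    /** Step 5: the HTML class value, or empty if no attribute is emitted. */
    static Optional<String> htmlClass(Optional<String> language) {
        return language.map(l -> "language-" + l);
    }
}
```

Keeping `language`, `markupForm` and `htmlClass` separate reflects the point made earlier: the language passed to a highlighter and the form of markup are linked but are not the same thing.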

-------------

Commit messages:
- Initial commit

Changes: https://git.openjdk.org/jdk/pull/19971/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19971&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8299080
Stats: 197 lines in 5 files changed: 128 ins; 28 del; 41 mod
Patch: https://git.openjdk.org/jdk/pull/19971.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/19971/head:pull/19971

PR: https://git.openjdk.org/jdk/pull/19971
