Please review this bugfix to the way the language of a snippet is determined 
and processed.

The language of a snippet affects the form of snippet markup and enables 
external syntax highlighting, such as that provided by prism.js. The language 
of a snippet is 
[determined](https://docs.oracle.com/en/java/javase/22/docs/specs/javadoc/doc-comment-spec.html#snippet) 
as follows:

> A snippet may specify a `lang` attribute, which identifies the kind of 
> content in the snippet. For an inline snippet, the default value is `java`. 
> For an external snippet, the default value is derived from the extension of 
> the name of the file containing the snippet's content.

There are two issues that this PR fixes. The first issue is a specification 
issue. The spec says nothing about the language of a hybrid snippet, which has 
features of both inline and external snippets. It makes sense to specify 
that in the absence of the `lang` attribute, the language of a hybrid snippet 
is derived from the file extension. Put differently, when determining the 
language, a hybrid snippet behaves like an external snippet, not like an inline 
snippet.
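To make the first issue concrete, here is an illustrative hybrid snippet. The class `Defaults` and the file `defaults.properties` are hypothetical names, not taken from the PR. The snippet has inline content and also names an external file; it has no `lang` attribute, so under the proposed rule its language would be derived from the file extension, that is, `properties`.

```java
/**
 * Default settings for a hypothetical component.
 *
 * A hybrid snippet: the content is given inline and also refers to an
 * external file. There is no lang attribute, so under the proposed rule
 * the language would come from the ".properties" extension.
 *
 * {@snippet file="defaults.properties" :
 * timeout=30
 * retries=3
 * }
 */
final class Defaults {
    // Hypothetical constants mirroring the snippet content above.
    static final int TIMEOUT = 30;
    static final int RETRIES = 3;
}
```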

The second issue is an implementation issue. If the `lang` attribute or the 
file extension is `java` or `properties`, then the form of markup corresponds 
to that language and the HTML construct modelling the snippet is attributed 
with `class=language-java` or `class=language-properties` respectively. This is 
expected. However, if the `lang` attribute or the file extension is neither of 
those, or the `lang` attribute is default, then the form of markup is assumed 
to be that of `java`, but the HTML construct modelling the snippet is not 
attributed, which means that the language is not passed through to 
third-party syntax highlighters.

Stepping out of this PR for a moment, there is clearly a conflation between the 
language of a snippet and the form of snippet markup. Those are linked and 
controlled by a single knob. Both that and the design whereby every snippet in 
an unsupported language can use markup for the Java language were purposeful: 
they were considered simple and practical.

This PR proposes that the language of a snippet is determined and processed as 
follows:

1. If the `lang` attribute is present, then its value is the language; if that 
   value is empty, then the language is undefined
2. Otherwise, 
   1. If the snippet is inline, then the language is `java`
   2. Otherwise (i.e. the snippet is external or hybrid), the language is 
      determined as follows:
      1. If the `class` attribute is present, then the language is `java`
      2. Otherwise, the value of the `lang` attribute is assumed equal to the 
         extension of the file specified in the `file` attribute; if the file 
         has no extension or the extension cannot be determined, the language 
         is undefined
3. If the language is `java` or `properties`, then snippet markup is processed 
   accordingly
4. Otherwise, snippet markup is processed as if the language were `java`
5. If the language is defined as `<val>`, then HTML is attributed with 
   `class=language-<val>`; if the language is undefined, no such attribute is 
   present
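
The steps above can be sketched in code. This is a minimal illustration, not the code in this PR: the class `SnippetLanguage` and its methods are hypothetical, a `null` argument stands for an absent attribute, and an empty `Optional` stands for an undefined language. The extension parsing (text after the last `.` of the file name) is also an assumption.

```java
import java.util.Optional;

// Hypothetical helper illustrating the proposed rules; not the javadoc implementation.
final class SnippetLanguage {

    // Steps 1 and 2: determine the language. Empty means "undefined".
    // A snippet with neither `class` nor `file` is treated as inline.
    static Optional<String> resolve(String lang, String className, String file) {
        if (lang != null) {
            // Step 1: an explicit `lang` wins; an empty value means undefined.
            return lang.isEmpty() ? Optional.empty() : Optional.of(lang);
        }
        if (className == null && file == null) {
            return Optional.of("java");   // Step 2.1: inline snippet
        }
        if (className != null) {
            return Optional.of("java");   // Step 2.2.1: `class` attribute present
        }
        // Step 2.2.2: derive the language from the file extension (assumed parsing).
        String name = file.substring(file.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return Optional.empty();      // no extension: undefined
        }
        return Optional.of(name.substring(dot + 1));
    }

    // Steps 3 and 4: only `properties` has its own markup form here;
    // `java`, unsupported, and undefined languages all use the `java` form.
    static String markupForm(Optional<String> language) {
        return language.filter("properties"::equals).orElse("java");
    }

    // Step 5: the HTML class, present only if the language is defined.
    static Optional<String> htmlClass(Optional<String> language) {
        return language.map(l -> "language-" + l);
    }
}
```

Keeping `markupForm` and `htmlClass` as separate functions of the same resolved language mirrors the point made earlier: the markup form falls back to `java`, while the HTML attribute carries the actual language through to a syntax highlighter.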

-------------

Commit messages:
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/19971/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19971&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8299080
  Stats: 197 lines in 5 files changed: 128 ins; 28 del; 41 mod
  Patch: https://git.openjdk.org/jdk/pull/19971.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19971/head:pull/19971

PR: https://git.openjdk.org/jdk/pull/19971
