Hi Daniel,

At 2024-08-12T20:56:30-0300, Daniel Brigante wrote:
> I've been trying to make grohtml produce <sub> and <sup> html tags
> from an ms input without much success. From my current understanding,
> the grohtml device driver should detect both a vertical position
> change and a font size change from the device independent file (as it
> is described in the start_superscript/start_subscript function in the
> post-html.cpp file), that would cause the driver to emmit the <sub> or
> <sup> tag.

It's possible this feature is buggy or incomplete.

> When I try to force this behaviour with \v and \s calls in my ms file,
> I get the expected behaviour if I set the output device to ps, but
> when I set it to html I only get a <small> tag generated by the device
> driver. I also noticed that the "V" instructions, in the device
> independent file (using -Thtml and -Z options with groff) get removed
> by (probably, I think, the html preprocessor), only remaining the s
> instruction.

I don't have much useful advice here.  I suspect, but cannot prove, that
it's simply not possible to produce high-quality HTML by instrumenting
the formatter with a state machine, which is the approach that was
taken.  It's a bold claim, I know, and Werner Lemberg and Gaius Mulley
are capable developers, but grohtml never really got out of beta state--
it kind of halted in its tracks about 20 years ago--and I am tempted to
blame insurmountable design challenges for that.

What I would do is attack the problem at the macro package level.  Since
macros (or strings, as in ms's case) are often used to render super- and
subscripts, these could inject device control commands to tell the
output driver what is going on.

For example, groff ms defines strings like this.

.\" superscript
.ds par@sup-start \v'-.9m\s'\En[.s]*7u/10u'+.7m'
.als { par@sup-start
.ds par@sup-end \v'-.7m\s0+.9m'
.als } par@sup-end
.\" subscript
.ds par@sub-start \v'+.3m\s'\En[.s]*7u/10u'-.1m'
.als < par@sub-start
.ds par@sub-end \v'+.1m\s0-.3m'
.als > par@sub-end

It could just as well do this:

.ds par@sup-start \X'html: <sup>'\v'-.9m\s'\En[.s]*7u/10u'+.7m'
.ds par@sup-end   \v'-.7m\s0+.9m'\X'html: </sup>'
.ds par@sub-start \X'html: <sub>'\v'+.3m\s'\En[.s]*7u/10u'-.1m'
.ds par@sub-end   \v'+.1m\s0-.3m'\X'html: </sub>'

...which, with appropriate recognizers in post-grohtml, would, I think,
make it hard for the output driver to guess wrong.
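
Here is an untested sketch of how one might try that from a document
rather than by editing the ms package itself.  It assumes the
redefinitions are made after ms has been loaded, and it re-creates the
\*{ and \*} aliases in case they don't follow the redefinition.  The
sample formula is only an illustration, and whether grohtml does
anything useful with these tags without new recognizers is not
something this establishes.

```roff
.\" Sketch only; this is not what groff ms ships.
.\" Wrap the existing superscript strings in "html:" device controls.
.ds par@sup-start \X'html: <sup>'\v'-.9m\s'\En[.s]*7u/10u'+.7m'
.ds par@sup-end   \v'-.7m\s0+.9m'\X'html: </sup>'
.\" Re-alias in case \*{ and \*} do not follow the redefinition.
.als { par@sup-start
.als } par@sup-end
.LP
E = mc\*{2\*}
```

Formatting that with "groff -ms -Thtml -Z" should show whether the
device control commands survive into the device-independent output,
where I would expect them to appear as "x X html: ..." lines.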

There _is_ already a function in post-grohtml that attempts to recognize
super- and subscripts from context without the assistance of such tags.
But, if your experience is any indication, it's not reliable.

https://git.savannah.gnu.org/cgit/groff.git/tree/src/devices/grohtml/post-html.cpp?h=1.23.0#n4074

See particularly line 4099.

My speculation, based solely on my imagination and a degree of
familiarity with the code base rather than insight into Werner and
Gaius's plans (beyond what can be gleaned from their whitepaper[1]), is
that it was thought that the tedium of hacking up macro packages to
inject "higher-level" markup into the device-independent output to clue
in an HTML generator would not be necessary if only good enough
heuristics were written into GNU troff (the formatter) and grohtml (the
output driver).

That might be true--but it would seem we never got heuristics that
were good enough.  Rendering tables, equations, and pic(1) diagrams as
PostScript and including them as raster images has proven particularly
painful.[2]

https://savannah.gnu.org/bugs/?60052
https://savannah.gnu.org/bugs/?62890

Furthermore, the level of demand for HTML production from "raw" groff
language input seems staggeringly low.  It doesn't seem that anyone
exists who wants to compose groff to produce HTML without availing
themselves of a macro package, and even if they don't want to use the
macro packages we ship, they can absolutely write their own macros to
do the sort of thing I spitballed above.

I don't mean to criticize, but in my opinion groff's unspectacular HTML
production story has led to some losses.  If it had been less ambitious
and focused on rendering man pages well as an initial goal, many ad hoc
grotty output scrapers would never have been written to produce HTML,
and mandoc(1) might not ever have happened.

Again, just my opinion--I wasn't there at the time.  Werner and/or Gaius
may very well have strong counterarguments that I simply haven't heard.
So if they weigh in, listen to them.

In any case I won't be tackling "groff html-ng" in the near future.  I'm
trying to finish feature changes to the formatter for 1.24 so it can
freeze, and, I hope, be released this calendar year.

Regards,
Branden

[1] https://www.gnu.org/software/groff/grohtml.pdf
[2] GNU eqn can already produce MathML.  It could use some automated
    tests.
