Hi folks,

An Arch Linux user has reported a performance regression in man page
rendering in groff 1.24.  For "small" man pages (or collections
thereof), it's not noticeable, but the reporter sees rendering time grow
quadratically with large inputs: for 25 copies in a row of the gcc man
page, it's roughly twice as bad as one would expect.

I'm not able to reproduce the problem.  Since I don't have the gcc man
page handy (it's not DFSG-free and my system is Debian-based), I used up
to 50 copies of the bash 5.3 man page, which uncompressed is...

$ wc $(man -w bash)
13495  64023 392678 /home/branden/share/man/man1/bash.1

...so 50 copies is almost 20 megabytes of input.
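
For anyone who wants to build a similar input by hand, here is a minimal
Python sketch of the idea.  It is not the script from the Savannah
ticket; the function names and the file name "bash-x50.1" are made up
for illustration, and the groff command line in the comment is only an
assumption about how one might invoke the formatter.

```python
import shutil
import subprocess
import time


def concatenate_copies(source, destination, copies):
    """Write `copies` back-to-back copies of `source` to `destination`."""
    with open(destination, "wb") as out:
        for _ in range(copies):
            with open(source, "rb") as page:
                shutil.copyfileobj(page, out)


def time_command(argv):
    """Run a command, discard its standard output, return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# Hypothetical usage (assumes groff is installed and bash.1 is uncompressed):
#   concatenate_copies("bash.1", "bash-x50.1", 50)
#   seconds = time_command(["groff", "-man", "-Tutf8", "bash-x50.1"])
```

Repeating the measurement for several copy counts, once per set of macro
files, gives the data points needed to compare growth rates.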

Interestingly the reporter has narrowed the problem down to a single
commit, and can recover groff 1.23's performance by simply pointing
groff 1.24 to 1.23's macro files.

That suggests a humdinger of a macro programming problem...but when I
look at the identified regressing commit, it's hard for me to see how.

https://cgit.git.savannah.gnu.org/cgit/groff.git/commit/?id=732b07d4998bec1cc942481e7cf4e7287050c40b

One thing you'll notice about this commit is that it substantially
_reduces_ the amount of macro code being interpreted.  Other things
being equal, you'd expect that to _improve_ performance.

Obviously other things aren't equal.  But there's no change in
cyclomatic complexity--meaning, no loops are added or removed, and more
to the point for a macro-oriented language like *roff, there's no change
to recursive macro calls.  (I don't think our man(7) package has _any_
recursive macro calls, and I don't see any in this diff.)

I have only one guess about a culprit here, and I don't think it's a
very good one.  See this bit at the end?

+.\" In continuous rendering mode, make page breaks less potent and the
+.\" page length "infinite".
+.if \n[cR] \{\
+.  rn bp an*real-bp
+.  rn an*bp bp
+.  pl \n[.R]u/1v
+.\}

That division operation gave me pause for a moment, because `.R` is
guaranteed to have a honking large value.

This seems like an unpromising site for exploration, though, for two
reasons.

1.  GNU troff doesn't implement its own divider.  It translates its own
    arithmetic language to C++ and relies on the language runtime to
    perform the actual operations.  This of course gets compiled to
    assembly language and handed off to the CPU.  The reporter didn't
    mention what machine architecture they're using, but I'm guessing
    it's one with a hardware divider.

2.  This computation is done only once per load of the macro package.
    That means, if you use "-mandoc", it will happen once at every switch
    from man(7) to mdoc(7) and vice versa.  If no such switch occurs, or if
    all your documents use the same macro package, then it happens once,
    period.  A constant-order factor cannot create a quadratic
    performance degradation.
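
To make the constant-versus-quadratic distinction concrete, here is a
small Python sketch that estimates the growth exponent of run time from
(copies, seconds) pairs by fitting a line on a log-log scale.  The
timings in it are synthetic, generated from formulas I picked for
illustration; they are not measurements of groff, and the function name
is hypothetical.

```python
import math


def growth_exponent(sizes, seconds):
    """Least-squares slope of log(seconds) against log(sizes).

    A slope near 1 suggests linear growth; a slope near 2 suggests
    quadratic growth.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic data only (assumed cost models, not real timings).
copies = [5, 10, 25, 50]
linear = [0.4 * n for n in copies]            # cost proportional to input
quadratic = [0.01 * n * n for n in copies]    # cost proportional to input squared
with_setup = [3.0 + 0.4 * n for n in copies]  # linear cost plus a fixed one-time cost

for label, timings in (("linear", linear),
                       ("quadratic", quadratic),
                       ("linear + fixed setup", with_setup)):
    print(f"{label}: exponent ~ {growth_exponent(copies, timings):.2f}")
```

A fixed one-time cost pulls the fitted exponent below 1 rather than
pushing it toward 2, which is why a once-per-load computation can't
account for quadratic behavior on its own.  Feeding real timings from
the two sets of macro files through the same function would show whether
one of them really grows faster than the other.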

It's certainly possible that `.R` is still implicated somehow--it, and
the prerequisite change to adopt saturating arithmetic in GNU troff, is
the "deepest" change I've ever made to the formatter.  Possibly the
arithmetic is fine but there's something unfortunate elsewhere in the
formatter that creates needless churn when the distance to the next
vertical position trap is gigantic.  (If that's true, then if we fix
that, we'll be fixing it for many more applications than just man
pages.)  But right now I don't see any _evidence_ of that.

So, can anyone reproduce this problem and supply the necessary evidence?

The Savannah ticket has scripts for performing the stress test with the
bash man page, plus auxiliary scripts for plotting the comparative
performance of the groff 1.23 and groff 1.24 macro files with gnuplot,
if you're interested in doing that.

Thanks in advance.

Regards,
Branden
