[luatex] runtime performance of lua(la)tex incl. gprof

Axel Kittenberger Tue, 20 Jun 2023 06:14:39 -0700

Hello,

First let me say, I really don't want this to be a complaint. I'm just
wondering.


I considered switching my department's default compiler from pdflatex to
lualatex.
Some subtle differences were to be expected and with test documents so far
easily catered for.
The output is okay.

However what surprised me is a complicated test document which took ~150
seconds with pdflatex now takes 210 seconds with lualatex.

Trying to figure out if this is some of the many packages it does, I kept
simplifying.

--- laliptest.tex ---
\documentclass{article}
\input{plipsum}
\begin{document}
\newcount\ii
\ii=100
\loop
  \lipsum{1-100}
  \advance\ii-1
\ifnum \ii>0
\repeat
\end{document}
---------

This most simple document doesn't use any package, but plipsum which can be
replaced with plain text too. Compile time results:

pdflatex: user 0m1.920s (3.1 MB result)
lualatex: user 0m17.565s (3.8 MB result)

8 times slower.

Versions tested with:
pdfTeX 3.141592653-2.6-1.40.24 (TeX Live 2022/Debian)
This is LuaHBTeX, Version 1.15.0 (TeX Live 2022/Debian)

Since LaTeX also includes a lot of stuff already, same tests with plain TeX.

--- liptest.tex ---
\input{plipsum}
\newcount\i
\i=100
\loop
  \lipsum{1-100}
  \advance\i-1
\ifnum \i>0
\repeat
\end
---------
pdftex: user 0m1.053s (2.9 MB result)
luatex: user 0m1.943s (3.1 MB result)

This isn't as bad as the LaTeX variants, but still almost a factor two.
Searching about this online turns up results about microtype or front
loading etc.
Both cannot be an issue, since microtype is off and frontloading must be a
fixed offset, but the compile time increases linearly with document length.

This now took me a while, but I managed to compile luatex with "-gp" to
create a gprof profile and this is
the result:
----------
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 14.63      0.42     0.42  2409555     0.00     0.00  longest_match
  8.71      0.67     0.25   295700     0.00     0.00  hnj_hyphen_hyphenate
  8.19      0.91     0.24 52832741     0.00     0.00  get_sa_item
  6.62      1.10     0.19      773     0.00     0.00  deflate_slow
  3.48      1.20     0.10 30117352     0.00     0.00  char_info
  2.79      1.28     0.08    10000     0.00     0.00  ext_do_line_break
  2.79      1.36     0.08      773     0.00     0.00  compress_block
  2.09      1.42     0.06  2978422     0.00     0.00  calc_pdfpos
  2.09      1.48     0.06   515855     0.00     0.00  handle_lig_word
  1.74      1.53     0.05 14032575     0.00     0.00  char_exists
  1.74      1.58     0.05  4689611     0.00     0.00  flush_node
  1.74      1.63     0.05  2896557     0.00     0.00  output_one_char
  1.74      1.68     0.05   227877     0.00     0.00  hash_normalized
  1.74      1.73     0.05    41510     0.00     0.00  hlist_out
  1.74      1.78     0.05    23020     0.00     0.00  fix_node_list
  1.74      1.83     0.05     2319     0.00     0.00  adler32_z
  1.39      1.87     0.04   227877     0.00     0.00  hash_insert_normalized
  1.39      1.91     0.04    39615     0.00     0.00  fm_scan_line
  1.39      1.95     0.04    11510     0.00     0.00  hnj_hyphenation
  1.05      1.98     0.03  3831639     0.00     0.00  get_x_token
  1.05      2.01     0.03  2896557     0.00     0.00  get_charinfo_whd
  1.05      2.04     0.03  2382502     0.00     0.00  add_kern_before
  1.05      2.07     0.03   303962     0.00     0.00  luaS_hash
  1.05      2.10     0.03    10000     0.00     0.00  ext_post_line_break
-------
So it's not like there is one function that takes the bulk of the slowdown
as I expected (and often
happens in reality an innocent looking small thing takes so much)

longest_match() is something from zlib.

I'm just really surprised, I keep following this project for a while now,
since I consider it highly interesting and thought since I read one of the
major steps was rewriting the TeX core from somewhat idiosyncratic WEB to
C, I expected it to be even a bit faster...

And this is the profile of pdftex in comparison.
----------
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  nam
 29.48      0.51     0.51  2362906     0.00     0.00  longest_match
 13.29      0.74     0.23  5876210     0.00     0.00  zdividescaled
 11.56      0.94     0.20      775     0.00     0.00  deflate_slow
  4.62      1.02     0.08    41510     0.00     0.00  pdfhlistout
  4.62      1.10     0.08      774     0.00     0.00  compress_block
  3.47      1.16     0.06        1     0.06     1.59  maincontrol
  2.89      1.21     0.05      423     0.00     0.00  inflate_fast
  2.31      1.25     0.04   227877     0.00     0.00  hash_insert_normalized
  2.31      1.29     0.04    41510     0.00     0.00  zhpack
  1.73      1.32     0.03 17821585     0.00     0.00  zeffectivechar
  1.73      1.35     0.03   825830     0.00     0.00  zpdfprintint
  1.73      1.38     0.03   260088     0.00     0.00  read_line
  1.73      1.41     0.03   223361     0.00     0.00  pqdownheap
  1.73      1.44     0.03    39615     0.00     0.00  fm_scan_line
  1.45      1.47     0.03  1274937     0.00     0.00  zgetnode
  1.16      1.49     0.02  2896157     0.00     0.00  zadvcharwidth
  1.16      1.51     0.02   579800     0.00     0.00  ztrybreak
  1.16      1.53     0.02   227877     0.00     0.00  hash_normalized
  1.16      1.55     0.02    26742     0.00     0.00  zflushnodelist
  0.87      1.56     0.02  1274936     0.00     0.00  zfreenode
  0.58      1.57     0.01  4160738     0.00     0.00  getnext
  0.58      1.58     0.01  3419912     0.00     0.00  zgetautokern
  0.58      1.59     0.01  2896161     0.00     0.00  hasfmentry
  0.58      1.60     0.01  2896159     0.00     0.00  isscalable
  0.58      1.61     0.01  2896157     0.00     0.00  zpdfprintchar
--------
Both weren't exactly the same version as tested previously, I self compiled
the newest texlive tagged as release.
(This is LuaTeX, Version 1.16.0 (TeX Live 2023))
(pdfTeX 3.141592653-2.6-1.40.25 (TeX Live 2023))

Runtimes when compiled with -O3 are almost the same as the native debian
above, and I profiled the plain TeX variants only.

So zlib also takes a bulk, in relation even larger. So not the culprit.
Different implementation of hyphenation seems to be one factor I'd "blame"

Turning it off with \language=255 improves it:

pdftex: user 0m1.029s
luatex: user 0m1.596s

but there is still more.
which is get_sa_item()/char_info().

And reading the comments managed-sa.c, it seems main the issue is being
sparse? So I guess the way to improve that would be to be not sparse?

Anyway, that was my report to this, unfortunately I'm holding off pushing
it as the new default compiler for us, since the slowdown is a bad sell for
something which only sometimes is userful.

PS: personally I use Lua to calculate some complexer drawing for tikz, as
using a "normal" programming
language is much easier to me than doing more complicated pgf macros. But
also in the end it just generates .tex code, which I simply  feed into
pdflatex, it's only this gets complicated which files people ought to
change and which are autogenerated .tex files.

Kind regards, Axel

[luatex] runtime performance of lua(la)tex incl. gprof

Reply via email to