Hello,

First let me say, I really don't want this to be a complaint. I'm just wondering.
I considered switching my department's default compiler from pdflatex to lualatex. Some subtle differences were to be expected, and with the test documents so far they were easily catered for. The output is okay. What surprised me, however, is that a complicated test document which took ~150 seconds with pdflatex now takes 210 seconds with lualatex. To figure out whether one of the many packages it uses is responsible, I kept simplifying.

--- laliptest.tex ---
\documentclass{article}
\input{plipsum}
\begin{document}
\newcount\ii
\ii=100
\loop
\lipsum{1-100}
\advance\ii-1
\ifnum \ii>0
\repeat
\end{document}
---------

This very simple document doesn't use any package except plipsum, which can be replaced with plain text too. Compile time results:

pdflatex: user 0m1.920s (3.1 MB result)
lualatex: user 0m17.565s (3.8 MB result)

That is roughly 9 times slower.

Versions tested with:
pdfTeX 3.141592653-2.6-1.40.24 (TeX Live 2022/Debian)
This is LuaHBTeX, Version 1.15.0 (TeX Live 2022/Debian)

Since LaTeX itself already includes a lot of stuff, I ran the same test with plain TeX.

--- liptest.tex ---
\input{plipsum}
\newcount\i
\i=100
\loop
\lipsum{1-100}
\advance\i-1
\ifnum \i>0
\repeat
\end
---------

pdftex: user 0m1.053s (2.9 MB result)
luatex: user 0m1.943s (3.1 MB result)

This isn't as bad as the LaTeX variants, but it is still almost a factor of two.

Searching online turns up results about microtype or front loading etc. Neither can be the issue, since microtype is off and front loading must be a fixed offset, whereas the compile time increases linearly with document length.

This took me a while, but I managed to compile luatex with "-pg" to create a gprof profile, and this is the result:

----------
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 14.63      0.42     0.42  2409555     0.00     0.00  longest_match
  8.71      0.67     0.25   295700     0.00     0.00  hnj_hyphen_hyphenate
  8.19      0.91     0.24 52832741     0.00     0.00  get_sa_item
  6.62      1.10     0.19      773     0.00     0.00  deflate_slow
  3.48      1.20     0.10 30117352     0.00     0.00  char_info
  2.79      1.28     0.08    10000     0.00     0.00  ext_do_line_break
  2.79      1.36     0.08      773     0.00     0.00  compress_block
  2.09      1.42     0.06  2978422     0.00     0.00  calc_pdfpos
  2.09      1.48     0.06   515855     0.00     0.00  handle_lig_word
  1.74      1.53     0.05 14032575     0.00     0.00  char_exists
  1.74      1.58     0.05  4689611     0.00     0.00  flush_node
  1.74      1.63     0.05  2896557     0.00     0.00  output_one_char
  1.74      1.68     0.05   227877     0.00     0.00  hash_normalized
  1.74      1.73     0.05    41510     0.00     0.00  hlist_out
  1.74      1.78     0.05    23020     0.00     0.00  fix_node_list
  1.74      1.83     0.05     2319     0.00     0.00  adler32_z
  1.39      1.87     0.04   227877     0.00     0.00  hash_insert_normalized
  1.39      1.91     0.04    39615     0.00     0.00  fm_scan_line
  1.39      1.95     0.04    11510     0.00     0.00  hnj_hyphenation
  1.05      1.98     0.03  3831639     0.00     0.00  get_x_token
  1.05      2.01     0.03  2896557     0.00     0.00  get_charinfo_whd
  1.05      2.04     0.03  2382502     0.00     0.00  add_kern_before
  1.05      2.07     0.03   303962     0.00     0.00  luaS_hash
  1.05      2.10     0.03    10000     0.00     0.00  ext_post_line_break
-------

So it's not as if one function accounts for the bulk of the slowdown, as I had expected (in practice it often happens that an innocent-looking small thing takes most of the time). longest_match() is something from zlib.

I'm just really surprised. I have been following this project for a while now, since I consider it highly interesting, and having read that one of the major steps was rewriting the TeX core from somewhat idiosyncratic WEB to C, I expected it to be even a bit faster...

And this is the profile of pdftex in comparison.

----------
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 29.48      0.51     0.51  2362906     0.00     0.00  longest_match
 13.29      0.74     0.23  5876210     0.00     0.00  zdividescaled
 11.56      0.94     0.20      775     0.00     0.00  deflate_slow
  4.62      1.02     0.08    41510     0.00     0.00  pdfhlistout
  4.62      1.10     0.08      774     0.00     0.00  compress_block
  3.47      1.16     0.06        1     0.06     1.59  maincontrol
  2.89      1.21     0.05      423     0.00     0.00  inflate_fast
  2.31      1.25     0.04   227877     0.00     0.00  hash_insert_normalized
  2.31      1.29     0.04    41510     0.00     0.00  zhpack
  1.73      1.32     0.03 17821585     0.00     0.00  zeffectivechar
  1.73      1.35     0.03   825830     0.00     0.00  zpdfprintint
  1.73      1.38     0.03   260088     0.00     0.00  read_line
  1.73      1.41     0.03   223361     0.00     0.00  pqdownheap
  1.73      1.44     0.03    39615     0.00     0.00  fm_scan_line
  1.45      1.47     0.03  1274937     0.00     0.00  zgetnode
  1.16      1.49     0.02  2896157     0.00     0.00  zadvcharwidth
  1.16      1.51     0.02   579800     0.00     0.00  ztrybreak
  1.16      1.53     0.02   227877     0.00     0.00  hash_normalized
  1.16      1.55     0.02    26742     0.00     0.00  zflushnodelist
  0.87      1.56     0.02  1274936     0.00     0.00  zfreenode
  0.58      1.57     0.01  4160738     0.00     0.00  getnext
  0.58      1.58     0.01  3419912     0.00     0.00  zgetautokern
  0.58      1.59     0.01  2896161     0.00     0.00  hasfmentry
  0.58      1.60     0.01  2896159     0.00     0.00  isscalable
  0.58      1.61     0.01  2896157     0.00     0.00  zpdfprintchar
--------

Neither binary was exactly the same version as tested previously; I compiled the newest texlive tagged as release myself.
(This is LuaTeX, Version 1.16.0 (TeX Live 2023))
(pdfTeX 3.141592653-2.6-1.40.25 (TeX Live 2023))
Runtimes when compiled with -O3 are almost the same as with the native Debian binaries above, and I profiled the plain TeX variants only.

So zlib also takes a large share in pdftex, proportionally even larger, so it is not the culprit. The different implementation of hyphenation seems to be one factor I'd "blame". Turning it off with \language=255 improves things:

pdftex: user 0m1.029s
luatex: user 0m1.596s

But there is still more, namely get_sa_item()/char_info(). Reading the comments in managed-sa.c, it seems the main issue is the array being sparse? So I guess the way to improve that would be to make it not sparse?
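To make the sparse-versus-dense question concrete, here is a minimal C sketch of the trade-off. It assumes a three-level table indexed by 7-bit slices of the code point; that layout and every name in it (sparse_table, sparse_get, dense_table and so on) are hypothetical illustrations, not code taken from managed-sa.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: three levels of 128 entries each, which covers
   128 * 128 * 128 = 2097152 code points, enough for Unicode. */
#define LEVEL_BITS 7
#define LEVEL_SIZE (1u << LEVEL_BITS)
#define MAX_CODE   0x10FFFFu

typedef struct {
    int32_t **tree[LEVEL_SIZE]; /* top level: pointers to mid-level blocks */
    int32_t dflt;               /* value reported for unset code points */
} sparse_table;

sparse_table *sparse_new(int32_t dflt) {
    sparse_table *t = calloc(1, sizeof *t); /* all top-level slots NULL */
    if (t != NULL) t->dflt = dflt;
    return t;
}

/* Sparse lookup: up to two NULL checks and three dependent loads. */
int32_t sparse_get(const sparse_table *t, uint32_t c) {
    assert(c <= MAX_CODE);
    int32_t **mid = t->tree[c >> (2 * LEVEL_BITS)];
    if (mid == NULL) return t->dflt;
    int32_t *low = mid[(c >> LEVEL_BITS) & (LEVEL_SIZE - 1)];
    if (low == NULL) return t->dflt;
    return low[c & (LEVEL_SIZE - 1)];
}

/* Sparse store: allocates blocks on demand. Returns 0 on success,
   -1 if an allocation fails. */
int sparse_set(sparse_table *t, uint32_t c, int32_t v) {
    assert(c <= MAX_CODE);
    uint32_t h = c >> (2 * LEVEL_BITS);
    uint32_t m = (c >> LEVEL_BITS) & (LEVEL_SIZE - 1);
    if (t->tree[h] == NULL) {
        t->tree[h] = calloc(LEVEL_SIZE, sizeof *t->tree[h]);
        if (t->tree[h] == NULL) return -1;
    }
    if (t->tree[h][m] == NULL) {
        int32_t *low = malloc(LEVEL_SIZE * sizeof *low);
        if (low == NULL) return -1;
        for (uint32_t i = 0; i < LEVEL_SIZE; i++) low[i] = t->dflt;
        t->tree[h][m] = low;
    }
    t->tree[h][m][c & (LEVEL_SIZE - 1)] = v;
    return 0;
}

void sparse_free(sparse_table *t) {
    if (t == NULL) return;
    for (uint32_t h = 0; h < LEVEL_SIZE; h++) {
        if (t->tree[h] == NULL) continue;
        for (uint32_t m = 0; m < LEVEL_SIZE; m++) free(t->tree[h][m]);
        free(t->tree[h]);
    }
    free(t);
}

/* Dense alternative: a single load per lookup, but 1114112 entries of
   4 bytes each, i.e. 4.25 MiB per table, however few entries are set. */
typedef struct {
    int32_t value[MAX_CODE + 1];
} dense_table;

dense_table *dense_new(int32_t dflt) {
    dense_table *t = malloc(sizeof *t);
    if (t != NULL)
        for (uint32_t c = 0; c <= MAX_CODE; c++) t->value[c] = dflt;
    return t;
}

int32_t dense_get(const dense_table *t, uint32_t c) {
    assert(c <= MAX_CODE);
    return t->value[c];
}
```

In this sketch a sparse lookup pays for pointer chasing on every character, while the dense variant trades that for memory. Whether such a change would actually pay off in LuaTeX would have to be measured, since the real tables may carry further bookkeeping that this sketch ignores.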
Anyway, that was my report on this. Unfortunately I'm holding off on pushing it as the new default compiler for us, since the slowdown is a hard sell for something which is only sometimes useful.

PS: Personally I use Lua to calculate some more complex drawings for tikz, as using a "normal" programming language is much easier for me than writing more complicated pgf macros. But in the end it just generates .tex code, which I simply feed into pdflatex; it only gets complicated to keep track of which files people ought to change and which are autogenerated .tex files.

Kind regards,
Axel