aganea added a comment.

Thanks for working on this @russell.gallop!
I've reproduced your tests; please see below. The only difference is that I've used a ThinLTO build for stage2:

  -DCMAKE_CXX_FLAGS="/GS- -Xclang -O3 -fstrict-aliasing -march=skylake-avx512 -flto=thin -fwhole-program-vtables"

Running with `/opt:lldltojobs=all` and no `/lldltocache`. Results on a 36-core (dual-socket) Xeon Gold 6140.

(WinHeap vs. Scudo+options)

  D:\llvm-project>hyperfine -m 3 -w 1 "cd d:\llvm-project\buildninjaRelWinHeap3 && d:\llvm-project\buildninjaRelWinHeap2\bin\lld-link.exe @CMakeFiles\clang.rsp" "cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp"
  Benchmark #1: cd d:\llvm-project\buildninjaRelWinHeap3 && d:\llvm-project\buildninjaRelWinHeap2\bin\lld-link.exe @CMakeFiles\clang.rsp
    Time (mean ± σ):     664.086 s ± 18.740 s    [User: 0.0 ms, System: 0.0 ms]
    Range (min … max):   647.070 s … 684.172 s    3 runs

  Benchmark #2: cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp
    Time (mean ± σ):     145.619 s ±  0.140 s    [User: 0.0 ms, System: 8.1 ms]
    Range (min … max):   145.522 s … 145.779 s    3 runs

  Summary
    'cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp' ran
      4.56 ± 0.13 times faster than 'cd d:\llvm-project\buildninjaRelWinHeap3 && d:\llvm-project\buildninjaRelWinHeap2\bin\lld-link.exe @CMakeFiles\clang.rsp'

(Scudo+options vs. Rpmalloc)

  D:\llvm-project>hyperfine -m 3 -w 1 "cd d:\llvm-project\buildninjaRelRpmalloc3 && d:\llvm-project\buildninjaRelRpMalloc2\bin\lld-link.exe @CMakeFiles\clang.rsp" "cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp"
  Benchmark #1: cd d:\llvm-project\buildninjaRelRpmalloc3 && d:\llvm-project\buildninjaRelRpMalloc2\bin\lld-link.exe @CMakeFiles\clang.rsp
    Time (mean ± σ):      95.423 s ±  0.830 s    [User: 0.0 ms, System: 9.0 ms]
    Range (min … max):    94.886 s … 96.380 s    3 runs

  Benchmark #2: cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp
    Time (mean ± σ):     145.266 s ±  0.387 s    [User: 4.9 ms, System: 7.6 ms]
    Range (min … max):   144.894 s … 145.666 s    3 runs

  Summary
    'cd d:\llvm-project\buildninjaRelRpmalloc3 && d:\llvm-project\buildninjaRelRpMalloc2\bin\lld-link.exe @CMakeFiles\clang.rsp' ran
      1.52 ± 0.01 times faster than 'cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp'

(Scudo vs. Rpmalloc)

  D:\llvm-project>hyperfine -m 3 -w 1 "cd d:\llvm-project\buildninjaRelRpmalloc3 && d:\llvm-project\buildninjaRelRpMalloc2\bin\lld-link.exe @CMakeFiles\clang.rsp" "cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp"
  Benchmark #1: cd d:\llvm-project\buildninjaRelRpmalloc3 && d:\llvm-project\buildninjaRelRpMalloc2\bin\lld-link.exe @CMakeFiles\clang.rsp
    Time (mean ± σ):      95.435 s ±  0.059 s    [User: 0.0 ms, System: 8.0 ms]
    Range (min … max):    95.385 s … 95.499 s    3 runs

  Benchmark #2: cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp
    Time (mean ± σ):     270.967 s ±  1.366 s    [User: 4.8 ms, System: 0.0 ms]
    Range (min … max):   269.397 s … 271.887 s    3 runs

  Summary
    'cd d:\llvm-project\buildninjaRelRpmalloc3 && d:\llvm-project\buildninjaRelRpMalloc2\bin\lld-link.exe @CMakeFiles\clang.rsp' ran
      2.84 ± 0.01 times faster than 'cd d:\llvm-project\buildninjaRelScudo3 && d:\llvm-project\buildninjaRelScudo2\bin\lld-link.exe @CMakeFiles\clang.rsp'

Summary (factor = speedup relative to WinHeap):

|               | Time (mean ± σ)      | Factor |
| WinHeap       | 664.086 s ± 18.740 s | 1.0    |
| Scudo         | 270.967 s ± 1.366 s  | 2.45   |
| Scudo+options | 145.619 s ± 0.140 s  | 4.56   |
| Rpmalloc      | 95.423 s ± 0.830 s   | 6.96   |

CPU usage:

Rpmalloc - 3,944 cumulated seconds (all threads)
F12940831: image.png <https://reviews.llvm.org/F12940831>

Scudo+options - 6,337 cumulated seconds (all threads)
F12940833: image.png <https://reviews.llvm.org/F12940833>

Time spent in the allocator itself (note the different vertical scale in the graphs; a hardware CRC or AES implementation would certainly help Scudo):

Rpmalloc - 191 cumulated seconds
F12940851: image.png <https://reviews.llvm.org/F12940851>

Scudo+options - 1,171 cumulated seconds
F12940855: image.png <https://reviews.llvm.org/F12940855>

Memory usage:

Rpmalloc - Peaks at 11 GB commit (19 GB mapped)
F12940870: image.png <https://reviews.llvm.org/F12940870>

Scudo+options - Peaks at 5 GB commit (although 4.4 TB of mapped pages!!!)
F12940882: image.png <https://reviews.llvm.org/F12940882>

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86694/new/

https://reviews.llvm.org/D86694
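
For anyone who wants to recheck the factors in the summary table, or see what share of total CPU time each allocator accounts for, here is a minimal Python sketch. It only re-derives ratios from the hyperfine means and the cumulated CPU seconds reported in this comment; taking WinHeap as the baseline is an assumption on my part, and the names `speedup_factors` and `allocator_share` are purely illustrative, not part of the patch or of any tool.

```python
# Mean wall-clock link times in seconds, copied from the hyperfine runs
# reported in this comment.
mean_seconds = {
    "WinHeap": 664.086,
    "Scudo": 270.967,
    "Scudo+options": 145.619,
    "Rpmalloc": 95.423,
}

# Cumulated CPU seconds across all threads: (total, spent inside the allocator).
cpu_seconds = {
    "Rpmalloc": (3944, 191),
    "Scudo+options": (6337, 1171),
}


def speedup_factors(times, baseline="WinHeap"):
    """Return baseline_time / time for each configuration, rounded to 2 places."""
    base = times[baseline]
    return {name: round(base / t, 2) for name, t in times.items()}


def allocator_share(total, in_allocator):
    """Return the percentage of total CPU time spent inside the allocator."""
    return round(100.0 * in_allocator / total, 1)


factors = speedup_factors(mean_seconds)
shares = {name: allocator_share(*pair) for name, pair in cpu_seconds.items()}

if __name__ == "__main__":
    for name, factor in factors.items():
        print(f"{name:14s} {factor:5.2f}x vs. WinHeap")
    for name, share in shares.items():
        print(f"{name:14s} {share:4.1f}% of CPU time in the allocator")
```

These are ratios of means only; the ± spread that hyperfine reports for its own pairwise comparisons is not propagated here.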