jurahul wrote: I did 2 sets of experiments, but the data is inconclusive as to whether this causes a real compile-time regression.
1. Build MLIR verbosely and capture all mlir-tblgen command lines to a file:
   ```
   ninja -C build check-mlir --verbose | tee build_log.txt
   grep "NATIVE/bin/mlir-tblgen " build_log.txt | cut -d ' ' -f 2- > mlir-tablegen-commands.txt
   ```
2. Build both the baseline and new versions of LLVM/MLIR in 2 different paths, "upstream_clean" and "upstream_llvm".
3. Use the attached script to run these captured commands with --time-phases and measure the total time.
4. Establish the baseline variance by running the script to compare the baseline against itself:
   ```
   Total time  4.2302  4.2573  0.6406
   ```
   So the baseline variance is 0.6%, with each command run 20 times. Note that for individual targets the variance is quite high for some of them, up to 100%.
5. Establish the "new" variance by running the script to compare the new version against itself:
   ```
   Total time  4.2829  4.2531  -0.6958
   ```
   Again, 0.6% variance.
6. Run the baseline against the new version:
   ```
   Total time  4.1745  4.2864  2.6806
   ```
   So this seems to give a 2.6% regression. However, the individual data is quite noisy: for individual samples the variance can be quite high, up to 100%.
7. Add a FormatVariadic benchmark that exercises formatv() with 1-5 substitutions (which covers the common usage in LLVM), and run it on both baseline and new:
   ```
   ./build/benchmarks/FormatVariadic --benchmark_repetitions=20
   Baseline: BM_FormatVariadic_mean  1063 ns  1063 ns  20
   New:      BM_FormatVariadic_mean  1097 ns  1097 ns  20
   ```
   This is a ~3.2% regression in formatv() alone.

The benchmark I added was:

```C++
#include "benchmark/benchmark.h"
#include "llvm/Support/FormatVariadic.h"

using namespace llvm;

// Benchmark formatv() with 1-5 substitutions, covering the common
// usage patterns in LLVM.
static void BM_FormatVariadic(benchmark::State &state) {
  for (auto _ : state) {
    // Exercise formatv() with several valid replacement options.
    formatv("{0}", 1).str();
    formatv("{0}{1}", 1, 1).str();
    formatv("{0}{1}{2}", 1, 1, 1).str();
    formatv("{0}{1}{2}{3}", 1, 1, 1, 1).str();
    formatv("{0}{1}{2}{3}{4}", 1, 1, 1, 1, 1).str();
  }
}
BENCHMARK(BM_FormatVariadic);

BENCHMARK_MAIN();
```

The compile-time data collected from the mlir-tblgen runs is quite noisy for individual targets, though the aggregated results seem stable; I wonder whether that means it is not really capturing small compile-time deltas correctly. As an example:

```
lir/Dialect/MemRef/IR/MemRefOps.cpp.inc  0.0106  0.0119   12.2642%
mlir/include/mlir/IR/BuiltinOps.cpp.inc  0.0048  0.0042  -12.5000%
```

So within the same run, one target shows +12% and another -12%.

The other line of thinking is that this validation is an aid to developers, so enabling it just in Debug builds may be good enough to catch issues.

I am attaching the script and the captured mlir-tblgen commands used in the script below:

[mlir-tablegen-commands.txt](https://github.com/user-attachments/files/16770614/mlir-tablegen-commands.txt)
[ct_formatv.txt](https://github.com/user-attachments/files/16770618/ct_formatv.txt)

https://github.com/llvm/llvm-project/pull/105745
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits