Hi,

On 2026-01-28 07:56:46 +0000, Pierre Ducroquet wrote:
> Here is a rebased version of the patch with a rewrite of the comment. Thank
> you again for your previous review. FYI, I've tried adding other passes but
> none had a similar benefits over cost ratio. The benefits could rather be in
> changing from O3 to an extensive list of passes.

I agree that we should have a better list of passes. I'm a bit worried that
having an explicit list of passes that we manage ourselves is going to be
somewhat of a pain to maintain across LLVM versions, but ...

WRT passes that might be worth having even with -O0 - running duplicate
function merging early on could be quite useful, particularly because we won't
inline the deform routines anyway.

> > I did some benchmarks on some TPCH queries (1 and 4) and I got these
> > results. Note that for these tests I set jit_optimize_above_cost=1000000
> > so that it force to use the default<O0> pass with simplifycfg.

FYI, you can use -1 to just disable it, instead of having to rely on a
specific cost.

> >
> > Master Q1:
> > Timing: Generation 1.553 ms (Deform 0.573 ms), Inlining 0.052 ms,
> > Optimization 95.571 ms, Emission 58.941 ms, Total 156.116 ms
> > Execution Time: 38221.318 ms
> >
> > Patch Q1:
> > Timing: Generation 1.477 ms (Deform 0.534 ms), Inlining 0.040 ms,
> > Optimization 95.364 ms, Emission 58.046 ms, Total 154.927 ms
> > Execution Time: 38257.797 ms
> >
> > Master Q4:
> > Timing: Generation 0.836 ms (Deform 0.309 ms), Inlining 0.086 ms,
> > Optimization 5.098 ms, Emission 6.963 ms, Total 12.983 ms
> > Execution Time: 19512.134 ms
> >
> > Patch Q4:
> > Timing: Generation 0.802 ms (Deform 0.294 ms), Inlining 0.090 ms,
> > Optimization 5.234 ms, Emission 6.521 ms, Total 12.648 ms
> > Execution Time: 16051.483 ms
> >
> >
> > For Q4 I see a small increase on Optimization phase but we have a good
> > performance improvement on execution time. For Q1 the results are almost
> > the same.

These queries are all simple enough that I'm not sure this is a particularly
good benchmark for optimization speed. In particular, the deform routines
don't have to deal with a lot of columns and there aren't a lot of functions
(although I guess that shouldn't really matter WRT simplifycfg).

Greetings,

Andres Freund
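
PS: To put the quoted numbers side by side, here is a small sketch that computes the relative change from master to patch for the Optimization and Execution timings. The figures are copied from the quoted output above. The names `TIMINGS`, `rel_change` and `summarize` are made up for illustration and are not part of the patch or of PostgreSQL.

```python
# Timings copied from the quoted EXPLAIN output, in milliseconds.
# The dictionary layout is an illustrative assumption, not a PostgreSQL format.
TIMINGS = {
    "Q1": {
        "master": {"optimization": 95.571, "execution": 38221.318},
        "patch": {"optimization": 95.364, "execution": 38257.797},
    },
    "Q4": {
        "master": {"optimization": 5.098, "execution": 19512.134},
        "patch": {"optimization": 5.234, "execution": 16051.483},
    },
}


def rel_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


def summarize(timings):
    """Return (query, metric, percent change) rows, master versus patch."""
    rows = []
    for query, runs in timings.items():
        for metric in ("optimization", "execution"):
            pct = rel_change(runs["master"][metric], runs["patch"][metric])
            rows.append((query, metric, pct))
    return rows


if __name__ == "__main__":
    for query, metric, pct in summarize(TIMINGS):
        print(f"{query} {metric:>12}: {pct:+.2f}%")
```

With these inputs, Q4 execution time drops by roughly 17.7% while its optimization time rises by roughly 2.7%, and Q1 moves by well under 1% on both metrics. These are single runs, so they say nothing about run-to-run noise.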
