Hi

Here is a rebased version of the patch with the comment rewritten.
Thank you again for your previous review.
FYI, I've tried adding other passes, but none had a similar benefit-to-cost
ratio. Larger benefits would more likely come from replacing O3 with an
extensive list of passes.


On Thursday, January 22, 2026 at 10:41 PM, Matheus Alcantara
<[email protected]> wrote:

> On Thu Jan 22, 2026 at 5:27 PM -03, Pierre Ducroquet wrote:
> 
> > > The patch needs a rebase due to e5d99b4d9ef.
> > >
> > > You've added "simplifycfg" only when "jit_optimize_above_cost" is
> > > not triggered, which uses the default<O0> and mem2reg passes; does
> > > the default<O3> pass already include "simplifycfg"?
> > >
> > > With e5d99b4d9ef committed, should we also add "simplifycfg" when
> > > the PGJIT_INLINE bit is set, since it also uses the default<O0> and
> > > mem2reg passes?
> > 
> > Hi
> > 
> > Thank you, here is a rebased version of the patch.
> > To answer your questions:
> > - O3 already includes simplifycfg, so no need to modify O3
> > - any code generated by our llvmjit provider, esp. tuple deforming, is 
> > heavily dependent on simplifycfg, so when O0 is the basis we should always 
> > add this pass
> 
> 
> Thanks for confirming.
> 
> I ran some benchmarks on TPC-H queries 1 and 4 and got the following
> results. Note that for these tests I set jit_optimize_above_cost=1000000
> to force the use of the default<O0> pipeline with simplifycfg.
> 
> 
> Master Q1:
> Timing: Generation 1.553 ms (Deform 0.573 ms), Inlining 0.052 ms, 
> Optimization 95.571 ms, Emission 58.941 ms, Total 156.116 ms
> Execution Time: 38221.318 ms
> 
> Patch Q1:
> Timing: Generation 1.477 ms (Deform 0.534 ms), Inlining 0.040 ms, 
> Optimization 95.364 ms, Emission 58.046 ms, Total 154.927 ms
> Execution Time: 38257.797 ms
> 
> Master Q4:
> Timing: Generation 0.836 ms (Deform 0.309 ms), Inlining 0.086 ms, 
> Optimization 5.098 ms, Emission 6.963 ms, Total 12.983 ms
> Execution Time: 19512.134 ms
> 
> Patch Q4:
> Timing: Generation 0.802 ms (Deform 0.294 ms), Inlining 0.090 ms, 
> Optimization 5.234 ms, Emission 6.521 ms, Total 12.648 ms
> Execution Time: 16051.483 ms
> 
> 
> For Q4 I see a small increase in the Optimization phase, but a good
> improvement in execution time (from 19512 ms to 16051 ms, about 18% less).
> For Q1 the results are almost the same.
> 
> I did not find any major regression with the simplifycfg pass, and I think
> it makes sense to enable it, since it produces better IR for LLVM to
> compile without much extra cost. +1 for this patch.
> 
> Perhaps we could merge the comments in the if/else block into a single
> comment that also covers simplifycfg. What do you think?
> 
> +	/*
> +	 * Determine the LLVM pass pipeline to use. For OPT3 we use the standard
> +	 * suite. For lower optimization levels, we explicitly include mem2reg to
> +	 * promote stack variables, simplifycfg to clean up the control flow, and
> +	 * optionally the inliner if the flag is set. Note that default<O0> already
> +	 * includes the always-inline pass.
> +	 */
>  	if (context->base.flags & PGJIT_OPT3)
>  		passes = "default<O3>";
>  	else if (context->base.flags & PGJIT_INLINE)
> -		/* if doing inlining, but no expensive optimization, add inline pass */
>  		passes = "default<O0>,mem2reg,simplifycfg,inline";
>  	else
> -		/* default<O0> includes always-inline pass */
>  		passes = "default<O0>,mem2reg,simplifycfg";
> 
> 
> --
> Matheus Alcantara
> EDB: https://www.enterprisedb.com
> 
From 4f75fcc65137a757afac980dd9fb9718bc8dc6eb Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet <[email protected]>
Date: Wed, 7 Jan 2026 15:43:19 +0100
Subject: [PATCH 1/2] llvmjit: always use the simplifycfg pass

The simplifycfg pass will remove empty or unreachable LLVM basic blocks,
and merge blocks together when possible.
This is important because the tuple deforming code generates a lot of
basic blocks, and previously we did not run this pass with O0, which
produced this kind of (amd64) machine code:
   0x723382b781c1:      jmp    0x723382b781c3
   0x723382b781c3:      jmp    0x723382b781eb
   0x723382b781c5:      mov    -0x20(%rsp),%rax
   0x723382b781..:      ...    .....
   0x723382b781e7:      mov    %cx,(%rax)
   0x723382b781ea:      ret
   0x723382b781eb:      jmp    0x723382b781ed
   0x723382b781ed:      jmp    0x723382b781ef
   0x723382b781ef:      jmp    0x723382b781f1
   0x723382b781f1:      jmp    0x723382b781f3
   0x723382b781f3:      mov    -0x30(%rsp),%rax
   0x723382b781..:      ...    ......
   0x723382b78208:      mov    %rcx,(%rax)
   0x723382b7820b:      jmp    0x723382b781c5

This is very inefficient, and running the simplifycfg pass costs only a
few hundred microseconds of compilation time while potentially saving much
more time during execution. On a basic benchmark, this saved 7 ms of query
runtime at the cost of 0.2 ms of extra JIT compilation overhead.
---
 src/backend/jit/llvm/llvmjit.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 2e8aa4749db..491968d8b12 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -633,6 +633,11 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 	{
 		/* we rely on mem2reg heavily, so emit even in the O0 case */
 		LLVMAddPromoteMemoryToRegisterPass(llvm_fpm);
+		/*
+		 * the tuple deforming generates a lot of basic blocks,
+		 * simplify them even with O0
+		 */
+		LLVMAddCFGSimplificationPass(llvm_fpm);
 	}
 
 	LLVMPassManagerBuilderPopulateFunctionPassManager(llvm_pmb, llvm_fpm);
@@ -672,14 +677,21 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 	LLVMErrorRef err;
 	const char *passes;
 
+	/*
+	 * Determine the LLVM pass pipeline to use.
+	 * For OPT3 we use the standard suite.
+	 * For lower optimization levels, we explicitly include:
+	 * - mem2reg to promote stack variables,
+	 * - simplifycfg to clean up the control flow
+	 * When the inliner flag is set, the inline pass is added. Note that
+	 * default<O0> already includes the always-inline pass.
+	 */
 	if (context->base.flags & PGJIT_OPT3)
 		passes = "default<O3>";
 	else if (context->base.flags & PGJIT_INLINE)
-		/* if doing inlining, but no expensive optimization, add inline pass */
-		passes = "default<O0>,mem2reg,inline";
+		passes = "default<O0>,mem2reg,simplifycfg,inline";
 	else
-		/* default<O0> includes always-inline pass */
-		passes = "default<O0>,mem2reg";
+		passes = "default<O0>,mem2reg,simplifycfg";
 
 	options = LLVMCreatePassBuilderOptions();
 
-- 
2.43.0
