Re: LLVM 16 (opaque pointers)
On Mon, Oct 23, 2023 at 10:47:24PM -0400, Tom Lane wrote: > Andres Freund writes: > > FC 29 is well out of support, so I don't think it makes sense to invest any > > further time in this. Personally, I don't think it's useful to have years > > old > > fedora in the buildfarm... > > +1. It's good to test old LTS distros, but Fedora releases have a > short shelf life by design. I'll start retiring those old Fedora ones I have. :) Regards, Mark
Re: LLVM 16 (opaque pointers)
Andres Freund writes: > FC 29 is well out of support, so I don't think it makes sense to invest any > further time in this. Personally, I don't think it's useful to have years old > fedora in the buildfarm... +1. It's good to test old LTS distros, but Fedora releases have a short shelf life by design. regards, tom lane
Re: LLVM 16 (opaque pointers)
Hi, On 2023-10-24 10:17:22 +1300, Thomas Munro wrote: > This POWER animal fails, unexpectedly to me: > > > bonito: 7.0.1 Fedora 29 > > We could try to chase that down, or we could rejoice that at least it > works on current release. It must begin working somewhere between 7 > and 11, but when I checked which LLVM releases I could easily install > on eg cascabel (if I could get access) using the repo at apt.llvm.org, > I saw that they don't even have anything older than 11. So someone > with access who wants to figure this out might have many days or weeks > of compiling ahead of them. I could reproduce the failure on bonito. The stack trace is: #0 0x7fffb83541e8 in raise () from /lib64/libc.so.6 #1 0x7fffb833448c in abort () from /lib64/libc.so.6 #2 0x7fff9c68dd78 in std::__replacement_assert (_file=, _line=, _function=, _condition=) at /usr/include/c++/8/ppc64le-redhat-linux/bits/c++config.h:447 #3 0x7fff9df90838 in std::unique_ptr >::operator* ( this=0x1b946cb8) at ../include/llvm/Support/MemAlloc.h:29 #4 llvm::OrcCBindingsStack::OrcCBindingsStack(llvm::TargetMachine&, std::function > ()>) (this=0x1b946be0, TM=..., IndirectStubsMgrBuilder=...) at ../lib/ExecutionEngine/Orc/OrcCBindingsStack.h:242 #5 0x7fff9df90940 in LLVMOrcCreateInstance (TM=0x1b933ae0) at /usr/include/c++/8/bits/move.h:182 #6 0x7fffa0618f8c in llvm_session_initialize () at /home/andres/src/postgres/src/backend/jit/llvm/llvmjit.c:981 #7 0x7fffa06179a8 in llvm_create_context (jitFlags=25) at /home/andres/src/postgres/src/backend/jit/llvm/llvmjit.c:219 #8 0x7fffa0626cbc in llvm_compile_expr (state=0x1b8ef390) at /home/andres/src/postgres/src/backend/jit/llvm/llvmjit_expr.c:142 #9 0x10a76fc8 in jit_compile_expr (state=0x1b8ef390) at /home/andres/src/postgres/src/backend/jit/jit.c:177 #10 0x10404550 in ExecReadyExpr (state=0x1b8ef390) at /home/andres/src/postgres/src/backend/executor/execExpr.c:875 with this assertion message printed: /usr/include/c++/8/bits/unique_ptr.h:328: typename std::add_lvalue_reference<_Tp>::type std::unique_ptr<_Tp, Dp>::operator*() const [with Tp = llvm::orc::JITCompileCallbackManager; _Dp = std::default_delete; typename std::add_lvalue_reference<_Tp>::type = llvm::orc::JITCompileCallbackManager&]: Assertion 'get() != pointer()' failed. I wanted to use a debug build to investigate in more detail, but bonito is a small VM. Thus I built llvm 7 on a more powerful gcc cfarm machine, running on AlmaLinux 9.2. The problem doesn't reproduce there. Given the crash in some c++ standard library code, that the fc29 patches to llvm look harmless, that building/using llvm 7 on a newer distro does not show issues on PPC, it seems likely that this is a compiler / standard library issue. FC 29 is well out of support, so I don't think it makes sense to invest any further time in this. Personally, I don't think it's useful to have years old fedora in the buildfarm... Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
On Tue, Oct 24, 2023 at 10:17:22AM +1300, Thomas Munro wrote: > On Tue, Oct 24, 2023 at 4:27 AM Mark Wong wrote: > > I haven't gotten around to disabling llvm on any of my animals with llvm > > < 7 yet. Do you still want to hold on that? > > Yes, please disable --with-llvm on s390x and POWER animals with LLVM < > 7 (see below). Also, you have a bunch of machines with LLVM 16 that > are failing to compile on REL_11_STABLE. That is expected, because we > agreed not to back-patch the LLVM 16 API changes into REL_11_STABLE: > > > kingsnake: 16.0.6 Fedora Linux 38 > > krait: CentOS 16.0.6 Stream 8 > > lancehead: CentOS 16.0.6 Stream 8 I should have updated these to not use --with-llvm for REL_11_STABLE. > These POWER machines fail as expected, and it's unfixable: > > > elasmobranch: 5.0.1 openSUSE Leap 15.0 > > demoiselle: 5.0.1 openSUSE Leap 15.0 > > cavefish: 6.0.0 Ubuntu 18.04.6 LTS These should now be updated to not use --with-llvm at all. > These s390x animals are failing, but don't show the layout complaint. > I can see that LLVM 6 also lacked a case for s390x in > llvm::orc::createLocalIndirectStubsManagerBuilder(), the thing that > was fixed in 7 with the addition of a default case. Therefore these > presumably fail just like old LLVM on POWER, and it's unfixable. So I > suggest turning off --with-llvm on these two: > > > cotinga: 6.0.0 Ubuntu 18.04.6 LTS > > perch: 6.0.0 Ubuntu 18.04.6 LTS Ok, I should have removed --with-llvm here too. > This s390x animal doesn't actually have --with-llvm enabled so it > passes, but surely it'd be just like lora: > > > mamushi: 15.0.7 Red Hat Enterprise Linux 9.2 Oops, I think I added it now. I think I made all the recommended changes, and trimmed out the lines where I didn't need to do anything. :) Andres pointed out to me that my animals aren't set up to collect core file so I'm also trying to update that too... Regards, Mark
Re: LLVM 16 (opaque pointers)
On Tue, Oct 24, 2023 at 4:27 AM Mark Wong wrote: > I haven't gotten around to disabling llvm on any of my animals with llvm > < 7 yet. Do you still want to hold on that? Yes, please disable --with-llvm on s390x and POWER animals with LLVM < 7 (see below). Also, you have a bunch of machines with LLVM 16 that are failing to compile on REL_11_STABLE. That is expected, because we agreed not to back-patch the LLVM 16 API changes into REL_11_STABLE: > kingsnake: 16.0.6 Fedora Linux 38 > krait: CentOS 16.0.6 Stream 8 > lancehead: CentOS 16.0.6 Stream 8 These POWER machines fail as expected, and it's unfixable: > elasmobranch: 5.0.1 openSUSE Leap 15.0 > demoiselle: 5.0.1 openSUSE Leap 15.0 > cavefish: 6.0.0 Ubuntu 18.04.6 LTS (Well, we could in theory supply our own fixed llvm::orc::createLocalIndirectStubsManagerBuilder() function to hide the broken one in LLVM <= 6, but that way lies madness IMHO. An LTS distro that cares could look into back-patching LLVM's fixes, but for us, let us focus on current software.) This POWER animal fails, unexpectedly to me: > bonito: 7.0.1 Fedora 29 We could try to chase that down, or we could rejoice that at least it works on current release. It must begin working somewhere between 7 and 11, but when I checked which LLVM releases I could easily install on eg cascabel (if I could get access) using the repo at apt.llvm.org, I saw that they don't even have anything older than 11. So someone with access who wants to figure this out might have many days or weeks of compiling ahead of them. These POWER animals are passing, as expected: > cascabel: 11.0.1 Debian GNU/Linux 11 > babbler: 15.0.7 AlmaLinux 8.8 > pytilia: 15.0.7 AlmaLinux 8.8 > nicator: 15.0.7 AlmaLinux 9.2 > twinspot: 15.0.7 AlmaLinux 9.2 > habu: 16.0.6 Fedora Linux 38 > kingsnake: 16.0.6 Fedora Linux 38 > krait: CentOS 16.0.6 Stream 8 > lancehead: CentOS 16.0.6 Stream 8 These s390x animals are passing: > branta: 10.0.0 Ubuntu 20.04.4 LTS > pike: 11.0.1 Debian GNU/Linux 11 > rinkhals: 11.0.1 Debian GNU/Linux 11 > sarus: 14.0.0 Ubuntu 22.04.1 LTS These s390x animals are failing, but don't show the layout complaint. I can see that LLVM 6 also lacked a case for s390x in llvm::orc::createLocalIndirectStubsManagerBuilder(), the thing that was fixed in 7 with the addition of a default case. Therefore these presumably fail just like old LLVM on POWER, and it's unfixable. So I suggest turning off --with-llvm on these two: > cotinga: 6.0.0 Ubuntu 18.04.6 LTS > perch: 6.0.0 Ubuntu 18.04.6 LTS These s390x animals are failing with the mismatched layout problem, which should be fixed by Andres's patch to tolerate the changing z12/z13 ABI thing by falling back to whatever clang picked (at a cost of not using all the features of your newer CPU, unless you explicitly tell clang to target it): > aracari: 15.0.7 Red Hat Enterprise Linux 8.6 > pipit: 15.0.7 Red Hat Enterprise Linux 8.6 > lora: 15.0.7 Red Hat Enterprise Linux 9.2 This s390x animal doesn't actually have --with-llvm enabled so it passes, but surely it'd be just like lora: > mamushi: 15.0.7 Red Hat Enterprise Linux 9.2
Re: LLVM 16 (opaque pointers)
On Mon, Oct 23, 2023 at 01:15:04PM +1300, Thomas Munro wrote: > On Sun, Oct 22, 2023 at 2:44 PM Thomas Munro wrote: > > On Sat, Oct 21, 2023 at 2:45 PM Tom Lane wrote: > > > Thomas Munro writes: > > > > (It'd be nice if the > > > > build farm logged "$LLVM_CONFIG --version" somewhere.) > > > > > > It's not really the buildfarm script's responsibility to do that, > > > but feel free to make configure do so. > > > > Done, copying the example of how we do it for perl and various other things. > > Build farm measles update: > > With that we can see that nicator (LLVM 15 on POWER) is green. We can > see that cavefish (LLVM 6 on POWER) is red as expected. We can also > see that bonito (LLVM 7 on POWER) is red, so my earlier theory that > this might be due to the known bug we got fixed in LLVM 7 is not > enough. Maybe there are other things fixed on POWER somewhere between > those LLVM versions? I suspect it'll be hard to figure out without > debug builds and backtraces. I haven't gotten around to disabling llvm on any of my animals with llvm < 7 yet. Do you still want to hold on that? > One thing is definitely our fault, though. xenodermus shows failures > on REL_12_STABLE and REL_13_STABLE, using debug LLVM 6 on x86. I > couldn't reproduce this locally on (newer) debug LLVM, so I bugged > Andres for access to the host/libraries and chased it down. There is > some type punning for a function parameter REL_13_STABLE and earlier, > removed by Andres in REL_14_STABLE, and when I back-patched my > refactoring I effectively back-patched a few changes from his commit > df99ddc70b97 that removed the type punning, but I should have brought > one more line from that commit to remove another trace of it. See > attached. Here are my list of llvm-config versions and distros for s390x and POWER (didn't see any issues on aarch64 but I grabbed all the info at the same time.) s390x: branta: 10.0.0 Ubuntu 20.04.4 LTS cotinga: 6.0.0 Ubuntu 18.04.6 LTS perch: 6.0.0 Ubuntu 18.04.6 LTS sarus: 14.0.0 Ubuntu 22.04.1 LTS aracari: 15.0.7 Red Hat Enterprise Linux 8.6 pipit: 15.0.7 Red Hat Enterprise Linux 8.6 lora: 15.0.7 Red Hat Enterprise Linux 9.2 mamushi: 15.0.7 Red Hat Enterprise Linux 9.2 pike: 11.0.1 Debian GNU/Linux 11 rinkhals: 11.0.1 Debian GNU/Linux 11 POWER: bonito: 7.0.1 Fedora 29 cavefish: 6.0.0 Ubuntu 18.04.6 LTS demoiselle: 5.0.1 openSUSE Leap 15.0 elasmobranch: 5.0.1 openSUSE Leap 15.0 babbler: 15.0.7 AlmaLinux 8.8 pytilia: 15.0.7 AlmaLinux 8.8 nicator: 15.0.7 AlmaLinux 9.2 twinspot: 15.0.7 AlmaLinux 9.2 cascabel: 11.0.1 Debian GNU/Linux 11 habu: 16.0.6 Fedora Linux 38 kingsnake: 16.0.6 Fedora Linux 38 krait: CentOS 16.0.6 Stream 8 lancehead: CentOS 16.0.6 Stream 8 aarch64: boiga: 14.0.5 Fedora Linux 36 corzo: 14.0.5 Fedora Linux 36 desman: 16.0.6 Fedora Linux 38 motmot: 16.0.6 Fedora Linux 38 whinchat: 11.0.1 Debian GNU/Linux 11 jackdaw: 11.0.1 Debian GNU/Linux 11 blackneck: 7.0.1 Debian GNU/Linux 10 alimoche: 7.0.1 Debian GNU/Linux 10 bulbul: 15.0.7 AlmaLinux 8.8 broadbill: 15.0.7 AlmaLinux 8.8 oystercatcher: 15.0.7 AlmaLinux 9.2 potoo: 15.0.7 AlmaLinux 9.2 whiting: 6.0.0 Ubuntu 18.04.5 LTS vimba: 6.0.0 Ubuntu 18.04.5 LTS splitfin: 10.0.0 Ubuntu 20.04.6 LTS rudd: 10.0.0 Ubuntu 20.04.6 LTS turbot: 14.0.0 Ubuntu 22.04.3 LTS shiner: 14.0.0 Ubuntu 22.04.3 LTS ziege: 16.0.6 CentOS Stream 8 chevrotain: 11.0.1 Debian GNU/Linux 11 Regards, Mark
Re: LLVM 16 (opaque pointers)
On Sun, Oct 22, 2023 at 2:44 PM Thomas Munro wrote: > On Sat, Oct 21, 2023 at 2:45 PM Tom Lane wrote: > > Thomas Munro writes: > > > (It'd be nice if the > > > build farm logged "$LLVM_CONFIG --version" somewhere.) > > > > It's not really the buildfarm script's responsibility to do that, > > but feel free to make configure do so. > > Done, copying the example of how we do it for perl and various other things. Build farm measles update: With that we can see that nicator (LLVM 15 on POWER) is green. We can see that cavefish (LLVM 6 on POWER) is red as expected. We can also see that bonito (LLVM 7 on POWER) is red, so my earlier theory that this might be due to the known bug we got fixed in LLVM 7 is not enough. Maybe there are other things fixed on POWER somewhere between those LLVM versions? I suspect it'll be hard to figure out without debug builds and backtraces. One thing is definitely our fault, though. xenodermus shows failures on REL_12_STABLE and REL_13_STABLE, using debug LLVM 6 on x86. I couldn't reproduce this locally on (newer) debug LLVM, so I bugged Andres for access to the host/libraries and chased it down. There is some type punning for a function parameter REL_13_STABLE and earlier, removed by Andres in REL_14_STABLE, and when I back-patched my refactoring I effectively back-patched a few changes from his commit df99ddc70b97 that removed the type punning, but I should have brought one more line from that commit to remove another trace of it. See attached. From 52397f8c70641080f2487ee5f019f143dd35957c Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sun, 22 Oct 2023 23:20:56 + Subject: [PATCH] jit: Fix LLVM back-patching bug in 12 and 13. While back-patching f90b4a84, I missed that branches before 14 did some type punning in a function parameter. That didn't cause any problem for release builds of LLVM, but debug builds of some older versions would fail a type cross-check assertion. To fix this, we need to back-patch a line of df99ddc7. Per build farm animal xenodermus, which runs a debug build of LLVM 6 with jit_above_cost=0. diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c index d84fbba7cc..c2e367f00d 100644 --- a/src/backend/jit/llvm/llvmjit_expr.c +++ b/src/backend/jit/llvm/llvmjit_expr.c @@ -1126,7 +1126,7 @@ llvm_compile_expr(ExprState *state) llvm_pg_var_type("TypeExecEvalSubroutine")); v_params[0] = v_state; - v_params[1] = l_ptr_const(op, l_ptr(TypeSizeT)); + v_params[1] = l_ptr_const(op, l_ptr(StructExprEvalStep)); v_params[2] = v_econtext; l_call(b, LLVMGetFunctionType(ExecEvalSubroutineTemplate), -- 2.42.0
Re: LLVM 16 (opaque pointers)
On Sat, Oct 21, 2023 at 2:45 PM Tom Lane wrote: > Thomas Munro writes: > > (It'd be nice if the > > build farm logged "$LLVM_CONFIG --version" somewhere.) > > It's not really the buildfarm script's responsibility to do that, > but feel free to make configure do so. Done, copying the example of how we do it for perl and various other things.
Re: LLVM 16 (opaque pointers)
On Sat, Oct 21, 2023 at 7:08 PM Andres Freund wrote: > I've attached a patch revision that I spent the last couple hours working > on. It's very very roughly based on a patch Tom Stellard had written (which I > think a few rpm packages use). But instead of encoding details about specific > layout details, I made the code check if the data layout works and fall back > to the cpu / features used for llvmjit_types.bc. This way it's not s390x > specific, future odd architecture behaviour would "automatically" be handled > the same The explanation makes sense and this seems like a solid plan to deal with it. I didn't try on a s390x, but I tested locally on our master branch with LLVM 7, 10, 17, 18, and then I hacked your patch to take the fallback path as if a layout mismatch had been detected, and it worked fine: 2023-10-22 11:49:55.663 NZDT [12000] DEBUG: detected CPU "skylake", with features "...", resulting in layout "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" 2023-10-22 11:49:55.664 NZDT [12000] DEBUG: detected CPU / features yield incompatible data layout, using values from module instead 2023-10-22 11:49:55.664 NZDT [12000] DETAIL: module CPU "x86-64", features "...", resulting in layout "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" +To deal with that, check if data layouts match during JIT initialization. If +the runtime detected cpu / features result in a different layout, try if the +cpu/features recorded in in llvmjit_types.bc work. s|try |check | s| in in | in | + errmsg_internal("could not determine working CPU / feature comination for JIT compilation"), s|comination|combination| s| / |/|g
Re: LLVM 16 (opaque pointers)
Hi, On 2023-10-19 06:20:26 +1300, Thomas Munro wrote: > Interestingly, a new problem just showed up on the the RHEL9 s390x > machine "lora", where a previously reported problem [1] apparently > re-appeared. It complains about incompatible layout, previously > blamed on mismatch between clang and LLVM versions. I've attached a patch revision that I spent the last couple hours working on. It's very very roughly based on a patch Tom Stellard had written (which I think a few rpm packages use). But instead of encoding details about specific layout details, I made the code check if the data layout works and fall back to the cpu / features used for llvmjit_types.bc. This way it's not s390x specific, future odd architecture behaviour would "automatically" be handled the same. With that at least the main regression tests pass on s390x, even with jit_above_cost=0. > I can see that its clang is v15 from clues in the conflig log, but I don't > know which version of LLVM is being used. However, I see now that > --with-llvm was literally just turned on, so there is no reason to think > that this would have worked before or this work is relevant. Strange though > -- we must be able to JIT further than that on s390x because we have crash > reports in other threads (ie we made it past this and into other more > advanced brokenness). You can avoid the borkedness by a) running on an older cpu b) adding compilation flags to change the code generation target (e.g. -march=native). And some RPM packages have applied the patch by Tom Stellard. > [1] > https://www.postgresql.org/message-id/flat/20210319190047.7o4bwhbp5dzkqif3%40alap3.anarazel.de#ec51b488ca8eac8c603d91c0439d38b2 Greetings, Andres Freund >From e7ab9eb11576ef85e4d2f6e1ade0a6028279634d Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Fri, 20 Oct 2023 23:03:53 -0700 Subject: [PATCH v2] jit: Add fallback in case of runtime/compile time ABI mismatch LLVM's s390x target uses a different ABI (called data layout in LLVM) for z13 and newer processors. If IR files (like llvmjit_types.bc) are compiled to target a processor older than z13, which is the default, and JIT occurs on a z13 or newer processor, the ABI mismatch will cause JIT to fail at runtime. To deal with that, check if data layouts match during JIT initialization. If the runtime detected cpu / features result in a different layout, try if the cpu/features recorded in in llvmjit_types.bc work. Author: Andres Freund Author: Tom Stellard (in an older version) Discussion: 16971-5d004d34742a3...@postgresql.org Discussion: ca+hukg+hoperbgvws4je3uy1uo3trnhf08hmispmrpodgti...@mail.gmail.com --- src/backend/jit/llvm/llvmjit.c | 191 - 1 file changed, 166 insertions(+), 25 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 58f638859a4..fa2a982991d 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -101,6 +101,8 @@ static size_t llvm_jit_context_in_use_count = 0; static size_t llvm_llvm_context_reuse_count = 0; static const char *llvm_triple = NULL; static const char *llvm_layout = NULL; +static char *llvm_cpu = NULL; +static char *llvm_cpu_features = NULL; static LLVMContextRef llvm_context; @@ -123,6 +125,7 @@ static void llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module); static void llvm_create_types(void); static void llvm_set_target(void); +static void llvm_set_cpu_and_features(void); static void llvm_recreate_llvm_context(void); static uint64_t llvm_resolve_symbol(const char *name, void *ctx); @@ -884,9 +887,6 @@ static void llvm_session_initialize(void) { MemoryContext oldcontext; - char *error = NULL; - char *cpu = NULL; - char *features = NULL; LLVMTargetMachineRef opt0_tm; LLVMTargetMachineRef opt3_tm; @@ -931,38 +931,21 @@ llvm_session_initialize(void) */ llvm_set_target(); - if (LLVMGetTargetFromTriple(llvm_triple, _targetref, ) != 0) - { - elog(FATAL, "failed to query triple %s", error); - } - - /* - * We want the generated code to use all available features. Therefore - * grab the host CPU string and detect features of the current CPU. The - * latter is needed because some CPU architectures default to enabling - * features not all CPUs have (weird, huh). - */ - cpu = LLVMGetHostCPUName(); - features = LLVMGetHostCPUFeatures(); - elog(DEBUG2, "LLVMJIT detected CPU \"%s\", with features \"%s\"", - cpu, features); + llvm_set_cpu_and_features(); opt0_tm = - LLVMCreateTargetMachine(llvm_targetref, llvm_triple, cpu, features, + LLVMCreateTargetMachine(llvm_targetref, llvm_triple, +llvm_cpu, llvm_cpu_features, LLVMCodeGenLevelNone, LLVMRelocDefault, LLVMCodeModelJITDefault); opt3_tm = - LLVMCreateTargetMachine(llvm_targetref, llvm_triple, cpu, features, + LLVMCreateTargetMachine(llvm_targetref, llvm_triple, +llvm_cpu, llvm_cpu_features,
Re: LLVM 16 (opaque pointers)
Thomas Munro writes: > (It'd be nice if the > build farm logged "$LLVM_CONFIG --version" somewhere.) It's not really the buildfarm script's responsibility to do that, but feel free to make configure do so. regards, tom lane
Re: LLVM 16 (opaque pointers)
On Sat, Oct 21, 2023 at 12:02 PM Thomas Munro wrote: > On Sat, Oct 21, 2023 at 11:12 AM Andres Freund wrote: > > I'm quite sure that jiting did pass on ppc64 at some point. > > Yeah, I remember debugging it on EDB's POWER machine. First off, we > know that LLVM < 7 doesn't work for us on POWER, because: > > https://www.postgresql.org/message-id/CAEepm%3D39F_B3Ou8S3OrUw%2BhJEUP3p%3DwCu0ug-TTW67qKN53g3w%40mail.gmail.com > > That was fixed: > > https://github.com/llvm/llvm-project/commit/a95b0df5eddbe7fa1e9f8fe0b1ff62427e1c0318 > > So I think that means that we'd first have to go through those animals > and figure out which ones have older LLVM, and ignore those results -- > they just can't use --with-llvm. Unfortunately there doesn't seem to > be any clue on the version from the paths used by OpenSUSE. Mark, do > you know? Adding Mark to this subthread. Concretely, could you please disable --with-llvm on any of those machines running LLVM < 7? And report what version any remaining animals are running? (It'd be nice if the build farm logged "$LLVM_CONFIG --version" somewhere.) One of them seems to have clang 5 which is a clue -- if the LLVM is also 5 it's just not going to work, as LLVM is one of those forwards-only projects that doesn't back-patch fixes like that.
Re: LLVM 16 (opaque pointers)
Hi, On 2023-10-21 12:02:51 +1300, Thomas Munro wrote: > On Sat, Oct 21, 2023 at 11:12 AM Andres Freund wrote: > > > I doubt I can get anywhere near an s390x though, and we definitely had > > > pre-existing problems on that arch. > > > > IMO an LLVM bug, rather than a postgres bug, but I guess it's all a matter > > of > > perspective. > > https://github.com/llvm/llvm-project/issues/53009#issuecomment-1042748553 > > Ah, good to know about that. But there are also reports of crashes in > released versions that manage to get passed that ABI-wobble business: > > https://www.postgresql.org/message-id/flat/CAF1DzPXjpPxnsgySz2Zjm8d2dx7%3DJ070C%2BMQBFh%2B9NBNcBKCAg%40mail.gmail.com Trying to debug that now, using access to an s390x box provided by Mark... Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
On Sat, Oct 21, 2023 at 11:12 AM Andres Freund wrote: > On 2023-10-21 10:48:47 +1300, Thomas Munro wrote: > > On Thu, Oct 19, 2023 at 6:20 AM Thomas Munro wrote: > > I see that Mark has also just enabled --with-llvm on some POWER Linux > > animals, and they have failed in various ways. The failures are > > strangely lacking in detail. It seems we didn't have coverage before, > > and I recall that there were definitely versions of LLVM that *didn't* > > work for our usage in the past, which I'll need to dredge out of the > > archives. I will try to get onto a cfarm POWER machine and see if I > > can reproduce that, before and after these commits, and whose bug is > > it etc. > > I'm quite sure that jiting did pass on ppc64 at some point. Yeah, I remember debugging it on EDB's POWER machine. First off, we know that LLVM < 7 doesn't work for us on POWER, because: https://www.postgresql.org/message-id/CAEepm%3D39F_B3Ou8S3OrUw%2BhJEUP3p%3DwCu0ug-TTW67qKN53g3w%40mail.gmail.com That was fixed: https://github.com/llvm/llvm-project/commit/a95b0df5eddbe7fa1e9f8fe0b1ff62427e1c0318 So I think that means that we'd first have to go through those animals and figure out which ones have older LLVM, and ignore those results -- they just can't use --with-llvm. Unfortunately there doesn't seem to be any clue on the version from the paths used by OpenSUSE. Mark, do you know? > > I doubt I can get anywhere near an s390x though, and we definitely had > > pre-existing problems on that arch. > > IMO an LLVM bug, rather than a postgres bug, but I guess it's all a matter of > perspective. > https://github.com/llvm/llvm-project/issues/53009#issuecomment-1042748553 Ah, good to know about that. But there are also reports of crashes in released versions that manage to get passed that ABI-wobble business: https://www.postgresql.org/message-id/flat/CAF1DzPXjpPxnsgySz2Zjm8d2dx7%3DJ070C%2BMQBFh%2B9NBNcBKCAg%40mail.gmail.com
Re: LLVM 16 (opaque pointers)
Hi, On 2023-10-21 10:48:47 +1300, Thomas Munro wrote: > On Thu, Oct 19, 2023 at 6:20 AM Thomas Munro wrote: > > Interestingly, a new problem just showed up on the the RHEL9 s390x > > machine "lora", where a previously reported problem [1] apparently > > re-appeared. It complains about incompatible layout, previously > > blamed on mismatch between clang and LLVM versions. I can see that > > its clang is v15 from clues in the conflig log, but I don't know which > > version of LLVM is being used. However, I see now that --with-llvm > > was literally just turned on, so there is no reason to think that this > > would have worked before or this work is relevant. Strange though -- > > we must be able to JIT further than that on s390x because we have > > crash reports in other threads (ie we made it past this and into other > > more advanced brokenness). > > I see that Mark has also just enabled --with-llvm on some POWER Linux > animals, and they have failed in various ways. The failures are > strangely lacking in detail. It seems we didn't have coverage before, > and I recall that there were definitely versions of LLVM that *didn't* > work for our usage in the past, which I'll need to dredge out of the > archives. I will try to get onto a cfarm POWER machine and see if I > can reproduce that, before and after these commits, and whose bug is > it etc. I'm quite sure that jiting did pass on ppc64 at some point. > I doubt I can get anywhere near an s390x though, and we definitely had > pre-existing problems on that arch. IMO an LLVM bug, rather than a postgres bug, but I guess it's all a matter of perspective. https://github.com/llvm/llvm-project/issues/53009#issuecomment-1042748553 I had made another bug report about this issue at some point, but I can't refind it right now. Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
Thomas Munro writes: > I doubt I can get anywhere near an s390x though, and we definitely had > pre-existing problems on that arch. Yeah. Too bad there's no s390x in the gcc compile farm. (I'm wondering how straight a line to draw between that fact and llvm's evident shortcomings on s390x.) I'm missing my old access to Red Hat's dev machines. But in the meantime, Mark's clearly got beaucoup access to s390 machines, so I wonder if he can let you into any of them? regards, tom lane
Re: LLVM 16 (opaque pointers)
On Sat, Oct 21, 2023 at 10:48:47AM +1300, Thomas Munro wrote: > On Thu, Oct 19, 2023 at 6:20 AM Thomas Munro wrote: > > Interestingly, a new problem just showed up on the the RHEL9 s390x > > machine "lora", where a previously reported problem [1] apparently > > re-appeared. It complains about incompatible layout, previously > > blamed on mismatch between clang and LLVM versions. I can see that > > its clang is v15 from clues in the conflig log, but I don't know which > > version of LLVM is being used. However, I see now that --with-llvm > > was literally just turned on, so there is no reason to think that this > > would have worked before or this work is relevant. Strange though -- > > we must be able to JIT further than that on s390x because we have > > crash reports in other threads (ie we made it past this and into other > > more advanced brokenness). > > I see that Mark has also just enabled --with-llvm on some POWER Linux > animals, and they have failed in various ways. The failures are > strangely lacking in detail. It seems we didn't have coverage before, > and I recall that there were definitely versions of LLVM that *didn't* > work for our usage in the past, which I'll need to dredge out of the > archives. I will try to get onto a cfarm POWER machine and see if I > can reproduce that, before and after these commits, and whose bug is > it etc. Yeah, I'm slowing enabling --with-llvm on POWER, s390x, and aarch64 (but none here yet as I write this)... > I doubt I can get anywhere near an s390x though, and we definitely had > pre-existing problems on that arch. If you want to send me your ssh key, I have access to these systems through OSUOSL and LinuxFoundation programs. Regards, Mark -- Mark Wong EDB https://enterprisedb.com
Re: LLVM 16 (opaque pointers)
On Thu, Oct 19, 2023 at 6:20 AM Thomas Munro wrote: > Interestingly, a new problem just showed up on the the RHEL9 s390x > machine "lora", where a previously reported problem [1] apparently > re-appeared. It complains about incompatible layout, previously > blamed on mismatch between clang and LLVM versions. I can see that > its clang is v15 from clues in the conflig log, but I don't know which > version of LLVM is being used. However, I see now that --with-llvm > was literally just turned on, so there is no reason to think that this > would have worked before or this work is relevant. Strange though -- > we must be able to JIT further than that on s390x because we have > crash reports in other threads (ie we made it past this and into other > more advanced brokenness). I see that Mark has also just enabled --with-llvm on some POWER Linux animals, and they have failed in various ways. The failures are strangely lacking in detail. It seems we didn't have coverage before, and I recall that there were definitely versions of LLVM that *didn't* work for our usage in the past, which I'll need to dredge out of the archives. I will try to get onto a cfarm POWER machine and see if I can reproduce that, before and after these commits, and whose bug is it etc. I doubt I can get anywhere near an s390x though, and we definitely had pre-existing problems on that arch.
Re: LLVM 16 (opaque pointers)
Hi, On Thu, 2023-10-19 at 06:20 +1300, Thomas Munro wrote: > Pushed, after digging up some old LLVM skeletons to test, and those > "old LLVM" animals are turning green now. I went ahead and pushed the > much smaller and simpler patch for LLVM 17. Thank you! I can confirm that RPMs built fine on Fedora 39 with those patches, which ships LLVM 17.0.2 as of today. I can also confirm that builds are not broken on RHEL 9 and 8 and Fedora 37 which ship LLVM 15, and Fedora 38 (LLVM 16). Thanks again! Regards, -- Devrim Gündüz Open Source Solution Architect, PostgreSQL Major Contributor Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: LLVM 16 (opaque pointers)
Hi, On 2023-10-19 06:20:26 +1300, Thomas Munro wrote: > On Thu, Oct 19, 2023 at 1:06 AM Thomas Munro wrote: > > On Thu, Oct 19, 2023 at 12:02 AM Thomas Munro > > wrote: > > > I pushed the first patch, for LLVM 16, and the build farm told me that > > > some old LLVM versions don't like it. The problem seems to be the > > > function LLVMGlobalGetValueType(). I can see that that was only added > > > to the C API in 2018, so it looks like I may need to back-port that > > > (trivial) wrapper into our own llvmjit_wrap.cpp for LLVM < 8. > > > > Concretely something like the attached should probably fix it, but > > it'll take me a little while to confirm that... > > Pushed, after digging up some old LLVM skeletons to test, and those > "old LLVM" animals are turning green now. I went ahead and pushed the > much smaller and simpler patch for LLVM 17. I enabled a new set of buildfarm animals to test LLVM 16 and 17. I initially forgot to disable them for 11, which means we'll have those failed build on 11 until they age out :/. Particularly for the LLVM debug builds it'll take a fair bit to run on all branches. Each branch takes about 3h. Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
On Thu, Oct 19, 2023 at 1:06 AM Thomas Munro wrote: > On Thu, Oct 19, 2023 at 12:02 AM Thomas Munro wrote: > > I pushed the first patch, for LLVM 16, and the build farm told me that > > some old LLVM versions don't like it. The problem seems to be the > > function LLVMGlobalGetValueType(). I can see that that was only added > > to the C API in 2018, so it looks like I may need to back-port that > > (trivial) wrapper into our own llvmjit_wrap.cpp for LLVM < 8. > > Concretely something like the attached should probably fix it, but > it'll take me a little while to confirm that... Pushed, after digging up some old LLVM skeletons to test, and those "old LLVM" animals are turning green now. I went ahead and pushed the much smaller and simpler patch for LLVM 17. Interestingly, a new problem just showed up on the the RHEL9 s390x machine "lora", where a previously reported problem [1] apparently re-appeared. It complains about incompatible layout, previously blamed on mismatch between clang and LLVM versions. I can see that its clang is v15 from clues in the conflig log, but I don't know which version of LLVM is being used. However, I see now that --with-llvm was literally just turned on, so there is no reason to think that this would have worked before or this work is relevant. Strange though -- we must be able to JIT further than that on s390x because we have crash reports in other threads (ie we made it past this and into other more advanced brokenness). [1] https://www.postgresql.org/message-id/flat/20210319190047.7o4bwhbp5dzkqif3%40alap3.anarazel.de#ec51b488ca8eac8c603d91c0439d38b2
Re: LLVM 16 (opaque pointers)
On Thu, Oct 19, 2023 at 12:02 AM Thomas Munro wrote: > I pushed the first patch, for LLVM 16, and the build farm told me that > some old LLVM versions don't like it. The problem seems to be the > function LLVMGlobalGetValueType(). I can see that that was only added > to the C API in 2018, so it looks like I may need to back-port that > (trivial) wrapper into our own llvmjit_wrap.cpp for LLVM < 8. Concretely something like the attached should probably fix it, but it'll take me a little while to confirm that... 0001-Supply-missing-LLVMGlobalGetValueType-for-LLVM-8.patch Description: Binary data
Re: LLVM 16 (opaque pointers)
I pushed the first patch, for LLVM 16, and the build farm told me that some old LLVM versions don't like it. The problem seems to be the function LLVMGlobalGetValueType(). I can see that that was only added to the C API in 2018, so it looks like I may need to back-port that (trivial) wrapper into our own llvmjit_wrap.cpp for LLVM < 8.
Re: LLVM 16 (opaque pointers)
Le vendredi 13 octobre 2023, 22:32:13 CEST Thomas Munro a écrit : > On Wed, Oct 11, 2023 at 10:31 PM Ronan Dunklau wrote: > > Le mercredi 11 octobre 2023, 10:59:50 CEST Thomas Munro a écrit : > > > The back-patch to 12 was a little trickier than anticipated, but after > > > taking a break and trying again I now have PG 12...17 patches that > > > I've tested against LLVM 10...18 (that's 54 combinations), in every > > > case only with the clang corresponding to LLVM. > > > > Thank you Thomas for those patches, and the extensive testing, I will run > > my own and let you know. > > Thanks! No news is good news, I hope? I'm hoping to commit this today. > > > > I've attached only the patches for master, but the 12-16 versions are > > > available at https://github.com/macdice/postgres/tree/llvm16-$N in > > > case anyone has comments on those. > > > > For PG13 and PG12, it looks like the ExecEvalBoolSubroutineTemplate is not > > used anywhere, as ExecEvalBoolSubroutine was introduced in PG14 if I'm not > > mistaken. > > Right, looks like I can remove that in those branches. Oh sorry I thought I followed up. I ran the same stress testing involving several hours of sqlsmith with all jit costs set to zero and didn't notice anything with LLVM16. Thank you ! -- Ronan Dunklau
Re: LLVM 16 (opaque pointers)
On Sat, Oct 14, 2023 at 3:56 AM Andres Freund wrote: > On 2023-10-13 16:44:13 +0200, Dmitry Dolgov wrote: > > Here is what I had in mind (only this part in the second patch was changed). > > Makes sense to me. I think we'll likely eventually want to use a custom > pipeline anyway, and I think we should consider using an optimization level > inbetween "not at all" "as hard as possible"... Thanks Dmitry and Andres. I'm planning to commit these today if there are no further comments.
Re: LLVM 16 (opaque pointers)
On Wed, Oct 11, 2023 at 10:31 PM Ronan Dunklau wrote: > Le mercredi 11 octobre 2023, 10:59:50 CEST Thomas Munro a écrit : > > The back-patch to 12 was a little trickier than anticipated, but after > > taking a break and trying again I now have PG 12...17 patches that > > I've tested against LLVM 10...18 (that's 54 combinations), in every > > case only with the clang corresponding to LLVM. > > Thank you Thomas for those patches, and the extensive testing, I will run my > own and let you know. Thanks! No news is good news, I hope? I'm hoping to commit this today. > > I've attached only the patches for master, but the 12-16 versions are > > available at https://github.com/macdice/postgres/tree/llvm16-$N in > > case anyone has comments on those. > > For PG13 and PG12, it looks like the ExecEvalBoolSubroutineTemplate is not > used anywhere, as ExecEvalBoolSubroutine was introduced in PG14 if I'm not > mistaken. Right, looks like I can remove that in those branches.
Re: LLVM 16 (opaque pointers)
On 2023-10-13 16:44:13 +0200, Dmitry Dolgov wrote: > Here is what I had in mind (only this part in the second patch was changed). Makes sense to me. I think we'll likely eventually want to use a custom pipeline anyway, and I think we should consider using an optimization level inbetween "not at all" "as hard as possible"... Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
Hi, On 2023-10-13 11:06:21 +0200, Dmitry Dolgov wrote: > > On Thu, Oct 12, 2023 at 04:31:20PM -0700, Andres Freund wrote: > > I also don't think we should add the mem2reg pass outside of -O0 - running > > it > > after a real optimization pipeline doesn't seem useful and might even make > > the > > code worse? mem2reg is included in default (and obviously also in O3). > > My understanding was that while mem2reg is included everywhere above > -O0, this set of passes won't hurt. But yeah, if you say it could > degrade the final result, it's better to not do this. I'll update this > part. It's indeed included anywhere above that, but adding it explicitly to the schedule means it's excuted twice: echo 'int foo(int a) { return a / 343; }' | clang-16 -emit-llvm -x c -c -o - -S -|sed -e 's/optnone//'|opt-17 -debug-pass-manager -passes='default,mem2reg' -o /dev/null 2>&1|grep Promote Running pass: PromotePass on foo (2 instructions) Running pass: PromotePass on foo (2 instructions) The second one is in a point in the pipeline where it doesn't help. It also requires another analysis pass to be executed unnecessarily. Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
> On Fri, Oct 13, 2023 at 11:06:21AM +0200, Dmitry Dolgov wrote: > > On Thu, Oct 12, 2023 at 04:31:20PM -0700, Andres Freund wrote: > > I don't think the "function(no-op-function),no-op-module" bit does something > > particularly useful? > > Right, looks like leftovers after verifying which passes were actually > applied. My bad, could be removed. > > > I also don't think we should add the mem2reg pass outside of -O0 - running > > it > > after a real optimization pipeline doesn't seem useful and might even make > > the > > code worse? mem2reg is included in default (and obviously also in O3). > > My understanding was that while mem2reg is included everywhere above > -O0, this set of passes won't hurt. But yeah, if you say it could > degrade the final result, it's better to not do this. I'll update this > part. Here is what I had in mind (only this part in the second patch was changed). >From 040121be6d150e8f18f154a64b409baef2b15ffb Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 9 May 2022 08:27:47 +1200 Subject: [PATCH v4 1/2] jit: Support opaque pointers in LLVM 16. Remove use of LLVMGetElementType() and provide the type of all pointers to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM versions[1]. * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions. * For LLVM == 15, we'll continue to do the same, explicitly opting out of opaque pointer mode. * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take the extra type argument. The difference is hidden behind some new IR emitting wrapper functions l_load(), l_gep(), l_call() etc. The change is mostly mechanical, except that at each site the correct type had to be provided. In some places we needed to do some extra work to get functions types, including some new wrappers for C++ APIs that are not yet exposed by in LLVM's C API, and some new "example" functions in llvmjit_types.c because it's no longer possible to start from the function pointer type and ask for the function type. Back-patch to 12, because it's a little tricker in 11 and we agreed not to put the latest LLVM support into the upcoming final release of 11. [1] https://llvm.org/docs/OpaquePointers.html Reviewed-by: Dmitry Dolgov <9erthali...@gmail.com> Reviewed-by: Ronan Dunklau Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c| 59 ++-- src/backend/jit/llvm/llvmjit_deform.c | 119 src/backend/jit/llvm/llvmjit_expr.c | 401 -- src/backend/jit/llvm/llvmjit_types.c | 39 ++- src/backend/jit/llvm/llvmjit_wrap.cpp | 12 + src/backend/jit/llvm/meson.build | 2 +- src/include/jit/llvmjit.h | 7 + src/include/jit/llvmjit_emit.h| 106 +-- 8 files changed, 481 insertions(+), 264 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 4dfaf797432..1c8e1ab3a76 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -64,8 +64,10 @@ LLVMTypeRef TypeStorageBool; LLVMTypeRef TypePGFunction; LLVMTypeRef StructNullableDatum; LLVMTypeRef StructHeapTupleData; +LLVMTypeRef StructMinimalTupleData; LLVMTypeRef StructTupleDescData; LLVMTypeRef StructTupleTableSlot; +LLVMTypeRef StructHeapTupleHeaderData; LLVMTypeRef StructHeapTupleTableSlot; LLVMTypeRef StructMinimalTupleTableSlot; LLVMTypeRef StructMemoryContextData; @@ -76,8 +78,11 @@ LLVMTypeRef StructExprState; LLVMTypeRef StructAggState; LLVMTypeRef StructAggStatePerGroupData; LLVMTypeRef StructAggStatePerTransData; +LLVMTypeRef StructPlanState; LLVMValueRef AttributeTemplate; +LLVMValueRef ExecEvalSubroutineTemplate; +LLVMValueRef ExecEvalBoolSubroutineTemplate; static LLVMModuleRef llvm_types_module = NULL; @@ -451,11 +456,7 @@ llvm_pg_var_type(const char *varname) if (!v_srcvar) elog(ERROR, "variable %s not in llvmjit_types.c", varname); - /* look at the contained type */ - typ = LLVMTypeOf(v_srcvar); - Assert(typ != NULL && LLVMGetTypeKind(typ) == LLVMPointerTypeKind); - typ = LLVMGetElementType(typ); - Assert(typ != NULL); + typ = LLVMGlobalGetValueType(v_srcvar); return typ; } @@ -467,12 +468,14 @@ llvm_pg_var_type(const char *varname) LLVMTypeRef llvm_pg_var_func_type(const char *varname) { - LLVMTypeRef typ = llvm_pg_var_type(varname); + LLVMValueRef v_srcvar; + LLVMTypeRef typ; + + v_srcvar = LLVMGetNamedFunction(llvm_types_module, varname); + if (!v_srcvar) + elog(ERROR, "function %s not in llvmjit_types.c", varname); - /* look at the contained type */ - Assert(LLVMGetTypeKind(typ) == LLVMPointerTypeKind); - typ = LLVMGetElementType(typ); - Assert(typ != NULL && LLVMGetTypeKind(typ) == LLVMFunctionTypeKind); + typ = LLVMGetFunctionType(v_srcvar); return typ; } @@ -502,7 +505,7 @@ llvm_pg_func(LLVMModuleRef mod, const char *funcname) v_fn = LLVMAddFunction(mod,
Re: LLVM 16 (opaque pointers)
> On Thu, Oct 12, 2023 at 04:31:20PM -0700, Andres Freund wrote: > Hi, > > On 2023-10-11 21:59:50 +1300, Thomas Munro wrote: > > +#else > > + LLVMPassBuilderOptionsRef options; > > + LLVMErrorRef err; > > + int compile_optlevel; > > + char *passes; > > + > > + if (context->base.flags & PGJIT_OPT3) > > + compile_optlevel = 3; > > + else > > + compile_optlevel = 0; > > + > > + passes = > > psprintf("default,mem2reg,function(no-op-function),no-op-module", > > + compile_optlevel); > > I don't think the "function(no-op-function),no-op-module" bit does something > particularly useful? Right, looks like leftovers after verifying which passes were actually applied. My bad, could be removed. > I also don't think we should add the mem2reg pass outside of -O0 - running it > after a real optimization pipeline doesn't seem useful and might even make the > code worse? mem2reg is included in default (and obviously also in O3). My understanding was that while mem2reg is included everywhere above -O0, this set of passes won't hurt. But yeah, if you say it could degrade the final result, it's better to not do this. I'll update this part.
Re: LLVM 16 (opaque pointers)
Hi, On 2023-10-11 21:59:50 +1300, Thomas Munro wrote: > +#else > + LLVMPassBuilderOptionsRef options; > + LLVMErrorRef err; > + int compile_optlevel; > + char *passes; > + > + if (context->base.flags & PGJIT_OPT3) > + compile_optlevel = 3; > + else > + compile_optlevel = 0; > + > + passes = > psprintf("default,mem2reg,function(no-op-function),no-op-module", > + compile_optlevel); I don't think the "function(no-op-function),no-op-module" bit does something particularly useful? I also don't think we should add the mem2reg pass outside of -O0 - running it after a real optimization pipeline doesn't seem useful and might even make the code worse? mem2reg is included in default (and obviously also in O3). Thanks for working on this stuff! I'm working on setting up buildfarm animals for 16, 17, each once with a normal and an assertion enabled LLVM build. Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
Le mercredi 11 octobre 2023, 10:59:50 CEST Thomas Munro a écrit : > The back-patch to 12 was a little trickier than anticipated, but after > taking a break and trying again I now have PG 12...17 patches that > I've tested against LLVM 10...18 (that's 54 combinations), in every > case only with the clang corresponding to LLVM. Thank you Thomas for those patches, and the extensive testing, I will run my own and let you know. > I've attached only the patches for master, but the 12-16 versions are > available at https://github.com/macdice/postgres/tree/llvm16-$N in > case anyone has comments on those. For PG13 and PG12, it looks like the ExecEvalBoolSubroutineTemplate is not used anywhere, as ExecEvalBoolSubroutine was introduced in PG14 if I'm not mistaken. Best regards, -- Ronan Dunklau
Re: LLVM 16 (opaque pointers)
On Thu, Sep 21, 2023 at 12:47 PM Thomas Munro wrote: > On Thu, Sep 21, 2023 at 12:24 PM Devrim Gündüz wrote: > > RHEL releases new LLVM version along with their new minor releases every > > 6 month, and we have to build older versions with new LLVM each time. > > From RHEL point of view, it would be great if we can back-patch back to > > v12 :( > > Got it. OK, I'll work on 12 and 13 now. The back-patch to 12 was a little trickier than anticipated, but after taking a break and trying again I now have PG 12...17 patches that I've tested against LLVM 10...18 (that's 54 combinations), in every case only with the clang corresponding to LLVM. For 12, I decided to back-patch the llvm_types_module variable that was introduced in 13, to keep the code more similar. For master, I had to rebase over Daniel's recent commits, which required re-adding unused variables removed by 2dad308e, and then changing a bunch of LLVM type constructors like LLVMInt8Type() to the LLVMInt8TypeInContext(lc, ...) variants following the example of 9dce2203. Without that, type assertions in my LLVM 18 debug build would fail (and maybe there could be a leak problem, though I'm not sure that really applied to integer (non-struct) types). I've attached only the patches for master, but the 12-16 versions are available at https://github.com/macdice/postgres/tree/llvm16-$N in case anyone has comments on those. From 6d7b21fe103978ceb34f758284938e66391c0a7e Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 9 May 2022 08:27:47 +1200 Subject: [PATCH v3 1/2] jit: Support opaque pointers in LLVM 16. Remove use of LLVMGetElementType() and provide the type of all pointers to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM versions[1]. * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions. * For LLVM == 15, we'll continue to do the same, explicitly opting out of opaque pointer mode. * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take the extra type argument. The difference is hidden behind some new IR emitting wrapper functions l_load(), l_gep(), l_call() etc. The change is mostly mechanical, except that at each site the correct type had to be provided. In some places we needed to do some extra work to get functions types, including some new wrappers for C++ APIs that are not yet exposed by in LLVM's C API, and some new "example" functions in llvmjit_types.c because it's no longer possible to start from the function pointer type and ask for the function type. Back-patch to 12, because it's a little tricker in 11 and we agreed not to put the latest LLVM support into the upcoming final release of 11. [1] https://llvm.org/docs/OpaquePointers.html Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Ronan Dunklau Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c| 59 ++-- src/backend/jit/llvm/llvmjit_deform.c | 119 src/backend/jit/llvm/llvmjit_expr.c | 401 -- src/backend/jit/llvm/llvmjit_types.c | 39 ++- src/backend/jit/llvm/llvmjit_wrap.cpp | 12 + src/backend/jit/llvm/meson.build | 2 +- src/include/jit/llvmjit.h | 7 + src/include/jit/llvmjit_emit.h| 106 +-- 8 files changed, 481 insertions(+), 264 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 4dfaf79743..1c8e1ab3a7 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -64,8 +64,10 @@ LLVMTypeRef TypeStorageBool; LLVMTypeRef TypePGFunction; LLVMTypeRef StructNullableDatum; LLVMTypeRef StructHeapTupleData; +LLVMTypeRef StructMinimalTupleData; LLVMTypeRef StructTupleDescData; LLVMTypeRef StructTupleTableSlot; +LLVMTypeRef StructHeapTupleHeaderData; LLVMTypeRef StructHeapTupleTableSlot; LLVMTypeRef StructMinimalTupleTableSlot; LLVMTypeRef StructMemoryContextData; @@ -76,8 +78,11 @@ LLVMTypeRef StructExprState; LLVMTypeRef StructAggState; LLVMTypeRef StructAggStatePerGroupData; LLVMTypeRef StructAggStatePerTransData; +LLVMTypeRef StructPlanState; LLVMValueRef AttributeTemplate; +LLVMValueRef ExecEvalSubroutineTemplate; +LLVMValueRef ExecEvalBoolSubroutineTemplate; static LLVMModuleRef llvm_types_module = NULL; @@ -451,11 +456,7 @@ llvm_pg_var_type(const char *varname) if (!v_srcvar) elog(ERROR, "variable %s not in llvmjit_types.c", varname); - /* look at the contained type */ - typ = LLVMTypeOf(v_srcvar); - Assert(typ != NULL && LLVMGetTypeKind(typ) == LLVMPointerTypeKind); - typ = LLVMGetElementType(typ); - Assert(typ != NULL); + typ = LLVMGlobalGetValueType(v_srcvar); return typ; } @@ -467,12 +468,14 @@ llvm_pg_var_type(const char *varname) LLVMTypeRef llvm_pg_var_func_type(const char *varname) { - LLVMTypeRef typ = llvm_pg_var_type(varname); + LLVMValueRef v_srcvar; + LLVMTypeRef typ; + +
Re: LLVM 16 (opaque pointers)
On Thu, Sep 21, 2023 at 12:24 PM Devrim Gündüz wrote: > On Thu, 2023-09-21 at 08:22 +1200, Thomas Munro wrote: > > So far I've tested LLVM versions 10, 15, 16, 17, 18 (= their main > > branch) against PostgreSQL versions 14, 15, 16. I've attached the > > versions that apply to master and 16, and pushed back-patches to 14 > > and 15 to public branches if anyone's interested[1]. Back-patching > > further seems a bit harder. I'm quite willing to do it, but ... do we > > actually need to, ie does anyone really *require* old PostgreSQL > > release branches to work with new LLVM? > > RHEL releases new LLVM version along with their new minor releases every > 6 month, and we have to build older versions with new LLVM each time. > From RHEL point of view, it would be great if we can back-patch back to > v12 :( Got it. OK, I'll work on 12 and 13 now.
Re: LLVM 16 (opaque pointers)
Hi Thomas, On Thu, 2023-09-21 at 08:22 +1200, Thomas Munro wrote: > So far I've tested LLVM versions 10, 15, 16, 17, 18 (= their main > branch) against PostgreSQL versions 14, 15, 16. I've attached the > versions that apply to master and 16, and pushed back-patches to 14 > and 15 to public branches if anyone's interested[1]. Back-patching > further seems a bit harder. I'm quite willing to do it, but ... do we > actually need to, ie does anyone really *require* old PostgreSQL > release branches to work with new LLVM? RHEL releases new LLVM version along with their new minor releases every 6 month, and we have to build older versions with new LLVM each time. >From RHEL point of view, it would be great if we can back-patch back to v12 :( Regards, -- Devrim Gündüz Open Source Solution Architect, PostgreSQL Major Contributor Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: LLVM 16 (opaque pointers)
Belated thanks Dmitry, Ronan, Andres for your feedback. Here's a new version, also including Dmitry's patch for 17 which it is now also time to push. It required a bit more trivial #if magic to be conditional, as Dmitry already mentioned. I just noticed that Dmitry had the LLVMPassBuilderOptionsSetInlinerThreshold() function added to LLVM 17's C API for this patch. Thanks! (Better than putting stuff in llvmjit_wrap.c, if you can get it upstreamed in time.) I thought I needed to block users from building with too-old clang and too-new LLVM, but I haven't managed to find a combination that actually breaks anything. I wouldn't recommend it, but for example clang 10 bitcode seems to be inlinable without problems by LLVM 16 on my system (I didn't use an assert build of LLVM though). I think that could be a separate adjustment if we learn that we need to enforce or document a constraint there. So far I've tested LLVM versions 10, 15, 16, 17, 18 (= their main branch) against PostgreSQL versions 14, 15, 16. I've attached the versions that apply to master and 16, and pushed back-patches to 14 and 15 to public branches if anyone's interested[1]. Back-patching further seems a bit harder. I'm quite willing to do it, but ... do we actually need to, ie does anyone really *require* old PostgreSQL release branches to work with new LLVM? (I'll start a separate thread about the related question of when we get to drop support for old LLVMs.) One point from an earlier email: On Sat, Aug 12, 2023 at 6:09 AM Andres Freund wrote: > > AttributeTemplate(PG_FUNCTION_ARGS) > > { > > + PGFunction fp PG_USED_FOR_ASSERTS_ONLY; > > + > > + fp = > Other parts of the file do this by putting the functions into > referenced_functions[], i'd copy that here and below. Actually here I just wanted to assert that the 3 template functions match certain function pointer types. To restate what these functions are about: in the JIT code I need the function type, but we have only the function pointer type, and it is now impossible to go from a function pointer type to a function type, so I needed to define some example functions with the right prototype (well, one of them existed already but I needed more), and then I wanted to assert that they are assignable to the appropriate function pointer types. Does that make sense? In this version I changed it to what I hope is a more obvious/explicit expression of that goal: + AssertVariableIsOfType(, + ExecEvalSubroutine); [1] https://github.com/macdice/postgres/tree/llvm16-14 and -15 From d5e4ac6dd9c2016c00bedfd8cd404e2f277701e1 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 9 May 2022 08:27:47 +1200 Subject: [PATCH v2 1/2] jit: Support opaque pointers in LLVM 16. Remove use of LLVMGetElementType() and provide the type of all pointers to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM versions[1]. * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions. * For LLVM == 15, we'll continue to do the same, explicitly opting out of opaque pointer mode. * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take the extra type argument. The difference is hidden behind some new IR emitting wrapper functions l_load(), l_gep(), l_call() etc. The change is mostly mechanical, except that at each site the correct type had to be provided. In some places we needed to do some extra work to get functions types, including some new wrappers for C++ APIs that are not yet exposed by in LLVM's C API, and some new "example" functions in llvmjit_types.c because it's no longer possible to start from the function pointer type and ask for the function type. Back-patch to 12, because it's a little tricker in 11 and we agreed not to put the latest LLVM support into the upcoming final release of 11. [1] https://llvm.org/docs/OpaquePointers.html Reviewed-by: Dmitry Dolgov <9erthali...@gmail.com> Reviewed-by: Ronan Dunklau Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c| 57 ++-- src/backend/jit/llvm/llvmjit_deform.c | 119 src/backend/jit/llvm/llvmjit_expr.c | 401 -- src/backend/jit/llvm/llvmjit_types.c | 39 ++- src/backend/jit/llvm/llvmjit_wrap.cpp | 12 + src/backend/jit/llvm/meson.build | 2 +- src/include/jit/llvmjit.h | 7 + src/include/jit/llvmjit_emit.h| 106 +-- 8 files changed, 479 insertions(+), 264 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 09650e2c70..06bb440503 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -85,8 +85,11 @@ LLVMTypeRef StructExprState; LLVMTypeRef StructAggState; LLVMTypeRef StructAggStatePerGroupData; LLVMTypeRef StructAggStatePerTransData; +LLVMTypeRef
Re: LLVM 16 (opaque pointers)
[...] Further changes are already needed for their "main" branch (LLVM 17-to-be), so this won't quite be enough to shut seawasp up. For information, the physical server which was hosting my 2 bf animals (seawasp and moonjelly) has given up rebooting after a power cut a few weeks/months ago, and I have not setup a replacement (yet). -- Fabien.
Re: LLVM 16 (opaque pointers)
Hi, On 2023-05-21 15:01:41 +1200, Thomas Munro wrote: > *For anyone working with this type of IR generation code and > questioning their sanity, I can pass on some excellent advice I got > from Andres: build LLVM yourself with assertions enabled, as they > catch some classes of silly mistake that otherwise just segfault > inscrutably on execution. Hm. I think we need a buildfarm animal with an assertion enabled llvm 16 once we merge this. I think after an upgrade my buildfarm machine has the necessary resources. > @@ -150,7 +150,7 @@ llvm_compile_expr(ExprState *state) > > /* create function */ > eval_fn = LLVMAddFunction(mod, funcname, > - > llvm_pg_var_func_type("TypeExprStateEvalFunc")); > + > llvm_pg_var_func_type("ExecInterpExprStillValid")); Hm, that's a bit ugly. But ... > @@ -77,9 +80,44 @@ extern Datum AttributeTemplate(PG_FUNCTION_ARGS); > Datum > AttributeTemplate(PG_FUNCTION_ARGS) > { > + PGFunction fp PG_USED_FOR_ASSERTS_ONLY; > + > + fp = > PG_RETURN_NULL(); > } Other parts of the file do this by putting the functions into referenced_functions[], i'd copy that here and below. > +void > +ExecEvalSubroutineTemplate(ExprState *state, > +struct ExprEvalStep *op, > +ExprContext *econtext) > +{ > + ExecEvalSubroutine fp PG_USED_FOR_ASSERTS_ONLY; > + > + fp = > +} > + > +extern bool ExecEvalBoolSubroutineTemplate(ExprState *state, > + >struct ExprEvalStep *op, > + >ExprContext *econtext); > +bool > +ExecEvalBoolSubroutineTemplate(ExprState *state, > +struct ExprEvalStep > *op, > +ExprContext > *econtext) > +{ > + ExecEvalBoolSubroutine fp PG_USED_FOR_ASSERTS_ONLY; > + > + fp = > + return false; > +} > + Thanks for working on this! Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
Hi, On 2023-08-10 16:56:54 +0200, Ronan Dunklau wrote: > I tried my hand at backporting it to previous versions, and not knowing > anything about it made me indeed question my sanity. It's quite easy for PG > 15, 14, 13. PG 12 is nothing insurmontable either, but PG 11 is a bit hairier > most notably due to to the change in fcinfo args representation. But I guess > that's also a topic for another day :-) Given that 11 is about to be EOL, I don't think it's worth spending the time to support a new LLVM version for it. Greetings, Andres Freund
Re: LLVM 16 (opaque pointers)
Le dimanche 21 mai 2023, 05:01:41 CEST Thomas Munro a écrit : > Hi, > > Here is a draft version of the long awaited patch to support LLVM 16. > It's mostly mechanical donkeywork, but it took some time: this donkey > found it quite hard to understand the mighty getelementptr > instruction[1] and the code generation well enough to figure out all > the right types, and small mistakes took some debugging effort*. I > now finally have a patch that passes all tests. > > Though it's not quite ready yet, I thought I should give this status > update to report that the main task is more or less complete, since > we're starting to get quite a few emails about it (mostly from Fedora > users) and there is an entry for it on the Open Items for 16 wiki > page. Comments/review/testing welcome. Hello Thomas, Thank you for this effort ! I've tested it against llvm 15 and 16, and found no problem with it. > 6. I need to go through the types again with a fine tooth comb, and > check the test coverage to look out for eg GEP array arithmetic with > the wrong type/size that isn't being exercised. I haven't gone through the test coverage myself, but I exercised the following things: - running make installcheck with jit_above_cost = 0 - letting sqlsmith hammer random queries at it for a few hours. This didn't show obvious issues. > *For anyone working with this type of IR generation code and > questioning their sanity, I can pass on some excellent advice I got > from Andres: build LLVM yourself with assertions enabled, as they > catch some classes of silly mistake that otherwise just segfault > > inscrutably on execution. I tried my hand at backporting it to previous versions, and not knowing anything about it made me indeed question my sanity. It's quite easy for PG 15, 14, 13. PG 12 is nothing insurmontable either, but PG 11 is a bit hairier most notably due to to the change in fcinfo args representation. But I guess that's also a topic for another day :-) Best regards, -- Ronan Dunklau
Re: LLVM 16 (opaque pointers)
> On Mon, May 22, 2023 at 03:38:44PM +1200, Thomas Munro wrote: > Further changes are already needed for their "main" branch (LLVM > 17-to-be), so this won't quite be enough to shut seawasp up. At a > glance, we will need to change from the "old pass manager" API that > has recently been vaporised[1] > (llvm-c/Transforms/PassManagerBuilder.h) to the new one[2][3] > (llvm-c/Transforms/PassBuilder.h), which I suspect/hope will be as > simple as changing llvmjit.c to call LLVMRunPasses() with a string > describing the passes we want in "opt -passes" format, instead of our > code that calls LLVMAddFunctionInlingPass() etc. But that'll be a > topic for another day, and another thread. > > [1] > https://github.com/llvm/llvm-project/commit/0aac9a2875bad4f065367e4a6553fad78605f895 > [2] https://llvm.org/docs/NewPassManager.html > [3] https://reviews.llvm.org/D102136 Thanks for tackling the topic! I've tested it with a couple of versions, LLVM 12 that comes with my Gentoo box, LLVM 15 build from sources and the modified version of patch adopted for LLVM 17 (build form sources as well). In all three cases everything seems to be working fine. Simple benchmarking with a query stolen from some other jit thread (pgbench running single client with multiple unions of selects a-la SELECT a, count(*), sum(b) FROM test WHERE c = 2 GROUP BY a) show some slight performance differences, but nothing dramatic so far. LLVM 17 version produces the lowest latency, with faster generation, inlining and optimization, but slower emission time. LLVM 12 version produces the largest latencies with everything except emission timings being slower. LLVM 15 is somewhere in between. I'll continue reviewing and, for the records, attach adjustments I was using for LLVM 17 (purely for testing, not taking into account other versions), in case if I've missed something. >From 33e39a376fdbcceb6d4e757f4a798b3922e7068c Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Sun, 4 Jun 2023 11:10:41 +0200 Subject: [PATCH 2/2] LLVM 17 --- src/backend/jit/llvm/llvmjit.c| 73 --- src/backend/jit/llvm/llvmjit_wrap.cpp | 4 ++ 2 files changed, 25 insertions(+), 52 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index b83bc04ae3a..b701e167abd 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -18,6 +18,9 @@ #include #include #include +#if LLVM_VERSION_MAJOR > 16 +#include +#endif #if LLVM_VERSION_MAJOR > 11 #include #include @@ -27,12 +30,14 @@ #endif #include #include +#if LLVM_VERSION_MAJOR < 17 #include #include #include #if LLVM_VERSION_MAJOR > 6 #include #endif +#endif #include "jit/llvmjit.h" #include "jit/llvmjit_emit.h" @@ -559,69 +564,33 @@ llvm_function_reference(LLVMJitContext *context, static void llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module) { - LLVMPassManagerBuilderRef llvm_pmb; - LLVMPassManagerRef llvm_mpm; - LLVMPassManagerRef llvm_fpm; - LLVMValueRef func; + LLVMPassBuilderOptionsRef options; + LLVMErrorRef err; int compile_optlevel; + char *passes; if (context->base.flags & PGJIT_OPT3) compile_optlevel = 3; else compile_optlevel = 0; - /* - * Have to create a new pass manager builder every pass through, as the - * inliner has some per-builder state. Otherwise one ends up only inlining - * a function the first time though. - */ - llvm_pmb = LLVMPassManagerBuilderCreate(); - LLVMPassManagerBuilderSetOptLevel(llvm_pmb, compile_optlevel); - llvm_fpm = LLVMCreateFunctionPassManagerForModule(module); + passes = psprintf("default,mem2reg,function(no-op-function),no-op-module", + compile_optlevel); - if (context->base.flags & PGJIT_OPT3) - { - /* TODO: Unscientifically determined threshold */ - LLVMPassManagerBuilderUseInlinerWithThreshold(llvm_pmb, 512); - } - else - { - /* we rely on mem2reg heavily, so emit even in the O0 case */ - LLVMAddPromoteMemoryToRegisterPass(llvm_fpm); - } + options = LLVMCreatePassBuilderOptions(); - LLVMPassManagerBuilderPopulateFunctionPassManager(llvm_pmb, llvm_fpm); +#ifdef LLVM_PASS_DEBUG + LLVMPassBuilderOptionsSetDebugLogging(options, 1); +#endif - /* - * Do function level optimization. This could be moved to the point where - * functions are emitted, to reduce memory usage a bit. - */ - LLVMInitializeFunctionPassManager(llvm_fpm); - for (func = LLVMGetFirstFunction(context->module); - func != NULL; - func = LLVMGetNextFunction(func)) - LLVMRunFunctionPassManager(llvm_fpm, func); - LLVMFinalizeFunctionPassManager(llvm_fpm); - LLVMDisposePassManager(llvm_fpm); + LLVMPassBuilderOptionsSetInlinerThreshold(options, 512); - /* - * Perform module level optimization. We do so even in the non-optimized - * case, so always-inline functions etc get inlined. It's cheap enough. - */ - llvm_mpm = LLVMCreatePassManager(); - LLVMPassManagerBuilderPopulateModulePassManager(llvm_pmb, -
Re: LLVM 16 (opaque pointers)
Oh, one important thing I forgot to mention: that patch is for LLVM 16 only, and I developed it with a local build of their "release/16.x" branch on a FreeBSD box, and also tested with a released package for 16 on a Debian box. Further changes are already needed for their "main" branch (LLVM 17-to-be), so this won't quite be enough to shut seawasp up. At a glance, we will need to change from the "old pass manager" API that has recently been vaporised[1] (llvm-c/Transforms/PassManagerBuilder.h) to the new one[2][3] (llvm-c/Transforms/PassBuilder.h), which I suspect/hope will be as simple as changing llvmjit.c to call LLVMRunPasses() with a string describing the passes we want in "opt -passes" format, instead of our code that calls LLVMAddFunctionInlingPass() etc. But that'll be a topic for another day, and another thread. [1] https://github.com/llvm/llvm-project/commit/0aac9a2875bad4f065367e4a6553fad78605f895 [2] https://llvm.org/docs/NewPassManager.html [3] https://reviews.llvm.org/D102136
LLVM 16 (opaque pointers)
Hi, Here is a draft version of the long awaited patch to support LLVM 16. It's mostly mechanical donkeywork, but it took some time: this donkey found it quite hard to understand the mighty getelementptr instruction[1] and the code generation well enough to figure out all the right types, and small mistakes took some debugging effort*. I now finally have a patch that passes all tests. Though it's not quite ready yet, I thought I should give this status update to report that the main task is more or less complete, since we're starting to get quite a few emails about it (mostly from Fedora users) and there is an entry for it on the Open Items for 16 wiki page. Comments/review/testing welcome. Here are some things I think I need to do next (probably after PGCon): 1. If you use non-matching clang and LLVM versions I think we might use "clang -no-opaque-pointers" at the wrong times (I've not looked into that interaction yet). 2. The treatment of function types is a bit inconsistent/messy and could be tidied up. 3. There are quite a lot of extra function calls that could perhaps be elided (ie type variables instead of LLVMTypeInt8(), and calls to LLVMStructGetTypeAtIndex() that are not used in LLVM < 16). 4. Could use some comments. 5. I need to test with very old versions of LLVM and Clang that we claim to support (I have several years' worth of releases around but nothing older than 9). 6. I need to go through the types again with a fine tooth comb, and check the test coverage to look out for eg GEP array arithmetic with the wrong type/size that isn't being exercised. *For anyone working with this type of IR generation code and questioning their sanity, I can pass on some excellent advice I got from Andres: build LLVM yourself with assertions enabled, as they catch some classes of silly mistake that otherwise just segfault inscrutably on execution. [1] https://llvm.org/docs/GetElementPtr.html From 996e4bceee21a9f27116623c1fdac8eab0cdf814 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 9 May 2022 08:27:47 +1200 Subject: [PATCH] jit: Support opaque pointers in LLVM 16. Remove use of LLVMGetElementType() and provide the type of all pointers to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM versions[1]. * For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions. * For LLVM == 15, we'll continue to do the same, explicitly opting out of opaque pointer mode. * For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take the extra type argument. The difference is hidden behind some new IR emitting wrapper functions l_load(), l_gep(), l_call() etc. The change is mostly mechanical, except that at each site the correct type had to be provided. In some places we needed to do some extra work to get functions types, including some new wrappers for C++ APIs that are not yet exposed by in LLVM's C API, and some new "example" functions in llvmjit_types.c because it's no longer possible to start from the function pointer type and ask for the function type. Back-patch to all releases. [1] https://llvm.org/docs/OpaquePointers.html diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 04ae3052a8..b83bc04ae3 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -85,8 +85,11 @@ LLVMTypeRef StructExprState; LLVMTypeRef StructAggState; LLVMTypeRef StructAggStatePerGroupData; LLVMTypeRef StructAggStatePerTransData; +LLVMTypeRef StructPlanState; LLVMValueRef AttributeTemplate; +LLVMValueRef ExecEvalSubroutineTemplate; +LLVMValueRef ExecEvalBoolSubroutineTemplate; LLVMModuleRef llvm_types_module = NULL; @@ -382,11 +385,7 @@ llvm_pg_var_type(const char *varname) if (!v_srcvar) elog(ERROR, "variable %s not in llvmjit_types.c", varname); - /* look at the contained type */ - typ = LLVMTypeOf(v_srcvar); - Assert(typ != NULL && LLVMGetTypeKind(typ) == LLVMPointerTypeKind); - typ = LLVMGetElementType(typ); - Assert(typ != NULL); + typ = LLVMGlobalGetValueType(v_srcvar); return typ; } @@ -398,12 +397,14 @@ llvm_pg_var_type(const char *varname) LLVMTypeRef llvm_pg_var_func_type(const char *varname) { - LLVMTypeRef typ = llvm_pg_var_type(varname); + LLVMValueRef v_srcvar; + LLVMTypeRef typ; + + v_srcvar = LLVMGetNamedFunction(llvm_types_module, varname); + if (!v_srcvar) + elog(ERROR, "function %s not in llvmjit_types.c", varname); - /* look at the contained type */ - Assert(LLVMGetTypeKind(typ) == LLVMPointerTypeKind); - typ = LLVMGetElementType(typ); - Assert(typ != NULL && LLVMGetTypeKind(typ) == LLVMFunctionTypeKind); + typ = LLVMGetFunctionType(v_srcvar); return typ; } @@ -433,7 +434,7 @@ llvm_pg_func(LLVMModuleRef mod, const char *funcname) v_fn = LLVMAddFunction(mod, funcname, - LLVMGetElementType(LLVMTypeOf(v_srcfn))); + LLVMGetFunctionType(v_srcfn)); llvm_copy_attributes(v_srcfn, v_fn); return v_fn; @@ -529,7 +530,7 @@