Re: broken JIT support on Fedora 40
> On Tue, May 14, 2024 at 11:09:39AM -0400, Robert Haas wrote: > On Tue, Apr 9, 2024 at 8:44 PM Thomas Munro wrote: > > I pushed the illegal attribute fix though. Thanks for the detective work! > > This was commit 53c8d6c9f157f2bc8211b8de02417e55fefddbc7 and as I > understand it that fixed the issue originally reported on this thread. > > Therefore, I have marked https://commitfest.postgresql.org/48/4917/ as > Committed. > > If that's not correct, please feel free to fix. If there are other > issues that need to be patched separately, please consider opening a > new CF entry for those issues once a patch is available. Thanks, that's correct. I think the only thing left is to add a verifier pass, which everybody seems to be agreed is nice to have. The plan is to add it only to master without back-patching. I assume Thomas did not have time for that yet, so I've added the latest suggestions into his patch, and going to open a CF item to not forget about it. >From 67d955d9b69ea5204f02ace1a27ee4938b0dfd3a Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 9 Apr 2024 18:57:48 +1200 Subject: [PATCH v1] Run LLVM verify pass on all generated IR. We fixed a recent problem that crashed LLVM 18, but Dmitry pointed out that we'd have known about that all along if we'd automatically run the verify pass on the IR we generate. Turn that on in assertion builds. That reveals one other complaint about a name, which is also fixed here. Suggested-by: Dmitry Dolgov <9erthali...@gmail.com> Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c | 5 + src/backend/jit/llvm/llvmjit_expr.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 1d439f24554..0bafe309bb6 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -714,6 +714,11 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module) LLVMPassBuilderOptionsSetDebugLogging(options, 1); #endif + /* In assertion builds, run the LLVM verify pass. */ +#ifdef USE_ASSERT_CHECKING + LLVMPassBuilderOptionsSetVerifyEach(options, true); +#endif + LLVMPassBuilderOptionsSetInlinerThreshold(options, 512); err = LLVMRunPasses(module, passes, NULL, options); diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c index 9e0efd26687..cd956fa7e38 100644 --- a/src/backend/jit/llvm/llvmjit_expr.c +++ b/src/backend/jit/llvm/llvmjit_expr.c @@ -2765,7 +2765,7 @@ create_LifetimeEnd(LLVMModuleRef mod) LLVMContextRef lc; /* variadic pointer argument */ - const char *nm = "llvm.lifetime.end.p0i8"; + const char *nm = "llvm.lifetime.end.p0"; fn = LLVMGetNamedFunction(mod, nm); if (fn) base-commit: a3e6c6f929912f928fa405909d17bcbf0c1b03ee -- 2.41.0
Re: broken JIT support on Fedora 40
On Tue, Apr 9, 2024 at 8:44 PM Thomas Munro wrote: > I pushed the illegal attribute fix though. Thanks for the detective work! This was commit 53c8d6c9f157f2bc8211b8de02417e55fefddbc7 and as I understand it that fixed the issue originally reported on this thread. Therefore, I have marked https://commitfest.postgresql.org/48/4917/ as Committed. If that's not correct, please feel free to fix. If there are other issues that need to be patched separately, please consider opening a new CF entry for those issues once a patch is available. -- Robert Haas EDB: http://www.enterprisedb.com
Re: broken JIT support on Fedora 40
Hi, On 2024-04-10 22:15:27 +0200, Dmitry Dolgov wrote: > > On Wed, Apr 10, 2024 at 12:43:23PM +1200, Thomas Munro wrote: > > On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > + /* In assertion builds, run the LLVM verify pass. */ > > > +#ifdef USE_ASSERT_CHECKING > > > + LLVMPassBuilderOptionsSetVerifyEach(options, true); > > > +#endif > > > > Thanks, that seems nicer. I think the question is whether it will > > slow down build farm/CI/local meson test runs to a degree that exceeds > > its value. Another option would be to have some other opt-in macro, > > like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain > > JIT-related stuff to turn on. > > Oh, I see. I'm afraid I don't have enough knowledge of the CI pipeline, > but at least locally for me installcheck became only few percent slower > with the verify pass. I think it's worthwhile to add. It makes some problems so much easier to find. And if you're building with debug enabled llvm, performance is so atrocious anyway, that this isn't going to change the big picture... > > Supposing we go with USE_ASSERT_CHECKING, I have another question: > > > > - const char *nm = "llvm.lifetime.end.p0i8"; > > + const char *nm = "llvm.lifetime.end.p0"; > > > > Was that a mistake, or did the mangling rules change in some version? > > I'm not sure, but inclined to think it's the same as with noundef -- a > mistake, which was revealed in some recent version of LLVM. From what I > understand the suffix i8 indicates an overloaded argument of that type, > which is probably not needed. At the same time I can't get this error > from the verify pass with LLVM-12 or LLVM-15 (I have those at hand by > accident). To make it even more confusing I've found a few similar > examples in other projects, where this was really triggered by an issue > in LLVM [1]. I'm afraid that it actually has changed over time, I'm fairly sure that it was required at some point. Greetings, Andres Freund
Re: broken JIT support on Fedora 40
> On Wed, Apr 10, 2024 at 12:43:23PM +1200, Thomas Munro wrote: > On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > + /* In assertion builds, run the LLVM verify pass. */ > > +#ifdef USE_ASSERT_CHECKING > > + LLVMPassBuilderOptionsSetVerifyEach(options, true); > > +#endif > > Thanks, that seems nicer. I think the question is whether it will > slow down build farm/CI/local meson test runs to a degree that exceeds > its value. Another option would be to have some other opt-in macro, > like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain > JIT-related stuff to turn on. Oh, I see. I'm afraid I don't have enough knowledge of the CI pipeline, but at least locally for me installcheck became only few percent slower with the verify pass. > Supposing we go with USE_ASSERT_CHECKING, I have another question: > > - const char *nm = "llvm.lifetime.end.p0i8"; > + const char *nm = "llvm.lifetime.end.p0"; > > Was that a mistake, or did the mangling rules change in some version? I'm not sure, but inclined to think it's the same as with noundef -- a mistake, which was revealed in some recent version of LLVM. From what I understand the suffix i8 indicates an overloaded argument of that type, which is probably not needed. At the same time I can't get this error from the verify pass with LLVM-12 or LLVM-15 (I have those at hand by accident). To make it even more confusing I've found a few similar examples in other projects, where this was really triggered by an issue in LLVM [1]. [1]: https://github.com/rust-lang/rust/issues/102738
Re: broken JIT support on Fedora 40
st 10. 4. 2024 v 2:44 odesílatel Thomas Munro napsal: > On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> > wrote: > > + /* In assertion builds, run the LLVM verify pass. */ > > +#ifdef USE_ASSERT_CHECKING > > + LLVMPassBuilderOptionsSetVerifyEach(options, true); > > +#endif > > Thanks, that seems nicer. I think the question is whether it will > slow down build farm/CI/local meson test runs to a degree that exceeds > its value. Another option would be to have some other opt-in macro, > like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain > JIT-related stuff to turn on. > > Supposing we go with USE_ASSERT_CHECKING, I have another question: > > - const char *nm = "llvm.lifetime.end.p0i8"; > + const char *nm = "llvm.lifetime.end.p0"; > > Was that a mistake, or did the mangling rules change in some version? > I don't currently feel inclined to go and test this on the ancient > versions we claim to support in back-branches. Perhaps we should just > do this in master, and then it'd be limited to worrying about LLVM > versions 10-18 (see 820b5af7), which have the distinct advantage of > being available in package repositories for testing. Or I suppose we > could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10. Or we > could do it unconditionally, and wait for ancient-LLVM build farm > animals to break if they're going to. > > I pushed the illegal attribute fix though. Thanks for the detective work! > > (It crossed my mind that perhaps deform functions should have their > own template function, but if someone figures out that that's a good > idea, I think we'll *still* need that change just pushed.) > all tests passed on fc 40 without problems Thank you Pavel
Re: broken JIT support on Fedora 40
On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > + /* In assertion builds, run the LLVM verify pass. */ > +#ifdef USE_ASSERT_CHECKING > + LLVMPassBuilderOptionsSetVerifyEach(options, true); > +#endif Thanks, that seems nicer. I think the question is whether it will slow down build farm/CI/local meson test runs to a degree that exceeds its value. Another option would be to have some other opt-in macro, like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain JIT-related stuff to turn on. Supposing we go with USE_ASSERT_CHECKING, I have another question: - const char *nm = "llvm.lifetime.end.p0i8"; + const char *nm = "llvm.lifetime.end.p0"; Was that a mistake, or did the mangling rules change in some version? I don't currently feel inclined to go and test this on the ancient versions we claim to support in back-branches. Perhaps we should just do this in master, and then it'd be limited to worrying about LLVM versions 10-18 (see 820b5af7), which have the distinct advantage of being available in package repositories for testing. Or I suppose we could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10. Or we could do it unconditionally, and wait for ancient-LLVM build farm animals to break if they're going to. I pushed the illegal attribute fix though. Thanks for the detective work! (It crossed my mind that perhaps deform functions should have their own template function, but if someone figures out that that's a good idea, I think we'll *still* need that change just pushed.)
Re: broken JIT support on Fedora 40
> On Tue, Apr 09, 2024 at 07:07:58PM +1200, Thomas Munro wrote: > On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > > Yep, I think this is it. I've spent some hours trying to understand why > > > > suddenly deform function has noundef ret attribute, when it shouldn't -- > > > > this explains it and the proposed change fixes the crash. One thing that > > > > is still not clear to me though is why this copied attribute doesn't > > > > show up in the bitcode dumped right before running inline pass (I've > > > > added this to troubleshoot the issue). > > > > > > One thing to consider in this context is indeed adding "verify" pass as > > > suggested in the PR, at least for the debugging configuration. Without > > > the fix > > > it immediately returns: > > > > > > Running analysis: VerifierAnalysis on deform_0_1 > > > Attribute 'noundef' applied to incompatible type! > > > > > > llvm error: Broken function found, compilation aborted! > > > > Here is what I have in mind. Interestingly enough, it also shows few > > more errors besides "noundef": > > > > Intrinsic name not mangled correctly for type arguments! Should be: > > llvm.lifetime.end.p0 > > ptr @llvm.lifetime.end.p0i8 > > > > It refers to the function from create_LifetimeEnd, not sure how > > important is this. > > Would it be too slow to run the verify pass always, in assertion > builds? Here's a patch for the original issue, and a patch to try > that idea + a fix for that other complaint it spits out. The latter > would only run for LLVM 17+, but that seems OK. Sounds like a good idea. About the overhead, I've done a quick test on the reproducer at hands, doing explain analyze in a tight loop and fetching "optimization" timinigs. It gives quite visible difference 96ms p99 with verify vs 46ms p99 without verify (and a rather low stddev, ~1.5ms). But I can imagine it's acceptable for a build with assertions. Btw, I've found there is a C-api for this exposed, which produces the same warnings for me. Maybe it would be even better this way: /** * Toggle adding the VerifierPass for the PassBuilder, ensuring all functions * inside the module is valid. */ void LLVMPassBuilderOptionsSetVerifyEach(LLVMPassBuilderOptionsRef Options, LLVMBool VerifyEach); + /* In assertion builds, run the LLVM verify pass. */ +#ifdef USE_ASSERT_CHECKING + LLVMPassBuilderOptionsSetVerifyEach(options, true); +#endif
Re: broken JIT support on Fedora 40
On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > Yep, I think this is it. I've spent some hours trying to understand why > > > suddenly deform function has noundef ret attribute, when it shouldn't -- > > > this explains it and the proposed change fixes the crash. One thing that > > > is still not clear to me though is why this copied attribute doesn't > > > show up in the bitcode dumped right before running inline pass (I've > > > added this to troubleshoot the issue). > > > > One thing to consider in this context is indeed adding "verify" pass as > > suggested in the PR, at least for the debugging configuration. Without the > > fix > > it immediately returns: > > > > Running analysis: VerifierAnalysis on deform_0_1 > > Attribute 'noundef' applied to incompatible type! > > > > llvm error: Broken function found, compilation aborted! > > Here is what I have in mind. Interestingly enough, it also shows few > more errors besides "noundef": > > Intrinsic name not mangled correctly for type arguments! Should be: > llvm.lifetime.end.p0 > ptr @llvm.lifetime.end.p0i8 > > It refers to the function from create_LifetimeEnd, not sure how > important is this. Would it be too slow to run the verify pass always, in assertion builds? Here's a patch for the original issue, and a patch to try that idea + a fix for that other complaint it spits out. The latter would only run for LLVM 17+, but that seems OK. From 57af42d9a1b47b7361c7200a17a210f2ca37bd5d Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 9 Apr 2024 18:10:30 +1200 Subject: [PATCH 1/2] Fix illegal attribute LLVM attribute propagation. Commit 72559438 started copying more attributes from AttributeTemplate to the functions we generate on the fly. In the case of deform functions, which return void, this meant that "noundef", from AttributeTemplate's return value (a Datum) was copied to a void type. Older LLVM releases were OK with that, but LLVM 18 crashes. "noundef" is rejected for void by the verify pass, which would have alerted us to this problem (something we'll consider checking automatically in a later commit). Update our llvm_copy_attributes() function to skip copying the attribute for the return value, if the target function returns void. Thanks to Dmitry Dolgov for help chasing this down. Back-patch to all supported releases, like 72559438. Reported-by: Pavel Stehule Reviewed-by: Dmitry Dolgov <9erthali...@gmail.com> Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index ec0fdd49324..92b4993a98a 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from, LLVMValueRef v_to) /* copy function attributes */ llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex); - /* and the return value attributes */ - llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex); + if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) != LLVMVoidTypeKind) + { + /* and the return value attributes */ + llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex); + } /* and each function parameter's attribute */ param_count = LLVMCountParams(v_from); -- 2.44.0 From 91d8b747686aa07341337e1cd0b7c2244e955adb Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 9 Apr 2024 18:57:48 +1200 Subject: [PATCH 2/2] Run LLVM verify pass on all generated IR. We fixed a recent problem that crashed LLVM 18, but Dmitry pointed out that we'd have known about that all along if we'd automatically run the verify pass on the IR we generate. Turn that on in assertion builds. That reveals one other complaint about a name, which is also fixed here. Suggested-by: Dmitry Dolgov <9erthali...@gmail.com> Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c | 11 +-- src/backend/jit/llvm/llvmjit_expr.c | 2 +- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 92b4993a98a..bc429e970f2 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -703,10 +703,17 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module) LLVMErrorRef err; const char *passes; + /* In assertion builds, run the LLVM verify pass. */ +#ifdef USE_ASSERT_CHECKING +#define PASSES_PREFIX "verify," +#else +#define PASSES_PREFIX "" +#endif + if (context->base.flags & PGJIT_OPT3) - passes = "default"; + passes = PASSES_PREFIX "default"; else - passes = "default,mem2reg"; + passes = PASSES_PREFIX "default,mem2reg"; options = LLVMCreatePassBuilderOptions(); diff --git
Re: broken JIT support on Fedora 40
> On Fri, Apr 05, 2024 at 03:50:50PM +0200, Dmitry Dolgov wrote: > > On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote: > > > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote: > > > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro > > > wrote: > > > > https://github.com/llvm/llvm-project/pull/87093 > > > > > > Oh, with those clues, I think I might see... It is a bit strange that > > > we copy attributes from AttributeTemplate(), a function that returns > > > Datum, to our void deform function. It works (I mean doesn't crash) > > > if you just comment this line out: > > > > > > llvm_copy_attributes(AttributeTemplate, v_deform_fn); > > > > > > ... but I guess that disables inlining of the deform function? So > > > perhaps we just need to teach that thing not to try to copy the return > > > value's attributes, which also seems to work here: > > > > Yep, I think this is it. I've spent some hours trying to understand why > > suddenly deform function has noundef ret attribute, when it shouldn't -- > > this explains it and the proposed change fixes the crash. One thing that > > is still not clear to me though is why this copied attribute doesn't > > show up in the bitcode dumped right before running inline pass (I've > > added this to troubleshoot the issue). > > One thing to consider in this context is indeed adding "verify" pass as > suggested in the PR, at least for the debugging configuration. Without the fix > it immediately returns: > > Running analysis: VerifierAnalysis on deform_0_1 > Attribute 'noundef' applied to incompatible type! > > llvm error: Broken function found, compilation aborted! Here is what I have in mind. Interestingly enough, it also shows few more errors besides "noundef": Intrinsic name not mangled correctly for type arguments! Should be: llvm.lifetime.end.p0 ptr @llvm.lifetime.end.p0i8 It refers to the function from create_LifetimeEnd, not sure how important is this. >From 7a05bd48a4270998a08e63e07cdf1cb9932bfddd Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Fri, 5 Apr 2024 17:52:06 +0200 Subject: [PATCH v1] Add jit_verify_bitcode When passing LLVM IR through optimizer, for troubleshooting purposes it's useful to apply the "verify" pass as well. It can catch an invalid IR, if such was produced by mistake, and output a meaningful message about the error, e.g.: Attribute 'noundef' applied to incompatible type! ptr @deform_0_1 Control this via jit_verify_bitcode development guc. --- doc/src/sgml/config.sgml| 19 +++ src/backend/jit/jit.c | 1 + src/backend/jit/llvm/llvmjit.c | 11 --- src/backend/utils/misc/guc_tables.c | 11 +++ src/include/jit/jit.h | 2 +- 5 files changed, 40 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 624518e0b01..5f07ae099f4 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -11931,6 +11931,25 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) + + jit_verify_bitcode (boolean) + + jit_verify_bitcode configuration parameter + + + + +Adds a verify pass to the pipeline, when optimizing +generated LLVM IR. It helps to identify +cases when an invalid IR was generated due to errors, and is only +useful for working on the internals of the JIT implementation. +The default setting is off. +Only superusers and users with the appropriate SET +privilege can change this setting. + + + + jit_expressions (boolean) diff --git a/src/backend/jit/jit.c b/src/backend/jit/jit.c index 815b58f33c5..ad94e4b55ee 100644 --- a/src/backend/jit/jit.c +++ b/src/backend/jit/jit.c @@ -39,6 +39,7 @@ bool jit_tuple_deforming = true; double jit_above_cost = 10; double jit_inline_above_cost = 50; double jit_optimize_above_cost = 50; +bool jit_verify_bitcode = false; static JitProviderCallbacks provider; static bool provider_successfully_loaded = false; diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index ec0fdd49324..a4b21abb83a 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -698,12 +698,17 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module) #else LLVMPassBuilderOptionsRef options; LLVMErrorRef err; - const char *passes; + const char *passes, *intermediate_passes; if (context->base.flags & PGJIT_OPT3) - passes = "default"; + intermediate_passes = "default"; else - passes = "default,mem2reg"; + intermediate_passes = "default,mem2reg"; + + if (jit_verify_bitcode) + passes = psprintf("verify,%s", intermediate_passes); + else + passes = intermediate_passes; options = LLVMCreatePassBuilderOptions(); diff --git
Re: broken JIT support on Fedora 40
> On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote: > > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote: > > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro > > wrote: > > > https://github.com/llvm/llvm-project/pull/87093 > > > > Oh, with those clues, I think I might see... It is a bit strange that > > we copy attributes from AttributeTemplate(), a function that returns > > Datum, to our void deform function. It works (I mean doesn't crash) > > if you just comment this line out: > > > > llvm_copy_attributes(AttributeTemplate, v_deform_fn); > > > > ... but I guess that disables inlining of the deform function? So > > perhaps we just need to teach that thing not to try to copy the return > > value's attributes, which also seems to work here: > > Yep, I think this is it. I've spent some hours trying to understand why > suddenly deform function has noundef ret attribute, when it shouldn't -- > this explains it and the proposed change fixes the crash. One thing that > is still not clear to me though is why this copied attribute doesn't > show up in the bitcode dumped right before running inline pass (I've > added this to troubleshoot the issue). One thing to consider in this context is indeed adding "verify" pass as suggested in the PR, at least for the debugging configuration. Without the fix it immediately returns: Running analysis: VerifierAnalysis on deform_0_1 Attribute 'noundef' applied to incompatible type! llvm error: Broken function found, compilation aborted!
Re: broken JIT support on Fedora 40
> On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote: > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro wrote: > > https://github.com/llvm/llvm-project/pull/87093 > > Oh, with those clues, I think I might see... It is a bit strange that > we copy attributes from AttributeTemplate(), a function that returns > Datum, to our void deform function. It works (I mean doesn't crash) > if you just comment this line out: > > llvm_copy_attributes(AttributeTemplate, v_deform_fn); > > ... but I guess that disables inlining of the deform function? So > perhaps we just need to teach that thing not to try to copy the return > value's attributes, which also seems to work here: Yep, I think this is it. I've spent some hours trying to understand why suddenly deform function has noundef ret attribute, when it shouldn't -- this explains it and the proposed change fixes the crash. One thing that is still not clear to me though is why this copied attribute doesn't show up in the bitcode dumped right before running inline pass (I've added this to troubleshoot the issue).
Re: broken JIT support on Fedora 40
On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro wrote: > https://github.com/llvm/llvm-project/pull/87093 Oh, with those clues, I think I might see... It is a bit strange that we copy attributes from AttributeTemplate(), a function that returns Datum, to our void deform function. It works (I mean doesn't crash) if you just comment this line out: llvm_copy_attributes(AttributeTemplate, v_deform_fn); ... but I guess that disables inlining of the deform function? So perhaps we just need to teach that thing not to try to copy the return value's attributes, which also seems to work here: diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index ec0fdd49324..92b4993a98a 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from, LLVMValueRef v_to) /* copy function attributes */ llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex); - /* and the return value attributes */ - llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex); + if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) != LLVMVoidTypeKind) + { + /* and the return value attributes */ + llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex); + }
Re: broken JIT support on Fedora 40
On Sun, Mar 31, 2024 at 5:59 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > Yeah, sorry, I'm a bit baffled about this situation myself. Yesterday > I've opened a one-line PR fix that should address the issue, maybe this > would help. In the meantime I've attached what did work for me as a > workaround -- it essentially just makes the deform function to return > some value. It's ugly, but since call site will ignore that, and it's > only one occasion where LLVMBuildRetVoid is used, maybe it's acceptable. > Give me a moment, I'm going to test this change more (waiting on > rebuilding LLVM, it takes quire a while on my machine :( ), then can > confirm that it works as expected on the latest version. Great news. I see your PR: https://github.com/llvm/llvm-project/pull/87093 Checking their release schedule, they have: Mar 5th: 18.1.0 was released Mar 8th: 18.1.1 was released Mar 19th: 18.1.2 was released Apr 16th: 18.1.3 Apr 30th: 18.1.4 May 14th: 18.1.5 May 28th: 18.1.6 (if necessary) Our next release is May 9th. So assuming your PR goes in in the next couple of weeks and makes it into their 18.1.3 or even .4, there is no point in pushing a work-around on our side.
Re: broken JIT support on Fedora 40
> On Sat, Mar 30, 2024 at 04:38:11PM +1300, Thomas Munro wrote: > On Fri, Mar 22, 2024 at 7:15 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > For verification, I've modified the deform.outblock to call LLVMBuildRet > > > instead of LLVMBuildRetVoid and this seems to help -- inline and deform > > > stages are still performed as before, but nothing crashes. But of course > > > it doesn't sound right that inlining pass cannot process such code. > > Thanks for investigating and filing the issue. It doesn't seem to be > moving yet. Do you want to share the LLVMBuildRet() workaround? > Maybe we need to consider shipping something like that in the > meantime? Yeah, sorry, I'm a bit baffled about this situation myself. Yesterday I've opened a one-line PR fix that should address the issue, maybe this would help. In the meantime I've attached what did work for me as a workaround -- it essentially just makes the deform function to return some value. It's ugly, but since call site will ignore that, and it's only one occasion where LLVMBuildRetVoid is used, maybe it's acceptable. Give me a moment, I'm going to test this change more (waiting on rebuilding LLVM, it takes quire a while on my machine :( ), then can confirm that it works as expected on the latest version. >From 83e37220cc3e8cb04b0d8a608240fdde4b003713 Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Sat, 30 Mar 2024 17:45:38 +0100 Subject: [PATCH v1] Workaround for deform crashing in LLVM inliner Deform function returns via the outblock that contains a "ret void" instruction build with LLVMBuildRetVoid. This triggers a bug in the latest LLVM 18.1, when the inliner pass is applied to the function [1]. Avoid the issue via introducing a fake return variable, which will be ignored on the call site. [1]: https://github.com/llvm/llvm-project/issues/86162 --- src/backend/jit/llvm/llvmjit_deform.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c index b07f8e7f756..7ebc7b6d731 100644 --- a/src/backend/jit/llvm/llvmjit_deform.c +++ b/src/backend/jit/llvm/llvmjit_deform.c @@ -761,6 +761,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, { LLVMValueRef v_off = l_load(b, TypeSizeT, v_offp, ""); LLVMValueRef v_flags; + LLVMValueRef v_fake_ret = l_sizet_const(0); LLVMBuildStore(b, l_int16_const(lc, natts), v_nvalidp); v_off = LLVMBuildTrunc(b, v_off, LLVMInt32TypeInContext(lc), ""); @@ -768,7 +769,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, v_flags = l_load(b, LLVMInt16TypeInContext(lc), v_flagsp, "tts_flags"); v_flags = LLVMBuildOr(b, v_flags, l_int16_const(lc, TTS_FLAG_SLOW), ""); LLVMBuildStore(b, v_flags, v_flagsp); - LLVMBuildRetVoid(b); + LLVMBuildRet(b, v_fake_ret); } LLVMDisposeBuilder(b); base-commit: 7eb9a8201890f3b208fd4c109a5b08bf139b692a -- 2.41.0
Re: broken JIT support on Fedora 40
On Fri, Mar 22, 2024 at 7:15 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > For verification, I've modified the deform.outblock to call LLVMBuildRet > > instead of LLVMBuildRetVoid and this seems to help -- inline and deform > > stages are still performed as before, but nothing crashes. But of course > > it doesn't sound right that inlining pass cannot process such code. Thanks for investigating and filing the issue. It doesn't seem to be moving yet. Do you want to share the LLVMBuildRet() workaround? Maybe we need to consider shipping something like that in the meantime?
Re: broken JIT support on Fedora 40
> On Sun, Mar 17, 2024 at 09:02:08PM +0100, Dmitry Dolgov wrote: > > On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote: > > For me it seems that the LLVMRunPasses() call, new in > > > > commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff > > Author: Thomas Munro > > Date: Wed Oct 18 22:15:54 2023 +1300 > > > > jit: Changes for LLVM 17. > > > > is reaching code that segfaults inside libLLVM, specifically in > > llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool, > > llvm::AAResults*, bool, llvm::Function*). First obvious question > > would be: is that NULL argument still acceptable? Perhaps it wants > > our LLVMTargetMachineRef there: > > > >err = LLVMRunPasses(module, passes, NULL, options); > > > > But then when we see what is does with that argument, it arrives at a > > place that apparently accepts nullptr. > > > > https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56 > > https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124 > > > > Hrmph. Might need an assertion build to learn more. I'll try to look > > again next week or so. > > Looks like I can reproduce this as well, libLLVM crashes when reaching > AddReturnAttributes inside InlineFunction, when trying to access > operands of the return instruction. I think, for whatever reason, the > latest LLVM doesn't like (i.e. do not expect this when performing > inlining pass) return instructions that do not have a return value, and > this is what happens in the outblock of deform function we generate > (slot_compile_deform). > > For verification, I've modified the deform.outblock to call LLVMBuildRet > instead of LLVMBuildRetVoid and this seems to help -- inline and deform > stages are still performed as before, but nothing crashes. But of course > it doesn't sound right that inlining pass cannot process such code. > Unfortunately I don't see any obvious change in the recent LLVM history > that would justify this outcome, might be a genuine bug, will > investigate further. I think I found the change that got it all started [1], the commit has a set of tags like 18.1.0-rc1 and is relatively recent. The message doesn't say anything related to the crash that we see, so I assume it's indeed a bug. I've opened an issue to confirm this understanding [2] (wow, issues were indeed moved to github since the last time I was touching LLVM), let's see what would be the response. [1]: https://github.com/llvm/llvm-project/commit/2da4960f20f7e5d88a68ce25636a895284dc66d8 [2]: https://github.com/llvm/llvm-project/issues/86162
Re: broken JIT support on Fedora 40
> On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote: > For me it seems that the LLVMRunPasses() call, new in > > commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff > Author: Thomas Munro > Date: Wed Oct 18 22:15:54 2023 +1300 > > jit: Changes for LLVM 17. > > is reaching code that segfaults inside libLLVM, specifically in > llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool, > llvm::AAResults*, bool, llvm::Function*). First obvious question > would be: is that NULL argument still acceptable? Perhaps it wants > our LLVMTargetMachineRef there: > >err = LLVMRunPasses(module, passes, NULL, options); > > But then when we see what is does with that argument, it arrives at a > place that apparently accepts nullptr. > > https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56 > https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124 > > Hrmph. Might need an assertion build to learn more. I'll try to look > again next week or so. Looks like I can reproduce this as well, libLLVM crashes when reaching AddReturnAttributes inside InlineFunction, when trying to access operands of the return instruction. I think, for whatever reason, the latest LLVM doesn't like (i.e. do not expect this when performing inlining pass) return instructions that do not have a return value, and this is what happens in the outblock of deform function we generate (slot_compile_deform). For verification, I've modified the deform.outblock to call LLVMBuildRet instead of LLVMBuildRetVoid and this seems to help -- inline and deform stages are still performed as before, but nothing crashes. But of course it doesn't sound right that inlining pass cannot process such code. Unfortunately I don't see any obvious change in the recent LLVM history that would justify this outcome, might be a genuine bug, will investigate further.
Re: broken JIT support on Fedora 40
For me it seems that the LLVMRunPasses() call, new in commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff Author: Thomas Munro Date: Wed Oct 18 22:15:54 2023 +1300 jit: Changes for LLVM 17. is reaching code that segfaults inside libLLVM, specifically in llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool, llvm::AAResults*, bool, llvm::Function*). First obvious question would be: is that NULL argument still acceptable? Perhaps it wants our LLVMTargetMachineRef there: err = LLVMRunPasses(module, passes, NULL, options); But then when we see what is does with that argument, it arrives at a place that apparently accepts nullptr. https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56 https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124 Hrmph. Might need an assertion build to learn more. I'll try to look again next week or so.
Re: broken JIT support on Fedora 40
On Fri, Mar 15, 2024 at 12:27 PM Daniel Gustafsson wrote: > > On 14 Mar 2024, at 20:15, Pavel Stehule wrote: > > > build is ok, but regress tests fails with crash (it works without > > -with-llvm) > > Can you post some details around this crash? It doesn't seem to be a > combination we have covered in the buildfarm. Yeah, 18.1 (note they switched to 1-based minor numbers, there was no 18.0) just came out a week or so ago. Despite testing their 18 branch just before their "RC1" tag, as recently as commit d282e88e50521a457fa1b36e55f43bac02a3167f Author: Thomas Munro Date: Thu Jan 25 10:37:35 2024 +1300 Track LLVM 18 changes. at which point everything worked, it seems that something changed before they released. I haven't looked into why yet but it's crashing on my FreeBSD box too.
Re: broken JIT support on Fedora 40
> On 14 Mar 2024, at 20:15, Pavel Stehule wrote: > build is ok, but regress tests fails with crash (it works without -with-llvm) Can you post some details around this crash? It doesn't seem to be a combination we have covered in the buildfarm. -- Daniel Gustafsson
Re: broken JIT support on Fedora 40
Hi čt 14. 3. 2024 v 19:20 odesílatel Robert Haas napsal: > On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule > wrote: > > after today update, the build with --with-llvm produces broken code, and > make check fails with crash > > > > Upgradehwdata-0.380-1.fc40.noarch > @updates-testing > > Upgraded hwdata-0.379-1.fc40.noarch > @@System > > Upgradellvm-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > > Upgraded llvm-17.0.6-6.fc40.x86_64 > @@System > > Upgradellvm-devel-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > > Upgraded llvm-devel-17.0.6-6.fc40.x86_64 > @@System > > Upgradellvm-googletest-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > > Upgraded llvm-googletest-17.0.6-6.fc40.x86_64 > @@System > > Upgradellvm-libs-18.1.0~rc4-1.fc40.i686 > @updates-testing > > Upgraded llvm-libs-17.0.6-6.fc40.i686 > @@System > > Upgradellvm-libs-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > > Upgraded llvm-libs-17.0.6-6.fc40.x86_64 > @@System > > Upgradellvm-static-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > > Upgraded llvm-static-17.0.6-6.fc40.x86_64 > @@System > > Upgradellvm-test-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > > Instalovat llvm17-libs-17.0.6-7.fc40.i686 > @updates-testing > > Instalovat llvm17-libs-17.0.6-7.fc40.x86_64 > @updates-testing > > I don't know anything about LLVM, but somehow I'm guessing that people > who do will need more detail than this in order to be able to fix the > problem. I see that support for LLVM 18 was added in commit > d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you > were building from an older commit? > I repeated build and check today, fresh master, today fedora update with same result build is ok, but regress tests fails with crash (it works without -with-llvm) Regards Pavel > > -- > Robert Haas > EDB: http://www.enterprisedb.com >
Re: broken JIT support on Fedora 40
On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule wrote: > after today update, the build with --with-llvm produces broken code, and make > check fails with crash > > Upgradehwdata-0.380-1.fc40.noarch > @updates-testing > Upgraded hwdata-0.379-1.fc40.noarch @@System > Upgradellvm-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > Upgraded llvm-17.0.6-6.fc40.x86_64@@System > Upgradellvm-devel-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > Upgraded llvm-devel-17.0.6-6.fc40.x86_64 @@System > Upgradellvm-googletest-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > Upgraded llvm-googletest-17.0.6-6.fc40.x86_64 @@System > Upgradellvm-libs-18.1.0~rc4-1.fc40.i686 > @updates-testing > Upgraded llvm-libs-17.0.6-6.fc40.i686 @@System > Upgradellvm-libs-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > Upgraded llvm-libs-17.0.6-6.fc40.x86_64 @@System > Upgradellvm-static-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > Upgraded llvm-static-17.0.6-6.fc40.x86_64 @@System > Upgradellvm-test-18.1.0~rc4-1.fc40.x86_64 > @updates-testing > Instalovat llvm17-libs-17.0.6-7.fc40.i686 > @updates-testing > Instalovat llvm17-libs-17.0.6-7.fc40.x86_64 > @updates-testing I don't know anything about LLVM, but somehow I'm guessing that people who do will need more detail than this in order to be able to fix the problem. I see that support for LLVM 18 was added in commit d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you were building from an older commit? -- Robert Haas EDB: http://www.enterprisedb.com
broken JIT support on Fedora 40
Hi after today update, the build with --with-llvm produces broken code, and make check fails with crash Upgradehwdata-0.380-1.fc40.noarch @updates-testing Upgraded hwdata-0.379-1.fc40.noarch @@System Upgradellvm-18.1.0~rc4-1.fc40.x86_64 @updates-testing Upgraded llvm-17.0.6-6.fc40.x86_64@@System Upgradellvm-devel-18.1.0~rc4-1.fc40.x86_64 @updates-testing Upgraded llvm-devel-17.0.6-6.fc40.x86_64 @@System Upgradellvm-googletest-18.1.0~rc4-1.fc40.x86_64 @updates-testing Upgraded llvm-googletest-17.0.6-6.fc40.x86_64 @@System Upgradellvm-libs-18.1.0~rc4-1.fc40.i686 @updates-testing Upgraded llvm-libs-17.0.6-6.fc40.i686 @@System Upgradellvm-libs-18.1.0~rc4-1.fc40.x86_64 @updates-testing Upgraded llvm-libs-17.0.6-6.fc40.x86_64 @@System Upgradellvm-static-18.1.0~rc4-1.fc40.x86_64 @updates-testing Upgraded llvm-static-17.0.6-6.fc40.x86_64 @@System Upgradellvm-test-18.1.0~rc4-1.fc40.x86_64 @updates-testing Instalovat llvm17-libs-17.0.6-7.fc40.i686 @updates-testing Instalovat llvm17-libs-17.0.6-7.fc40.x86_64 @updates-testing Regards Pavel