Re: broken JIT support on Fedora 40

2024-05-16 Thread Dmitry Dolgov
> On Tue, May 14, 2024 at 11:09:39AM -0400, Robert Haas wrote:
> On Tue, Apr 9, 2024 at 8:44 PM Thomas Munro  wrote:
> > I pushed the illegal attribute fix though.  Thanks for the detective work!
>
> This was commit 53c8d6c9f157f2bc8211b8de02417e55fefddbc7 and as I
> understand it that fixed the issue originally reported on this thread.
>
> Therefore, I have marked https://commitfest.postgresql.org/48/4917/ as
> Committed.
>
> If that's not correct, please feel free to fix. If there are other
> issues that need to be patched separately, please consider opening a
> new CF entry for those issues once a patch is available.

Thanks, that's correct. I think the only thing left is to add a verifier
pass, which everybody seems to be agreed is nice to have. The plan is to
add it only to master without back-patching. I assume Thomas did not
have time for that yet, so I've added the latest suggestions into his
patch, and going to open a CF item to not forget about it.
>From 67d955d9b69ea5204f02ace1a27ee4938b0dfd3a Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 9 Apr 2024 18:57:48 +1200
Subject: [PATCH v1] Run LLVM verify pass on all generated IR.

We fixed a recent problem that crashed LLVM 18, but Dmitry pointed out
that we'd have known about that all along if we'd automatically run the
verify pass on the IR we generate.

Turn that on in assertion builds.  That reveals one other complaint
about a name, which is also fixed here.

Suggested-by: Dmitry Dolgov <9erthali...@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com
---
 src/backend/jit/llvm/llvmjit.c  | 5 +
 src/backend/jit/llvm/llvmjit_expr.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 1d439f24554..0bafe309bb6 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -714,6 +714,11 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 	LLVMPassBuilderOptionsSetDebugLogging(options, 1);
 #endif
 
+	/* In assertion builds, run the LLVM verify pass. */
+#ifdef USE_ASSERT_CHECKING
+	LLVMPassBuilderOptionsSetVerifyEach(options, true);
+#endif
+
 	LLVMPassBuilderOptionsSetInlinerThreshold(options, 512);
 
 	err = LLVMRunPasses(module, passes, NULL, options);
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 9e0efd26687..cd956fa7e38 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -2765,7 +2765,7 @@ create_LifetimeEnd(LLVMModuleRef mod)
 	LLVMContextRef lc;
 
 	/* variadic pointer argument */
-	const char *nm = "llvm.lifetime.end.p0i8";
+	const char *nm = "llvm.lifetime.end.p0";
 
 	fn = LLVMGetNamedFunction(mod, nm);
 	if (fn)

base-commit: a3e6c6f929912f928fa405909d17bcbf0c1b03ee
-- 
2.41.0



Re: broken JIT support on Fedora 40

2024-05-14 Thread Robert Haas
On Tue, Apr 9, 2024 at 8:44 PM Thomas Munro  wrote:
> I pushed the illegal attribute fix though.  Thanks for the detective work!

This was commit 53c8d6c9f157f2bc8211b8de02417e55fefddbc7 and as I
understand it that fixed the issue originally reported on this thread.

Therefore, I have marked https://commitfest.postgresql.org/48/4917/ as
Committed.

If that's not correct, please feel free to fix. If there are other
issues that need to be patched separately, please consider opening a
new CF entry for those issues once a patch is available.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: broken JIT support on Fedora 40

2024-04-10 Thread Andres Freund
Hi,

On 2024-04-10 22:15:27 +0200, Dmitry Dolgov wrote:
> > On Wed, Apr 10, 2024 at 12:43:23PM +1200, Thomas Munro wrote:
> > On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > +   /* In assertion builds, run the LLVM verify pass. */
> > > +#ifdef USE_ASSERT_CHECKING
> > > +   LLVMPassBuilderOptionsSetVerifyEach(options, true);
> > > +#endif
> >
> > Thanks, that seems nicer.  I think the question is whether it will
> > slow down build farm/CI/local meson test runs to a degree that exceeds
> > its value.  Another option would be to have some other opt-in macro,
> > like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
> > JIT-related stuff to turn on.
> 
> Oh, I see. I'm afraid I don't have enough knowledge of the CI pipeline,
> but at least locally for me installcheck became only few percent slower
> with the verify pass.

I think it's worthwhile to add. It makes some problems so much easier to
find. And if you're building with debug enabled llvm, performance is so
atrocious anyway, that this isn't going to change the big picture...


> > Supposing we go with USE_ASSERT_CHECKING, I have another question:
> >
> > -   const char *nm = "llvm.lifetime.end.p0i8";
> > +   const char *nm = "llvm.lifetime.end.p0";
> >
> > Was that a mistake, or did the mangling rules change in some version?
> 
> I'm not sure, but inclined to think it's the same as with noundef -- a
> mistake, which was revealed in some recent version of LLVM. From what I
> understand the suffix i8 indicates an overloaded argument of that type,
> which is probably not needed. At the same time I can't get this error
> from the verify pass with LLVM-12 or LLVM-15 (I have those at hand by
> accident). To make it even more confusing I've found a few similar
> examples in other projects, where this was really triggered by an issue
> in LLVM [1].

I'm afraid that it actually has changed over time, I'm fairly sure that it was
required at some point.

Greetings,

Andres Freund




Re: broken JIT support on Fedora 40

2024-04-10 Thread Dmitry Dolgov
> On Wed, Apr 10, 2024 at 12:43:23PM +1200, Thomas Munro wrote:
> On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > +   /* In assertion builds, run the LLVM verify pass. */
> > +#ifdef USE_ASSERT_CHECKING
> > +   LLVMPassBuilderOptionsSetVerifyEach(options, true);
> > +#endif
>
> Thanks, that seems nicer.  I think the question is whether it will
> slow down build farm/CI/local meson test runs to a degree that exceeds
> its value.  Another option would be to have some other opt-in macro,
> like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
> JIT-related stuff to turn on.

Oh, I see. I'm afraid I don't have enough knowledge of the CI pipeline,
but at least locally for me installcheck became only few percent slower
with the verify pass.

> Supposing we go with USE_ASSERT_CHECKING, I have another question:
>
> -   const char *nm = "llvm.lifetime.end.p0i8";
> +   const char *nm = "llvm.lifetime.end.p0";
>
> Was that a mistake, or did the mangling rules change in some version?

I'm not sure, but inclined to think it's the same as with noundef -- a
mistake, which was revealed in some recent version of LLVM. From what I
understand the suffix i8 indicates an overloaded argument of that type,
which is probably not needed. At the same time I can't get this error
from the verify pass with LLVM-12 or LLVM-15 (I have those at hand by
accident). To make it even more confusing I've found a few similar
examples in other projects, where this was really triggered by an issue
in LLVM [1].

[1]: https://github.com/rust-lang/rust/issues/102738




Re: broken JIT support on Fedora 40

2024-04-10 Thread Pavel Stehule
st 10. 4. 2024 v 2:44 odesílatel Thomas Munro 
napsal:

> On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> > +   /* In assertion builds, run the LLVM verify pass. */
> > +#ifdef USE_ASSERT_CHECKING
> > +   LLVMPassBuilderOptionsSetVerifyEach(options, true);
> > +#endif
>
> Thanks, that seems nicer.  I think the question is whether it will
> slow down build farm/CI/local meson test runs to a degree that exceeds
> its value.  Another option would be to have some other opt-in macro,
> like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
> JIT-related stuff to turn on.
>
> Supposing we go with USE_ASSERT_CHECKING, I have another question:
>
> -   const char *nm = "llvm.lifetime.end.p0i8";
> +   const char *nm = "llvm.lifetime.end.p0";
>
> Was that a mistake, or did the mangling rules change in some version?
> I don't currently feel inclined to go and test this on the ancient
> versions we claim to support in back-branches.  Perhaps we should just
> do this in master, and then it'd be limited to worrying about LLVM
> versions 10-18 (see 820b5af7), which have the distinct advantage of
> being available in package repositories for testing.  Or I suppose we
> could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10.  Or we
> could do it unconditionally, and wait for ancient-LLVM build farm
> animals to break if they're going to.
>
> I pushed the illegal attribute fix though.  Thanks for the detective work!
>
> (It crossed my mind that perhaps deform functions should have their
> own template function, but if someone figures out that that's a good
> idea, I think we'll *still* need that change just pushed.)
>

all tests passed on fc 40 without problems

Thank you

Pavel


Re: broken JIT support on Fedora 40

2024-04-09 Thread Thomas Munro
On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> +   /* In assertion builds, run the LLVM verify pass. */
> +#ifdef USE_ASSERT_CHECKING
> +   LLVMPassBuilderOptionsSetVerifyEach(options, true);
> +#endif

Thanks, that seems nicer.  I think the question is whether it will
slow down build farm/CI/local meson test runs to a degree that exceeds
its value.  Another option would be to have some other opt-in macro,
like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
JIT-related stuff to turn on.

Supposing we go with USE_ASSERT_CHECKING, I have another question:

-   const char *nm = "llvm.lifetime.end.p0i8";
+   const char *nm = "llvm.lifetime.end.p0";

Was that a mistake, or did the mangling rules change in some version?
I don't currently feel inclined to go and test this on the ancient
versions we claim to support in back-branches.  Perhaps we should just
do this in master, and then it'd be limited to worrying about LLVM
versions 10-18 (see 820b5af7), which have the distinct advantage of
being available in package repositories for testing.  Or I suppose we
could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10.  Or we
could do it unconditionally, and wait for ancient-LLVM build farm
animals to break if they're going to.

I pushed the illegal attribute fix though.  Thanks for the detective work!

(It crossed my mind that perhaps deform functions should have their
own template function, but if someone figures out that that's a good
idea, I think we'll *still* need that change just pushed.)




Re: broken JIT support on Fedora 40

2024-04-09 Thread Dmitry Dolgov
> On Tue, Apr 09, 2024 at 07:07:58PM +1200, Thomas Munro wrote:
> On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > > Yep, I think this is it. I've spent some hours trying to understand why
> > > > suddenly deform function has noundef ret attribute, when it shouldn't --
> > > > this explains it and the proposed change fixes the crash. One thing that
> > > > is still not clear to me though is why this copied attribute doesn't
> > > > show up in the bitcode dumped right before running inline pass (I've
> > > > added this to troubleshoot the issue).
> > >
> > > One thing to consider in this context is indeed adding "verify" pass as
> > > suggested in the PR, at least for the debugging configuration. Without 
> > > the fix
> > > it immediately returns:
> > >
> > >   Running analysis: VerifierAnalysis on deform_0_1
> > >   Attribute 'noundef' applied to incompatible type!
> > >
> > >   llvm error: Broken function found, compilation aborted!
> >
> > Here is what I have in mind. Interestingly enough, it also shows few
> > more errors besides "noundef":
> >
> > Intrinsic name not mangled correctly for type arguments! Should be: 
> > llvm.lifetime.end.p0
> > ptr @llvm.lifetime.end.p0i8
> >
> > It refers to the function from create_LifetimeEnd, not sure how
> > important is this.
>
> Would it be too slow to run the verify pass always, in assertion
> builds?  Here's a patch for the original issue, and a patch to try
> that idea + a fix for that other complaint it spits out.  The latter
> would only run for LLVM 17+, but that seems OK.

Sounds like a good idea. About the overhead, I've done a quick test on the
reproducer at hands, doing explain analyze in a tight loop and fetching
"optimization" timinigs. It gives quite visible difference 96ms p99 with verify
vs 46ms p99 without verify (and a rather low stddev, ~1.5ms). But I can
imagine it's acceptable for a build with assertions.

Btw, I've found there is a C-api for this exposed, which produces the same
warnings for me. Maybe it would be even better this way:

/**
 * Toggle adding the VerifierPass for the PassBuilder, ensuring all 
functions
 * inside the module is valid.
 */
void LLVMPassBuilderOptionsSetVerifyEach(LLVMPassBuilderOptionsRef 
Options,

 LLVMBool VerifyEach);


+   /* In assertion builds, run the LLVM verify pass. */
+#ifdef USE_ASSERT_CHECKING
+   LLVMPassBuilderOptionsSetVerifyEach(options, true);
+#endif





Re: broken JIT support on Fedora 40

2024-04-09 Thread Thomas Munro
On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > Yep, I think this is it. I've spent some hours trying to understand why
> > > suddenly deform function has noundef ret attribute, when it shouldn't --
> > > this explains it and the proposed change fixes the crash. One thing that
> > > is still not clear to me though is why this copied attribute doesn't
> > > show up in the bitcode dumped right before running inline pass (I've
> > > added this to troubleshoot the issue).
> >
> > One thing to consider in this context is indeed adding "verify" pass as
> > suggested in the PR, at least for the debugging configuration. Without the 
> > fix
> > it immediately returns:
> >
> >   Running analysis: VerifierAnalysis on deform_0_1
> >   Attribute 'noundef' applied to incompatible type!
> >
> >   llvm error: Broken function found, compilation aborted!
>
> Here is what I have in mind. Interestingly enough, it also shows few
> more errors besides "noundef":
>
> Intrinsic name not mangled correctly for type arguments! Should be: 
> llvm.lifetime.end.p0
> ptr @llvm.lifetime.end.p0i8
>
> It refers to the function from create_LifetimeEnd, not sure how
> important is this.

Would it be too slow to run the verify pass always, in assertion
builds?  Here's a patch for the original issue, and a patch to try
that idea + a fix for that other complaint it spits out.  The latter
would only run for LLVM 17+, but that seems OK.
From 57af42d9a1b47b7361c7200a17a210f2ca37bd5d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 9 Apr 2024 18:10:30 +1200
Subject: [PATCH 1/2] Fix illegal attribute LLVM attribute propagation.

Commit 72559438 started copying more attributes from AttributeTemplate
to the functions we generate on the fly.  In the case of deform
functions, which return void, this meant that "noundef", from
AttributeTemplate's return value (a Datum) was copied to a void type.
Older LLVM releases were OK with that, but LLVM 18 crashes.  "noundef"
is rejected for void by the verify pass, which would have alerted us to
this problem (something we'll consider checking automatically in a later
commit).

Update our llvm_copy_attributes() function to skip copying the attribute
for the return value, if the target function returns void.

Thanks to Dmitry Dolgov for help chasing this down.

Back-patch to all supported releases, like 72559438.

Reported-by: Pavel Stehule 
Reviewed-by: Dmitry Dolgov <9erthali...@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com
---
 src/backend/jit/llvm/llvmjit.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index ec0fdd49324..92b4993a98a 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from, LLVMValueRef v_to)
 	/* copy function attributes */
 	llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex);
 
-	/* and the return value attributes */
-	llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+	if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) != LLVMVoidTypeKind)
+	{
+		/* and the return value attributes */
+		llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+	}
 
 	/* and each function parameter's attribute */
 	param_count = LLVMCountParams(v_from);
-- 
2.44.0

From 91d8b747686aa07341337e1cd0b7c2244e955adb Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 9 Apr 2024 18:57:48 +1200
Subject: [PATCH 2/2] Run LLVM verify pass on all generated IR.

We fixed a recent problem that crashed LLVM 18, but Dmitry pointed out
that we'd have known about that all along if we'd automatically run the
verify pass on the IR we generate.

Turn that on in assertion builds.  That reveals one other complaint
about a name, which is also fixed here.

Suggested-by: Dmitry Dolgov <9erthali...@gmail.com>
Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com
---
 src/backend/jit/llvm/llvmjit.c  | 11 +--
 src/backend/jit/llvm/llvmjit_expr.c |  2 +-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 92b4993a98a..bc429e970f2 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -703,10 +703,17 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 	LLVMErrorRef err;
 	const char *passes;
 
+	/* In assertion builds, run the LLVM verify pass. */
+#ifdef USE_ASSERT_CHECKING
+#define PASSES_PREFIX "verify,"
+#else
+#define PASSES_PREFIX ""
+#endif
+
 	if (context->base.flags & PGJIT_OPT3)
-		passes = "default";
+		passes = PASSES_PREFIX "default";
 	else
-		passes = "default,mem2reg";
+		passes = PASSES_PREFIX "default,mem2reg";
 
 	options = LLVMCreatePassBuilderOptions();
 
diff --git 

Re: broken JIT support on Fedora 40

2024-04-05 Thread Dmitry Dolgov
> On Fri, Apr 05, 2024 at 03:50:50PM +0200, Dmitry Dolgov wrote:
> > On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote:
> > > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
> > > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro  
> > > wrote:
> > > > https://github.com/llvm/llvm-project/pull/87093
> > >
> > > Oh, with those clues, I think I might see...  It is a bit strange that
> > > we copy attributes from AttributeTemplate(), a function that returns
> > > Datum, to our void deform function.  It works (I mean doesn't crash)
> > > if you just comment this line out:
> > >
> > > llvm_copy_attributes(AttributeTemplate, v_deform_fn);
> > >
> > > ... but I guess that disables inlining of the deform function?  So
> > > perhaps we just need to teach that thing not to try to copy the return
> > > value's attributes, which also seems to work here:
> >
> > Yep, I think this is it. I've spent some hours trying to understand why
> > suddenly deform function has noundef ret attribute, when it shouldn't --
> > this explains it and the proposed change fixes the crash. One thing that
> > is still not clear to me though is why this copied attribute doesn't
> > show up in the bitcode dumped right before running inline pass (I've
> > added this to troubleshoot the issue).
>
> One thing to consider in this context is indeed adding "verify" pass as
> suggested in the PR, at least for the debugging configuration. Without the fix
> it immediately returns:
>
>   Running analysis: VerifierAnalysis on deform_0_1
>   Attribute 'noundef' applied to incompatible type!
>
>   llvm error: Broken function found, compilation aborted!

Here is what I have in mind. Interestingly enough, it also shows few
more errors besides "noundef":

Intrinsic name not mangled correctly for type arguments! Should be: 
llvm.lifetime.end.p0
ptr @llvm.lifetime.end.p0i8

It refers to the function from create_LifetimeEnd, not sure how
important is this.
>From 7a05bd48a4270998a08e63e07cdf1cb9932bfddd Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Fri, 5 Apr 2024 17:52:06 +0200
Subject: [PATCH v1] Add jit_verify_bitcode

When passing LLVM IR through optimizer, for troubleshooting purposes
it's useful to apply the "verify" pass as well. It can catch an invalid
IR, if such was produced by mistake, and output a meaningful message
about the error, e.g.:

Attribute 'noundef' applied to incompatible type!
ptr @deform_0_1

Control this via jit_verify_bitcode development guc.
---
 doc/src/sgml/config.sgml| 19 +++
 src/backend/jit/jit.c   |  1 +
 src/backend/jit/llvm/llvmjit.c  | 11 ---
 src/backend/utils/misc/guc_tables.c | 11 +++
 src/include/jit/jit.h   |  2 +-
 5 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 624518e0b01..5f07ae099f4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11931,6 +11931,25 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
   
  
 
+ 
+  jit_verify_bitcode (boolean)
+  
+   jit_verify_bitcode configuration parameter
+  
+  
+  
+   
+Adds a verify pass to the pipeline, when optimizing
+generated LLVM IR. It helps to identify
+cases when an invalid IR was generated due to errors, and is only
+useful for working on the internals of the JIT implementation.
+The default setting is off.
+Only superusers and users with the appropriate SET
+privilege can change this setting.
+   
+  
+ 
+
  
   jit_expressions (boolean)
   
diff --git a/src/backend/jit/jit.c b/src/backend/jit/jit.c
index 815b58f33c5..ad94e4b55ee 100644
--- a/src/backend/jit/jit.c
+++ b/src/backend/jit/jit.c
@@ -39,6 +39,7 @@ bool		jit_tuple_deforming = true;
 double		jit_above_cost = 10;
 double		jit_inline_above_cost = 50;
 double		jit_optimize_above_cost = 50;
+bool		jit_verify_bitcode = false;
 
 static JitProviderCallbacks provider;
 static bool provider_successfully_loaded = false;
diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index ec0fdd49324..a4b21abb83a 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -698,12 +698,17 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 #else
 	LLVMPassBuilderOptionsRef options;
 	LLVMErrorRef err;
-	const char *passes;
+	const char *passes, *intermediate_passes;
 
 	if (context->base.flags & PGJIT_OPT3)
-		passes = "default";
+		intermediate_passes = "default";
 	else
-		passes = "default,mem2reg";
+		intermediate_passes = "default,mem2reg";
+
+	if (jit_verify_bitcode)
+		passes = psprintf("verify,%s", intermediate_passes);
+	else
+		passes = intermediate_passes;
 
 	options = LLVMCreatePassBuilderOptions();
 
diff --git 

Re: broken JIT support on Fedora 40

2024-04-05 Thread Dmitry Dolgov
> On Fri, Apr 05, 2024 at 03:21:06PM +0200, Dmitry Dolgov wrote:
> > On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
> > On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro  
> > wrote:
> > > https://github.com/llvm/llvm-project/pull/87093
> >
> > Oh, with those clues, I think I might see...  It is a bit strange that
> > we copy attributes from AttributeTemplate(), a function that returns
> > Datum, to our void deform function.  It works (I mean doesn't crash)
> > if you just comment this line out:
> >
> > llvm_copy_attributes(AttributeTemplate, v_deform_fn);
> >
> > ... but I guess that disables inlining of the deform function?  So
> > perhaps we just need to teach that thing not to try to copy the return
> > value's attributes, which also seems to work here:
>
> Yep, I think this is it. I've spent some hours trying to understand why
> suddenly deform function has noundef ret attribute, when it shouldn't --
> this explains it and the proposed change fixes the crash. One thing that
> is still not clear to me though is why this copied attribute doesn't
> show up in the bitcode dumped right before running inline pass (I've
> added this to troubleshoot the issue).

One thing to consider in this context is indeed adding "verify" pass as
suggested in the PR, at least for the debugging configuration. Without the fix
it immediately returns:

Running analysis: VerifierAnalysis on deform_0_1
Attribute 'noundef' applied to incompatible type!

llvm error: Broken function found, compilation aborted!




Re: broken JIT support on Fedora 40

2024-04-05 Thread Dmitry Dolgov
> On Sat, Apr 06, 2024 at 02:00:38AM +1300, Thomas Munro wrote:
> On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro  wrote:
> > https://github.com/llvm/llvm-project/pull/87093
>
> Oh, with those clues, I think I might see...  It is a bit strange that
> we copy attributes from AttributeTemplate(), a function that returns
> Datum, to our void deform function.  It works (I mean doesn't crash)
> if you just comment this line out:
>
> llvm_copy_attributes(AttributeTemplate, v_deform_fn);
>
> ... but I guess that disables inlining of the deform function?  So
> perhaps we just need to teach that thing not to try to copy the return
> value's attributes, which also seems to work here:

Yep, I think this is it. I've spent some hours trying to understand why
suddenly deform function has noundef ret attribute, when it shouldn't --
this explains it and the proposed change fixes the crash. One thing that
is still not clear to me though is why this copied attribute doesn't
show up in the bitcode dumped right before running inline pass (I've
added this to troubleshoot the issue).




Re: broken JIT support on Fedora 40

2024-04-05 Thread Thomas Munro
On Sun, Mar 31, 2024 at 12:49 PM Thomas Munro  wrote:
> https://github.com/llvm/llvm-project/pull/87093

Oh, with those clues, I think I might see...  It is a bit strange that
we copy attributes from AttributeTemplate(), a function that returns
Datum, to our void deform function.  It works (I mean doesn't crash)
if you just comment this line out:

llvm_copy_attributes(AttributeTemplate, v_deform_fn);

... but I guess that disables inlining of the deform function?  So
perhaps we just need to teach that thing not to try to copy the return
value's attributes, which also seems to work here:

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index ec0fdd49324..92b4993a98a 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from,
LLVMValueRef v_to)
/* copy function attributes */
llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex);

-   /* and the return value attributes */
-   llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex);
+   if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) !=
LLVMVoidTypeKind)
+   {
+   /* and the return value attributes */
+   llvm_copy_attributes_at_index(v_from, v_to,
LLVMAttributeReturnIndex);
+   }




Re: broken JIT support on Fedora 40

2024-03-30 Thread Thomas Munro
On Sun, Mar 31, 2024 at 5:59 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Yeah, sorry, I'm a bit baffled about this situation myself. Yesterday
> I've opened a one-line PR fix that should address the issue, maybe this
> would help. In the meantime I've attached what did work for me as a
> workaround -- it essentially just makes the deform function to return
> some value. It's ugly, but since call site will ignore that, and it's
> only one occasion where LLVMBuildRetVoid is used, maybe it's acceptable.
> Give me a moment, I'm going to test this change more (waiting on
> rebuilding LLVM, it takes quire a while on my machine :( ), then can
> confirm that it works as expected on the latest version.

Great news.  I see your PR:

https://github.com/llvm/llvm-project/pull/87093

Checking their release schedule, they have:

Mar  5th: 18.1.0 was released
Mar  8th: 18.1.1 was released
Mar 19th: 18.1.2 was released
Apr 16th: 18.1.3
Apr 30th: 18.1.4
May 14th: 18.1.5
May 28th: 18.1.6 (if necessary)

Our next release is May 9th.  So assuming your PR goes in in the next
couple of weeks and makes it into their 18.1.3 or even .4, there is no
point in pushing a work-around on our side.




Re: broken JIT support on Fedora 40

2024-03-30 Thread Dmitry Dolgov
> On Sat, Mar 30, 2024 at 04:38:11PM +1300, Thomas Munro wrote:
> On Fri, Mar 22, 2024 at 7:15 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > For verification, I've modified the deform.outblock to call LLVMBuildRet
> > > instead of LLVMBuildRetVoid and this seems to help -- inline and deform
> > > stages are still performed as before, but nothing crashes. But of course
> > > it doesn't sound right that inlining pass cannot process such code.
>
> Thanks for investigating and filing the issue.  It doesn't seem to be
> moving yet.  Do you want to share the LLVMBuildRet() workaround?
> Maybe we need to consider shipping something like that in the
> meantime?

Yeah, sorry, I'm a bit baffled about this situation myself. Yesterday
I've opened a one-line PR fix that should address the issue, maybe this
would help. In the meantime I've attached what did work for me as a
workaround -- it essentially just makes the deform function to return
some value. It's ugly, but since call site will ignore that, and it's
only one occasion where LLVMBuildRetVoid is used, maybe it's acceptable.
Give me a moment, I'm going to test this change more (waiting on
rebuilding LLVM, it takes quire a while on my machine :( ), then can
confirm that it works as expected on the latest version.
>From 83e37220cc3e8cb04b0d8a608240fdde4b003713 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Sat, 30 Mar 2024 17:45:38 +0100
Subject: [PATCH v1] Workaround for deform crashing in LLVM inliner

Deform function returns via the outblock that contains a "ret void"
instruction build with LLVMBuildRetVoid. This triggers a bug in the
latest LLVM 18.1, when the inliner pass is applied to the function [1].
Avoid the issue via introducing a fake return variable, which will be
ignored on the call site.

[1]: https://github.com/llvm/llvm-project/issues/86162
---
 src/backend/jit/llvm/llvmjit_deform.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index b07f8e7f756..7ebc7b6d731 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -761,6 +761,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 	{
 		LLVMValueRef v_off = l_load(b, TypeSizeT, v_offp, "");
 		LLVMValueRef v_flags;
+		LLVMValueRef v_fake_ret = l_sizet_const(0);
 
 		LLVMBuildStore(b, l_int16_const(lc, natts), v_nvalidp);
 		v_off = LLVMBuildTrunc(b, v_off, LLVMInt32TypeInContext(lc), "");
@@ -768,7 +769,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 		v_flags = l_load(b, LLVMInt16TypeInContext(lc), v_flagsp, "tts_flags");
 		v_flags = LLVMBuildOr(b, v_flags, l_int16_const(lc, TTS_FLAG_SLOW), "");
 		LLVMBuildStore(b, v_flags, v_flagsp);
-		LLVMBuildRetVoid(b);
+		LLVMBuildRet(b, v_fake_ret);
 	}
 
 	LLVMDisposeBuilder(b);

base-commit: 7eb9a8201890f3b208fd4c109a5b08bf139b692a
-- 
2.41.0



Re: broken JIT support on Fedora 40

2024-03-29 Thread Thomas Munro
On Fri, Mar 22, 2024 at 7:15 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > For verification, I've modified the deform.outblock to call LLVMBuildRet
> > instead of LLVMBuildRetVoid and this seems to help -- inline and deform
> > stages are still performed as before, but nothing crashes. But of course
> > it doesn't sound right that inlining pass cannot process such code.

Thanks for investigating and filing the issue.  It doesn't seem to be
moving yet.  Do you want to share the LLVMBuildRet() workaround?
Maybe we need to consider shipping something like that in the
meantime?




Re: broken JIT support on Fedora 40

2024-03-21 Thread Dmitry Dolgov
> On Sun, Mar 17, 2024 at 09:02:08PM +0100, Dmitry Dolgov wrote:
> > On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote:
> > For me it seems that the LLVMRunPasses() call, new in
> >
> > commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
> > Author: Thomas Munro 
> > Date:   Wed Oct 18 22:15:54 2023 +1300
> >
> > jit: Changes for LLVM 17.
> >
> > is reaching code that segfaults inside libLLVM, specifically in
> > llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
> > llvm::AAResults*, bool, llvm::Function*).  First obvious question
> > would be: is that NULL argument still acceptable?  Perhaps it wants
> > our LLVMTargetMachineRef there:
> >
> >err = LLVMRunPasses(module, passes, NULL, options);
> >
> > But then when we see what is does with that argument, it arrives at a
> > place that apparently accepts nullptr.
> >
> > https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
> > https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124
> >
> > Hrmph.  Might need an assertion build to learn more.  I'll try to look
> > again next week or so.
>
> Looks like I can reproduce this as well, libLLVM crashes when reaching
> AddReturnAttributes inside InlineFunction, when trying to access
> operands of the return instruction. I think, for whatever reason, the
> latest LLVM doesn't like (i.e. do not expect this when performing
> inlining pass) return instructions that do not have a return value, and
> this is what happens in the outblock of deform function we generate
> (slot_compile_deform).
>
> For verification, I've modified the deform.outblock to call LLVMBuildRet
> instead of LLVMBuildRetVoid and this seems to help -- inline and deform
> stages are still performed as before, but nothing crashes. But of course
> it doesn't sound right that inlining pass cannot process such code.
> Unfortunately I don't see any obvious change in the recent LLVM history
> that would justify this outcome, might be a genuine bug, will
> investigate further.

I think I found the change that got it all started [1], the commit has a
set of tags like 18.1.0-rc1 and is relatively recent. The message
doesn't say anything related to the crash that we see, so I assume it's
indeed a bug. I've opened an issue to confirm this understanding [2]
(wow, issues were indeed moved to github since the last time I was
touching LLVM), let's see what would be the response.

[1]: 
https://github.com/llvm/llvm-project/commit/2da4960f20f7e5d88a68ce25636a895284dc66d8
[2]: https://github.com/llvm/llvm-project/issues/86162




Re: broken JIT support on Fedora 40

2024-03-17 Thread Dmitry Dolgov
> On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote:
> For me it seems that the LLVMRunPasses() call, new in
>
> commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
> Author: Thomas Munro 
> Date:   Wed Oct 18 22:15:54 2023 +1300
>
> jit: Changes for LLVM 17.
>
> is reaching code that segfaults inside libLLVM, specifically in
> llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
> llvm::AAResults*, bool, llvm::Function*).  First obvious question
> would be: is that NULL argument still acceptable?  Perhaps it wants
> our LLVMTargetMachineRef there:
>
>err = LLVMRunPasses(module, passes, NULL, options);
>
> But then when we see what is does with that argument, it arrives at a
> place that apparently accepts nullptr.
>
> https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
> https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124
>
> Hrmph.  Might need an assertion build to learn more.  I'll try to look
> again next week or so.

Looks like I can reproduce this as well, libLLVM crashes when reaching
AddReturnAttributes inside InlineFunction, when trying to access
operands of the return instruction. I think, for whatever reason, the
latest LLVM doesn't like (i.e. do not expect this when performing
inlining pass) return instructions that do not have a return value, and
this is what happens in the outblock of deform function we generate
(slot_compile_deform).

For verification, I've modified the deform.outblock to call LLVMBuildRet
instead of LLVMBuildRetVoid and this seems to help -- inline and deform
stages are still performed as before, but nothing crashes. But of course
it doesn't sound right that inlining pass cannot process such code.
Unfortunately I don't see any obvious change in the recent LLVM history
that would justify this outcome, might be a genuine bug, will
investigate further.




Re: broken JIT support on Fedora 40

2024-03-14 Thread Thomas Munro
For me it seems that the LLVMRunPasses() call, new in

commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
Author: Thomas Munro 
Date:   Wed Oct 18 22:15:54 2023 +1300

jit: Changes for LLVM 17.

is reaching code that segfaults inside libLLVM, specifically in
llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
llvm::AAResults*, bool, llvm::Function*).  First obvious question
would be: is that NULL argument still acceptable?  Perhaps it wants
our LLVMTargetMachineRef there:

   err = LLVMRunPasses(module, passes, NULL, options);

But then when we see what is does with that argument, it arrives at a
place that apparently accepts nullptr.

https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124

Hrmph.  Might need an assertion build to learn more.  I'll try to look
again next week or so.




Re: broken JIT support on Fedora 40

2024-03-14 Thread Thomas Munro
On Fri, Mar 15, 2024 at 12:27 PM Daniel Gustafsson  wrote:
> > On 14 Mar 2024, at 20:15, Pavel Stehule  wrote:
>
> > build is ok, but regress tests fails with crash (it works without 
> > -with-llvm)
>
> Can you post some details around this crash?  It doesn't seem to be a
> combination we have covered in the buildfarm.

Yeah, 18.1 (note they switched to 1-based minor numbers, there was no
18.0) just came out a week or so ago.  Despite testing their 18 branch
just before their "RC1" tag, as recently as

commit d282e88e50521a457fa1b36e55f43bac02a3167f
Author: Thomas Munro 
Date:   Thu Jan 25 10:37:35 2024 +1300

Track LLVM 18 changes.

at which point everything worked, it seems that something changed
before they released.  I haven't looked into why yet but it's crashing
on my FreeBSD box too.




Re: broken JIT support on Fedora 40

2024-03-14 Thread Daniel Gustafsson
> On 14 Mar 2024, at 20:15, Pavel Stehule  wrote:

> build is ok, but regress tests fails with crash (it works without -with-llvm)

Can you post some details around this crash?  It doesn't seem to be a
combination we have covered in the buildfarm.

--
Daniel Gustafsson





Re: broken JIT support on Fedora 40

2024-03-14 Thread Pavel Stehule
Hi

čt 14. 3. 2024 v 19:20 odesílatel Robert Haas 
napsal:

> On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule 
> wrote:
> > after today update, the build with --with-llvm produces broken code, and
> make check fails with crash
> >
> > Upgradehwdata-0.380-1.fc40.noarch
>  @updates-testing
> > Upgraded   hwdata-0.379-1.fc40.noarch
>  @@System
> > Upgradellvm-18.1.0~rc4-1.fc40.x86_64
> @updates-testing
> > Upgraded   llvm-17.0.6-6.fc40.x86_64
> @@System
> > Upgradellvm-devel-18.1.0~rc4-1.fc40.x86_64
> @updates-testing
> > Upgraded   llvm-devel-17.0.6-6.fc40.x86_64
> @@System
> > Upgradellvm-googletest-18.1.0~rc4-1.fc40.x86_64
>  @updates-testing
> > Upgraded   llvm-googletest-17.0.6-6.fc40.x86_64
>  @@System
> > Upgradellvm-libs-18.1.0~rc4-1.fc40.i686
>  @updates-testing
> > Upgraded   llvm-libs-17.0.6-6.fc40.i686
>  @@System
> > Upgradellvm-libs-18.1.0~rc4-1.fc40.x86_64
>  @updates-testing
> > Upgraded   llvm-libs-17.0.6-6.fc40.x86_64
>  @@System
> > Upgradellvm-static-18.1.0~rc4-1.fc40.x86_64
>  @updates-testing
> > Upgraded   llvm-static-17.0.6-6.fc40.x86_64
>  @@System
> > Upgradellvm-test-18.1.0~rc4-1.fc40.x86_64
>  @updates-testing
> > Instalovat llvm17-libs-17.0.6-7.fc40.i686
>  @updates-testing
> > Instalovat llvm17-libs-17.0.6-7.fc40.x86_64
>  @updates-testing
>
> I don't know anything about LLVM, but somehow I'm guessing that people
> who do will need more detail than this in order to be able to fix the
> problem. I see that support for LLVM 18 was added in commit
> d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you
> were building from an older commit?
>

I repeated build and check today, fresh master, today fedora update with
same result

build is ok, but regress tests fails with crash (it works without
-with-llvm)

Regards

Pavel






>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>


Re: broken JIT support on Fedora 40

2024-03-14 Thread Robert Haas
On Wed, Mar 6, 2024 at 1:54 AM Pavel Stehule  wrote:
> after today update, the build with --with-llvm produces broken code, and make 
> check fails with crash
>
> Upgradehwdata-0.380-1.fc40.noarch   
> @updates-testing
> Upgraded   hwdata-0.379-1.fc40.noarch   @@System
> Upgradellvm-18.1.0~rc4-1.fc40.x86_64
> @updates-testing
> Upgraded   llvm-17.0.6-6.fc40.x86_64@@System
> Upgradellvm-devel-18.1.0~rc4-1.fc40.x86_64  
> @updates-testing
> Upgraded   llvm-devel-17.0.6-6.fc40.x86_64  @@System
> Upgradellvm-googletest-18.1.0~rc4-1.fc40.x86_64 
> @updates-testing
> Upgraded   llvm-googletest-17.0.6-6.fc40.x86_64 @@System
> Upgradellvm-libs-18.1.0~rc4-1.fc40.i686 
> @updates-testing
> Upgraded   llvm-libs-17.0.6-6.fc40.i686 @@System
> Upgradellvm-libs-18.1.0~rc4-1.fc40.x86_64   
> @updates-testing
> Upgraded   llvm-libs-17.0.6-6.fc40.x86_64   @@System
> Upgradellvm-static-18.1.0~rc4-1.fc40.x86_64 
> @updates-testing
> Upgraded   llvm-static-17.0.6-6.fc40.x86_64 @@System
> Upgradellvm-test-18.1.0~rc4-1.fc40.x86_64   
> @updates-testing
> Instalovat llvm17-libs-17.0.6-7.fc40.i686   
> @updates-testing
> Instalovat llvm17-libs-17.0.6-7.fc40.x86_64 
> @updates-testing

I don't know anything about LLVM, but somehow I'm guessing that people
who do will need more detail than this in order to be able to fix the
problem. I see that support for LLVM 18 was added in commit
d282e88e50521a457fa1b36e55f43bac02a3167f on January 18th; perhaps you
were building from an older commit?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




broken JIT support on Fedora 40

2024-03-05 Thread Pavel Stehule
Hi

after today update, the build with --with-llvm produces broken code, and
make check fails with crash

Upgradehwdata-0.380-1.fc40.noarch
@updates-testing
Upgraded   hwdata-0.379-1.fc40.noarch   @@System
Upgradellvm-18.1.0~rc4-1.fc40.x86_64
 @updates-testing
Upgraded   llvm-17.0.6-6.fc40.x86_64@@System
Upgradellvm-devel-18.1.0~rc4-1.fc40.x86_64
 @updates-testing
Upgraded   llvm-devel-17.0.6-6.fc40.x86_64  @@System
Upgradellvm-googletest-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Upgraded   llvm-googletest-17.0.6-6.fc40.x86_64 @@System
Upgradellvm-libs-18.1.0~rc4-1.fc40.i686
@updates-testing
Upgraded   llvm-libs-17.0.6-6.fc40.i686 @@System
Upgradellvm-libs-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Upgraded   llvm-libs-17.0.6-6.fc40.x86_64   @@System
Upgradellvm-static-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Upgraded   llvm-static-17.0.6-6.fc40.x86_64 @@System
Upgradellvm-test-18.1.0~rc4-1.fc40.x86_64
@updates-testing
Instalovat llvm17-libs-17.0.6-7.fc40.i686
@updates-testing
Instalovat llvm17-libs-17.0.6-7.fc40.x86_64
@updates-testing

Regards

Pavel