Don't clean up LLVM state when exiting in a bad way

2021-08-18 Thread Jelte Fennema
Hi,

I ran into some segfaults when using Postgres that was compiled with LLVM 7.
According to the backtraces these crashes happened during the call to
llvm_shutdown, during cleanup after another out-of-memory condition. It seems
that calls to LLVMOrcDisposeInstance can crash (at least on LLVM 7) when LLVM
is left in a bad state. I attached the relevant part of the stacktrace to this
email.

With the attached patch these segfaults went away. The patch turns 
llvm_shutdown into a no-op whenever the backend is exiting with an error. Based 
on my understanding of the code this should be totally fine. No memory should 
be leaked, since all memory will be cleaned up anyway once the backend exits 
shortly after. The only reason this cleanup code even seems to exist at all is 
to get useful LLVM profiling data. To me it seems acceptable if the
profiling data is incorrect or missing when the backend exits with an error.
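
To sketch the idea in code (this is just an illustration, not the attached
patch itself; whether an error exit shows up here as a non-zero "code"
argument is an assumption of the sketch):

static void
llvm_shutdown(int code, Datum arg)
{
	/*
	 * When exiting because of an error, LLVM may be in a bad state (for
	 * example after an OOM raised inside LLVM), so don't touch it at all;
	 * the OS reclaims the backend's memory on exit anyway.
	 */
	if (code != 0)
		return;

	/* ... the existing cleanup (LLVMOrcDisposeInstance() etc.) runs here ... */
}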

Jelte


0001-Skip-LLVM-shutdown-when-bad-exit.patch
Description: 0001-Skip-LLVM-shutdown-when-bad-exit.patch
#0  notifyFreed (K=, Obj=..., this=) at 
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:485
#1  operator() (K=, Obj=..., __closure=) at 
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:226
#2  std::_Function_handler > ()>)::{lambda(unsigned 
long, llvm::object::ObjectFile const&)#3}>::_M_invoke(std::_Any_data const&, 
unsigned long, llvm::object::ObjectFile const&) (__functor=..., 
__args#0=, __args#1=...) at 
/usr/include/c++/4.8.2/functional:2071
#3  0x7fa1697e1578 in operator() (__args#1=..., __args#0=, 
this=) at /usr/include/c++/4.8.2/functional:2471
#4  ~ConcreteLinkedObject (this=0x2766920, __in_chrg=) at 
/usr/src/debug/llvm-7.0.1.src/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:179
#5  
llvm::orc::RTDyldObjectLinkingLayer::ConcreteLinkedObject
 >::~ConcreteLinkedObject (this=0x2766920, __in_chrg=) at 
/usr/src/debug/llvm-7.0.1.src/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:182
#6  0x7fa1697e17aa in operator() (this=, __ptr=) at /usr/include/c++/4.8.2/bits/unique_ptr.h:67
#7  ~unique_ptr (this=0x27a9848, __in_chrg=) at 
/usr/include/c++/4.8.2/bits/unique_ptr.h:184
#8  ~pair (this=0x27a9840, __in_chrg=) at 
/usr/include/c++/4.8.2/bits/stl_pair.h:96
#9  ~_Rb_tree_node (this=0x27a9820, __in_chrg=) at 
/usr/include/c++/4.8.2/bits/stl_tree.h:131
#10 destroy > > > 
(this=, __p=0x27a9820) at 
/usr/include/c++/4.8.2/ext/new_allocator.h:124
#11 _M_destroy_node (this=0x25611b0, __p=0x27a9820) at 
/usr/include/c++/4.8.2/bits/stl_tree.h:421
#12 std::_Rb_tree > >, 
std::_Select1st > > 
>, std::less, std::allocator > > 
> >::_M_erase (this=this@entry=0x25611b0, __x=0x27a9820) at 
/usr/include/c++/4.8.2/bits/stl_tree.h:1127
#13 0x7fa1697edc91 in ~_Rb_tree (this=0x25611b0, __in_chrg=) 
at /usr/include/c++/4.8.2/bits/stl_tree.h:671
#14 ~map (this=0x25611b0, __in_chrg=) at 
/usr/include/c++/4.8.2/bits/stl_map.h:96
#15 ~RTDyldObjectLinkingLayer (this=0x25611a8, __in_chrg=) at 
/usr/src/debug/llvm-7.0.1.src/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h:140
#16 llvm::OrcCBindingsStack::~OrcCBindingsStack (this=0x2560380, 
__in_chrg=) at 
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:106
#17 0x7fa1697edfaa in LLVMOrcDisposeInstance (JITStack=0x2560380) at 
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindings.cpp:126
#18 0x7fa16b7bb1a1 in llvm_shutdown (code=, arg=) at llvmjit.c:885
#19 0x0076bf75 in shmem_exit (code=code@entry=1) at ipc.c:239
#20 0x0076c0a7 in proc_exit_prepare (code=code@entry=1) at ipc.c:194
#21 0x0076c128 in proc_exit (code=code@entry=1) at ipc.c:107
#22 0x008ab16c in errfinish (filename=, 
lineno=, funcname=0x7fa16b7ca9f0 
 
"fatal_llvm_error_handler") at elog.c:578
#23 0x7fa16b7bc4c3 in fatal_llvm_error_handler(void*, std::string const&, 
bool) () from /usr/pgsql-13/lib/llvmjit.so
#24 0x7fa168666e85 in llvm::report_fatal_error (Reason=..., 
GenCrashDiag=) at 
/usr/src/debug/llvm-7.0.1.src/lib/Support/ErrorHandling.cpp:108
#25 0x7fa168666fd8 in llvm::report_fatal_error 
(Reason=Reason@entry=0x7fa16a3b05f0 "Unable to allocate section memory!", 
GenCrashDiag=GenCrashDiag@entry=true) at 
/usr/src/debug/llvm-7.0.1.src/lib/Support/ErrorHandling.cpp:83
#26 0x7fa1697fcc9f in llvm::RuntimeDyldImpl::emitSection 
(this=this@entry=0x25b6e50, Obj=..., Section=..., IsCode=IsCode@entry=true) at 
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:764
#27 0x7fa1697fe64d in llvm::RuntimeDyldImpl::findOrEmitSection 
(this=this@entry=0x25b6e50, Obj=..., Section=..., IsCode=, 
LocalSections=std::map with 0 elements) at 
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp:826
#28 0x7fa1697ff267 in llvm::RuntimeDyldImpl::loadObjectImpl 
(this=this@entry=0x25b6e50, Obj=...) at 
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/RuntimeDy

Re: Don't clean up LLVM state when exiting in a bad way

2021-09-05 Thread Justin Pryzby
On Wed, Aug 18, 2021 at 03:00:59PM +, Jelte Fennema wrote:
> I ran into some segfaults when using Postgres that was compiled with LLVM 7. 
> According to the backtraces these crashes happened during the call to 
> llvm_shutdown, during cleanup after another out of memory condition. It seems 
> that calls to LLVMOrcDisposeInstance, can crash (at least on LLVM 7) when 
> LLVM is left in bad state. I attached the relevant part of the stacktrace to 
> this email.
> 
> With the attached patch these segfaults went away. The patch turns 
> llvm_shutdown into a no-op whenever the backend is exiting with an error. 
> Based on my understanding of the code this should be totally fine. No memory 
> should be leaked, since all memory will be cleaned up anyway once the backend 
> exits shortly after. The only reason this cleanup code even seems to exist at 
> all is to get useful LLVM profiling data. To me it seems be acceptable if the 
> profiling data is incorrect/missing when the backend exits with an error.

Andres, could you comment on this?

This seems to explain the crash I reported to you when testing your WIP patches
for the JIT memory leak.  I realize now that the crash happens without your
patches.
https://www.postgresql.org/message-id/20210419164130.byegpfrw46mza...@alap3.anarazel.de

I can reproduce the crash on master (not just v13, as I said before) compiled
on centos7, with:
LLVM_CONFIG=/usr/lib64/llvm7.0/bin/llvm-config 
CLANG=/opt/rh/llvm-toolset-7.0/root/usr/bin/clang

I cannot reproduce the crash after applying Jelte's patch.

I couldn't crash on ubuntu either, so maybe they have a patch which fixes this,
or maybe RH applied a patch which caused it...

postgres=# CREATE TABLE t AS SELECT i FROM generate_series(1,99)i; VACUUM ANALYZE t;
postgres=# SET client_min_messages=debug; SET statement_timeout=333; SET jit_above_cost=0; SET jit_optimize_above_cost=-1; SET jit_inline_above_cost=-1; explain analyze SELECT sum(i) FROM t a NATURAL JOIN t b;
2021-09-05 22:47:12.807 ADT client backend[7563] psql ERROR:  canceling statement due to statement timeout
2021-09-05 22:47:12.880 ADT postmaster[7272] LOG:  background worker "parallel worker" (PID 8212) was terminated by signal 11: Segmentation fault

-- 
Justin




Re: Don't clean up LLVM state when exiting in a bad way

2021-09-07 Thread Andres Freund
Hi,

On 2021-08-18 15:00:59 +, Jelte Fennema wrote:
> I ran into some segfaults when using Postgres that was compiled with LLVM
> 7. According to the backtraces these crashes happened during the call to
> llvm_shutdown, during cleanup after another out of memory condition. It
> seems that calls to LLVMOrcDisposeInstance, can crash (at least on LLVM 7)
> when LLVM is left in bad state. I attached the relevant part of the
> stacktrace to this email.

> With the attached patch these segfaults went away. The patch turns
> llvm_shutdown into a no-op whenever the backend is exiting with an
> error. Based on my understanding of the code this should be totally fine. No
> memory should be leaked, since all memory will be cleaned up anyway once the
> backend exits shortly after. The only reason this cleanup code even seems to
> exist at all is to get useful LLVM profiling data. To me it seems be
> acceptable if the profiling data is incorrect/missing when the backend exits
> with an error.

I think this is a tad too strong. We should continue to clean up on exit as
long as the error didn't happen while we're already inside llvm
code. Otherwise we lose some ability to find leaks. How about checking in the
error path whether fatal_new_handler_depth is > 0, and skipping cleanup in
that case? Because that's precisely when it should be unsafe to reenter LLVM.
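
A rough sketch of what I mean (the helper name llvm_error_in_progress() is
made up for illustration; fatal_new_handler_depth is the counter in
llvmjit_error.cpp):

/* llvmjit_error.cpp -- sketch only, helper name is hypothetical */
bool
llvm_error_in_progress(void)
{
	/* > 0 while we are running inside LLVM with the fatal-on-OOM handlers installed */
	return fatal_new_handler_depth > 0;
}

/* llvmjit.c -- sketch only */
static void
llvm_shutdown(int code, Datum arg)
{
	/* the error fired while we were inside LLVM; re-entering it would be unsafe */
	if (llvm_error_in_progress())
		return;

	/* ... regular cleanup, including writing out profiling data ... */
}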

Greetings,

Andres Freund




Re: Don't clean up LLVM state when exiting in a bad way

2021-09-07 Thread Justin Pryzby
On Tue, Sep 07, 2021 at 12:27:27PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2021-08-18 15:00:59 +, Jelte Fennema wrote:
> > I ran into some segfaults when using Postgres that was compiled with LLVM
> > 7. According to the backtraces these crashes happened during the call to
> > llvm_shutdown, during cleanup after another out of memory condition. It
> > seems that calls to LLVMOrcDisposeInstance, can crash (at least on LLVM 7)
> > when LLVM is left in bad state. I attached the relevant part of the
> > stacktrace to this email.
> 
> > With the attached patch these segfaults went away. The patch turns
> > llvm_shutdown into a no-op whenever the backend is exiting with an
> > error. Based on my understanding of the code this should be totally fine. No
> > memory should be leaked, since all memory will be cleaned up anyway once the
> > backend exits shortly after. The only reason this cleanup code even seems to
> > exist at all is to get useful LLVM profiling data. To me it seems be
> > acceptable if the profiling data is incorrect/missing when the backend exits
> > with an error.
> 
> I think this is a tad too strong. We should continue to clean up on exit as
> long as the error didn't happen while we're already inside llvm
> code. Otherwise we loose some ability to find leaks. How about checking in the
> error path whether fatal_new_handler_depth is > 0, and skipping cleanup in
> that case? Because that's precisely when it should be unsafe to reenter LLVM.

This avoids a crash when compiled with llvm7+clang7 on RH7 on master:

python3 -c "import pg; db=pg.DB('dbname=postgres host=/tmp port=5678'); 
db.query('SET jit_above_cost=0; SET jit_inline_above_cost=-1; SET jit=on; SET 
client_min_messages=debug'); db.query('begin'); db.query_formatted('SELECT 1 
FROM generate_series(1,99)a WHERE a=%s', [1], inline=False);"

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index df691cbf1c..a3869ee700 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -885,6 +885,12 @@ llvm_session_initialize(void)
 static void
 llvm_shutdown(int code, Datum arg)
 {
+   extern int fatal_new_handler_depth;
+
+   // if (code!=0)
+   if (fatal_new_handler_depth > 0)
+   return;
+
 #if LLVM_VERSION_MAJOR > 11
{
if (llvm_opt3_orc)
diff --git a/src/backend/jit/llvm/llvmjit_error.cpp 
b/src/backend/jit/llvm/llvmjit_error.cpp
index 26bc828875..802dc1b058 100644
--- a/src/backend/jit/llvm/llvmjit_error.cpp
+++ b/src/backend/jit/llvm/llvmjit_error.cpp
@@ -24,7 +24,7 @@ extern "C"
 #include "jit/llvmjit.h"
 
 
-static int fatal_new_handler_depth = 0;
+int fatal_new_handler_depth = 0;
 static std::new_handler old_new_handler = NULL;
 
 static void fatal_system_new_handler(void);




Re: Don't clean up LLVM state when exiting in a bad way

2021-09-13 Thread Andres Freund
Hi,


On 2021-09-07 14:44:39 -0500, Justin Pryzby wrote:
> On Tue, Sep 07, 2021 at 12:27:27PM -0700, Andres Freund wrote:
> > I think this is a tad too strong. We should continue to clean up on exit as
> > long as the error didn't happen while we're already inside llvm
> > code. Otherwise we loose some ability to find leaks. How about checking in 
> > the
> > error path whether fatal_new_handler_depth is > 0, and skipping cleanup in
> > that case? Because that's precisely when it should be unsafe to reenter
> > LLVM.

The more important reason is actually profiling information that needs to be
written out.

I've now pushed a fix to all relevant branches. Thanks all!

Regards,

Andres





Re: Don't clean up LLVM state when exiting in a bad way

2021-09-13 Thread Alexander Lakhin
Hello hackers,
14.09.2021 04:32, Andres Freund wrote:
> On 2021-09-07 14:44:39 -0500, Justin Pryzby wrote:
>> On Tue, Sep 07, 2021 at 12:27:27PM -0700, Andres Freund wrote:
>>> I think this is a tad too strong. We should continue to clean up on exit as
>>> long as the error didn't happen while we're already inside llvm
>>> code. Otherwise we loose some ability to find leaks. How about checking in 
>>> the
>>> error path whether fatal_new_handler_depth is > 0, and skipping cleanup in
>>> that case? Because that's precisely when it should be unsafe to reenter
>>> LLVM.
> The more important reason is actually profiling information that needs to be
> written out.
>
> I've now pushed a fix to all relevant branches. Thanks all!
>
I encountered a similar issue last week, but found this discussion only
after the commit.
I'm afraid that it's not completely gone yet. I've reproduced a similar
crash (on edb4d95d) with
echo "statement_timeout = 50
jit_optimize_above_cost = 1
jit_inline_above_cost = 1
parallel_setup_cost=0
parallel_tuple_cost=0
" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config  make check

parallel group (11 tests):  memoize explain hash_part partition_info reloptions tuplesort compression partition_aggregate indexing partition_prune partition_join
     partition_join               ... FAILED (test process exited with exit code 2) 1815 ms
     partition_prune              ... FAILED (test process exited with exit code 2) 1779 ms
     reloptions                   ... ok          146 ms

I've extracted the crash-causing fragment from the partition_prune test
to reproduce the segfault reliably (see the patch attached).
The segfault stack is:
Core was generated by `postgres: parallel worker for PID
12029   '.
Program terminated with signal 11, Segmentation fault.
#0  0x7f045e0a88ca in notifyFreed (K=, Obj=...,
this=)
    at
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:485
485   Listener->NotifyFreeingObject(Obj);
(gdb) bt
#0  0x7f045e0a88ca in notifyFreed (K=, Obj=...,
this=)
    at
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:485
#1  operator() (K=, Obj=..., __closure=)
    at
/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:226
#2  std::_Function_handler >
()>)::{lambda(unsigned long, llvm::object::ObjectFile
const&)#3}>::_M_invoke(std::_Any_data const&, unsigned long,
llvm::object::ObjectFile const&) (__functor=..., __args#0=, __args#1=...)
    at /usr/include/c++/4.8.2/functional:2071
#3  0x7f045e0aa578 in operator() (__args#1=..., __args#0=, this=)
    at /usr/include/c++/4.8.2/functional:2471
...

The corresponding code in OrcCBindingsStack.h is:
void notifyFreed(orc::VModuleKey K, const object::ObjectFile &Obj) {
  for (auto &Listener : EventListeners)
    Listener->NotifyFreeingObject(Obj);
}
So probably one of the EventListeners has become null. I see that
without debugging and profiling enabled the only listener registration
in the postgres code is LLVMOrcRegisterJITEventListener.
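
For illustration, such a registration via the ORC C bindings looks roughly
like this (my own sketch, not an excerpt from llvmjit.c; the exact guards
and the availability of these C API functions depend on the LLVM version):

if (jit_debugging_support)
{
	LLVMJITEventListenerRef l = LLVMCreateGDBRegistrationListener();

	LLVMOrcRegisterJITEventListener(llvm_opt0_orc, l);
	LLVMOrcRegisterJITEventListener(llvm_opt3_orc, l);
}

if (jit_profiling_support)
{
	LLVMJITEventListenerRef l = LLVMCreatePerfJITEventListener();

	LLVMOrcRegisterJITEventListener(llvm_opt0_orc, l);
	LLVMOrcRegisterJITEventListener(llvm_opt3_orc, l);
}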

With LLVM 9 on the same CentOS 7 I don't get such a segfault. Also it
doesn't happen on different OSes with LLVM 7. I still have no
explanation for that, but maybe there is a difference between LLVM
configure options, e.g. like this:
https://stackoverflow.com/questions/47712670/segmentation-fault-in-llvm-pass-when-using-registerstandardpasses

Best regards,
Alexander


jit-llvm-7-crash.sql
Description: application/sql


Re: Don't clean up LLVM state when exiting in a bad way

2021-09-13 Thread Andres Freund
Hi, 

On September 13, 2021 9:00:00 PM PDT, Alexander Lakhin  
wrote:
>Hello hackers,
>14.09.2021 04:32, Andres Freund wrote:
>> On 2021-09-07 14:44:39 -0500, Justin Pryzby wrote:
>>> On Tue, Sep 07, 2021 at 12:27:27PM -0700, Andres Freund wrote:
 I think this is a tad too strong. We should continue to clean up on exit as
 long as the error didn't happen while we're already inside llvm
 code. Otherwise we loose some ability to find leaks. How about checking in 
 the
 error path whether fatal_new_handler_depth is > 0, and skipping cleanup in
 that case? Because that's precisely when it should be unsafe to reenter
 LLVM.
>> The more important reason is actually profiling information that needs to be
>> written out.
>>
>> I've now pushed a fix to all relevant branches. Thanks all!
>>
>I've encountered similar issue last week, but found this discussion only
>after the commit.
>I'm afraid that it's not completely gone yet. I've reproduced a similar
>crash (on edb4d95d) with
>echo "statement_timeout = 50
>jit_optimize_above_cost = 1
>jit_inline_above_cost = 1
>parallel_setup_cost=0
>parallel_tuple_cost=0
>" >/tmp/extra.config
>TEMP_CONFIG=/tmp/extra.config  make check
>
>parallel group (11 tests):  memoize explain hash_part partition_info
>reloptions tuplesort compression partition_aggregate indexing
>partition_prune partition_join
> partition_join   ... FAILED (test process exited with
>exit code 2) 1815 ms
> partition_prune  ... FAILED (test process exited with
>exit code 2) 1779 ms
> reloptions   ... ok  146 ms
>
>I've extracted the crash-causing fragment from the partition_prune test
>to reproduce the segfault reliably (see the patch attached).
>The segfault stack is:
>Core was generated by `postgres: parallel worker for PID
>12029   '.
>Program terminated with signal 11, Segmentation fault.
>#0  0x7f045e0a88ca in notifyFreed (K=, Obj=...,
>this=)
>    at
>/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:485
>485   Listener->NotifyFreeingObject(Obj);
>(gdb) bt
>#0  0x7f045e0a88ca in notifyFreed (K=, Obj=...,
>this=)
>    at
>/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:485
>#1  operator() (K=, Obj=..., __closure=)
>    at
>/usr/src/debug/llvm-7.0.1.src/lib/ExecutionEngine/Orc/OrcCBindingsStack.h:226
>#2  std::_Function_handlerconst&),
>llvm::OrcCBindingsStack::OrcCBindingsStack(llvm::TargetMachine&,
>std::functionstd::default_delete >
>()>)::{lambda(unsigned long, llvm::object::ObjectFile
>const&)#3}>::_M_invoke(std::_Any_data const&, unsigned long,
>llvm::object::ObjectFile const&) (__functor=..., __args#0=out>, __args#1=...)
>    at /usr/include/c++/4.8.2/functional:2071
>#3  0x7f045e0aa578 in operator() (__args#1=..., __args#0=out>, this=)
>    at /usr/include/c++/4.8.2/functional:2471
>...
>
>The corresponding code in OrcCBindingsStack.h is:
>void notifyFreed(orc::VModuleKey K, const object::ObjectFile &Obj) {
>    for (auto &Listener : EventListeners)
> Listener->NotifyFreeingObject(Obj);
>}
>So probably one of the EventListeners has become null. I see that
>without debugging and profiling enabled the only listener registration
>in the postgres code is LLVMOrcRegisterJITEventListener.
>
>With LLVM 9 on the same Centos 7 I don't get such segfault. Also it
>doesn't happen on different OSes with LLVM 7.

That looks just like an llvm bug to me, rather than the usage issue addressed
in this thread.


>I still have no
>explanation for that, but maybe there is difference between LLVM
>configure options, e.g. like this:
>https://stackoverflow.com/questions/47712670/segmentation-fault-in-llvm-pass-when-using-registerstandardpasses

Why is it not much more likely that bugs were fixed?


Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Don't clean up LLVM state when exiting in a bad way

2021-09-14 Thread Alexander Lakhin
Hello Andres,
14.09.2021 08:05, Andres Freund wrote:
>
>> With LLVM 9 on the same Centos 7 I don't get such segfault. Also it
>> doesn't happen on different OSes with LLVM 7.
> That just like an llvm bug to me. Rather than the usage issue addressed in 
> this thread.
But Justin has seen this strangeness too:
>
> I couldn't crash on ubuntu either, so maybe they have a patch which
> fixes this,
> or maybe RH applied a patch which caused it...
>
The script that Justin presented:
> postgres=# CREATE TABLE t AS SELECT i FROM generate_series(1,99)i;
> VACUUM ANALYZE t;
> postgres=# SET client_min_messages=debug; SET statement_timeout=333;
> SET jit_above_cost=0; SET jit_optimize_above_cost=-1; SET
> jit_inline_above_cost=-1; explain analyze SELECT sum(i) FROM t a
> NATURAL JOIN t b;
causes the server crash on Centos 7 with LLVM 7 (even with the fix
applied) and doesn't crash with LLVM 9 (I used llvm-toolset-9* and
devtoolset-9* packages from
https://buildlogs.centos.org/c7-llvm-toolset-9.0.x86_64/ and
https://buildlogs.centos.org/c7-devtoolset-9.x86_64/ repositories).

The other script:
>
> python3 -c "import pg; db=pg.DB('dbname=postgres host=/tmp
> port=5678'); db.query('SET jit_above_cost=0; SET
> jit_inline_above_cost=-1; SET jit=on; SET client_min_messages=debug');
> db.query('begin'); db.query_formatted('SELECT 1 FROM
> generate_series(1,99)a WHERE a=%s', [1], inline=False);"
>
also causes the server crash with LLVM 7, and doesn't crash with LLVM 9.

So I wonder: isn't the fixed usage issue specific to LLVM 7, which is perhaps
not going to be supported anyway, given its bugs?

Best regards,
Alexander




Re: Don't clean up LLVM state when exiting in a bad way

2021-08-18 Thread Zhihong Yu
On Wed, Aug 18, 2021 at 8:01 AM Jelte Fennema 
wrote:

> Hi,
>
> I ran into some segfaults when using Postgres that was compiled with LLVM
> 7. According to the backtraces these crashes happened during the call to
> llvm_shutdown, during cleanup after another out of memory condition. It
> seems that calls to LLVMOrcDisposeInstance, can crash (at least on LLVM 7)
> when LLVM is left in bad state. I attached the relevant part of the
> stacktrace to this email.
>
> With the attached patch these segfaults went away. The patch turns
> llvm_shutdown into a no-op whenever the backend is exiting with an error.
> Based on my understanding of the code this should be totally fine. No
> memory should be leaked, since all memory will be cleaned up anyway once
> the backend exits shortly after. The only reason this cleanup code even
> seems to exist at all is to get useful LLVM profiling data. To me it seems
> be acceptable if the profiling data is incorrect/missing when the backend
> exits with an error.
>
> Jelte
>
Hi,
Minor comment:

+* shut LLVM down, this can result into a segfault. So if this
process

result into a segfault -> result in a segfault

Cheers


Re: [EXTERNAL] Re: Don't clean up LLVM state when exiting in a bad way

2021-09-14 Thread Jelte Fennema
> So I wonder, isn't the fixed usage issue specific to LLVM 7

That's definitely possible. I was unable to reproduce the issue I shared in my 
original email when postgres was compiled with LLVM 10. 

That's also why I sent an email to the pgsql-pkg-yum mailing list about options 
to use a newer version of LLVM on CentOS 7: 
https://www.postgresql.org/message-id/flat/AM5PR83MB0178475D87EFA290A4D0793DF7FF9%40AM5PR83MB0178.EURPRD83.prod.outlook.com
 (no response so far though)