Re: JIT breaks PostGIS
Hi, On 2018-07-25 21:59:32 +0200, Christoph Berg wrote: > Re: Andres Freund 2018-07-25 > <20180725195037.jmykfzfporf6a...@alap3.anarazel.de> > > The way inlining works is that, when referencing a function, the bitcode > > summary file corresponding to it (either postgres.index.bc if builtin or > > $extension.index.bc if in an extension) gets opened. That contains > > metadata (name, number of instructions, ...) about the functions > > included in the indexed .bc files (which reside in > > $module/path/to/$file.bc). Those .bc files in turn are generated by > > clang -emit-llvm. The inlining machinery looks up functions in the > > corresponding .index.bc, checks whether they are smaller than the > > instruction limit, and inlines them if below. > > Is that top-level functions (SQL language "C" functions), or all > C-code functions? The "initial" entry point has to be a C (or INTERNAL) function. But called function can recursively get inlined (with the size limit being halved every recursion step). Outside of hooks etc there's no other way to call extension functions outside of SQL level functions, so that's not really a restriction. > > If a function is not in the summary, or it is too large, it'll just > > generate an external function call. It's perfectly normal to have too > > large functions and functions that aren't present (e.g. random libraries > > including libc). > > What happens if some (SQL) function is in there, but calls into a > function that is not? Or is in a different index? I'm not precisely sure what you mean. You're thinking of a C UDF that's calling a C function (not a UDF) that's not in the index? If so, then we'll just not inline that inner function, but generate a normal function call (i.e. on asm level that'll be a callq instruction or whatever your platform wants to do). Greetings, Andres Freund
Re: JIT breaks PostGIS
Re: Andres Freund 2018-07-25 <20180725195037.jmykfzfporf6a...@alap3.anarazel.de> > > Different question, the other way round, is there a way to know that > > all files needed to inline a query/extension are there? How does the > > JIT machinery determine it can (try to) compile things? (That's > > something extension packages might want to test for.) > > I'm not 100% sure I understand your question. Let me describe how it > works, and maybe you can then rephrase afterwards? > > The way inlining works is that, when referencing a function, the bitcode > summary file corresponding to it (either postgres.index.bc if builtin or > $extension.index.bc if in an extension) gets opened. That contains > metadata (name, number of instructions, ...) about the functions > included in the indexed .bc files (which reside in > $module/path/to/$file.bc). Those .bc files in turn are generated by > clang -emit-llvm. The inlining machinery looks up functions in the > corresponding .index.bc, checks whether they are smaller than the > instruction limit, and inlines them if below. Is that top-level functions (SQL language "C" functions), or all C-code functions? > If a function is not in the summary, or it is too large, it'll just > generate an external function call. It's perfectly normal to have too > large functions and functions that aren't present (e.g. random libraries > including libc). What happens if some (SQL) function is in there, but calls into a function that is not? Or is in a different index? Christoph
Re: JIT breaks PostGIS
Hi, On 2018-07-25 21:39:26 +0200, Christoph Berg wrote: > Re: Andres Freund 2018-07-25 > <20180725191143.sietxlbfehv24...@alap3.anarazel.de> > > I haven't investigated the details here. It certainly would be possible > > to have the _PG_init() of postgis's so force JIT to be off, and emit a > > warning. > > Isn't that too late, if postgis.so gets loaded by a query that is in > to process of being jit'ed? Hm, I'd guess it'd still be fine, but I haven't thought it sufficiently through. > Different question, the other way round, is there a way to know that > all files needed to inline a query/extension are there? How does the > JIT machinery determine it can (try to) compile things? (That's > something extension packages might want to test for.) I'm not 100% sure I understand your question. Let me describe how it works, and maybe you can then rephrase afterwards? The way inlining works is that, when referencing a function, the bitcode summary file corresponding to it (either postgres.index.bc if builtin or $extension.index.bc if in an extension) gets opened. That contains metadata (name, number of instructions, ...) about the functions included in the indexed .bc files (which reside in $module/path/to/$file.bc). Those .bc files in turn are generated by clang -emit-llvm. The inlining machinery looks up functions in the corresponding .index.bc, checks whether they are smaller than the instruction limit, and inlines them if below. If a function is not in the summary, or it is too large, it'll just generate an external function call. It's perfectly normal to have too large functions and functions that aren't present (e.g. random libraries including libc). What precisely would you want to test? Greetings, Andres Freund
Re: JIT breaks PostGIS
Re: Andres Freund 2018-07-25 <20180725191143.sietxlbfehv24...@alap3.anarazel.de> > I haven't investigated the details here. It certainly would be possible > to have the _PG_init() of postgis's so force JIT to be off, and emit a > warning. Isn't that too late, if postgis.so gets loaded by a query that is in to process of being jit'ed? > There's no way to just force JIT off whenever anything involving postgis > is present in the query. Not installing the .bc files will prevent > inlining, but I don't think that's sufficient to prevent the problem in > all cases. Ok. Different question, the other way round, is there a way to know that all files needed to inline a query/extension are there? How does the JIT machinery determine it can (try to) compile things? (That's something extension packages might want to test for.) > > , or should we look into getting a new version to distribute along > > with it on apt.postgresql.org? I'd rather like to avoid having to ship > > a compiler... > > Well, you don't need to ship the compiler (clang), strictly > speaking. But yea. We need clang to compile PostgreSQL and extensions, so it needs to come from somewhere. We could pull it from (stretch-)backports, but having to enable backports for really every build doesn't look appealing. (At the moment it's about two dozen out of ~100 packages that needs backports at build time, and half of that is python packages that are dependencies of pgadmin4 only.) > > > It also may happen that a patch for LLVM can be applied to LLVM4 build in > > > Debian and brought in as an update, but such a plan should not be a > > > default > > > one. > > > > That's actually a plan I like very much. Hopefully the patch is small > > and backpatchable. > > It's not entirely trivial, and I suspect there's independent changes > making it not apply cleanly: > https://github.com/llvm-mirror/llvm/commit/ab3dba86f951a1bdfe01d3102e2850e607af791a The svn link for that is https://llvm.org/viewvc/llvm-project?view=revision&revision=302589 I tried to apply the patch to llvm-toolchain-4.0_4.0.1-10~deb9u2. There are no rejects, but two of the patched files do not even exist. patching file include/llvm/ExecutionEngine/Orc/CompileOnDemandLayer.h patching file include/llvm/ExecutionEngine/Orc/OrcRemoteTargetClient.h Hunk #1 succeeded at 143 (offset -1 lines). Hunk #2 succeeded at 319 (offset -1 lines). Hunk #3 succeeded at 330 (offset -1 lines). Hunk #4 succeeded at 387 (offset -1 lines). can't find file to patch at input line 171 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |diff --git a/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h b/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h |index babcc7f26aab..5b3426afe584 100644 |--- a/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h |+++ b/include/llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h -- File to patch: Skip this patch? [y] Skipping patch. 1 out of 1 hunk ignored patching file include/llvm/ExecutionEngine/RTDyldMemoryManager.h patching file include/llvm/ExecutionEngine/RuntimeDyld.h patching file lib/ExecutionEngine/Orc/OrcMCJITReplacement.h patching file lib/ExecutionEngine/RuntimeDyld/RTDyldMemoryManager.cpp patching file lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp patching file lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp patching file lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.h Hunk #1 succeeded at 144 with fuzz 2 (offset -8 lines). Hunk #2 succeeded at 167 (offset -12 lines). patching file lib/ExecutionEngine/RuntimeDyld/RuntimeDyldImpl.h Hunk #1 succeeded at 504 (offset -11 lines). patching file lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFI386.h patching file lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFThumb.h patching file lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFX86_64.h patching file tools/lli/RemoteJITUtils.h patching file tools/llvm-rtdyld/llvm-rtdyld.cpp patching file unittests/ExecutionEngine/Orc/ObjectTransformLayerTest.cpp can't find file to patch at input line 429 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |diff --git a/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp b/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp |index de99c022fb9d..c13a75a5cbfe 100644 |--- a/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp |+++ b/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp -- File to patch: Skip this patch? [y] Skipping patch. 5 out of 5 hunks ignored Christoph
Re: JIT breaks PostGIS
Hi, On 2018-07-25 21:05:33 +0200, Christoph Berg wrote: > > > It'll only be an issue for extensions that throw c++ style exceptions. I > > > don't think that rises to the level of disallowing any LLVM version < > > > 5.0. I suggest postgis adds an error check to its buildprocess that > > > refuses to run if jit is enabled and a too old version is used? > > > > Unfortunately that's not an option. > > Is it possible to disable JIT per extension, say by removing all .bc > files for it, or via a PGXS variable? Or does this bug still trigger > even without the .bc files for PostGIS? I haven't investigated the details here. It certainly would be possible to have the _PG_init() of postgis's so force JIT to be off, and emit a warning. There's no way to just force JIT off whenever anything involving postgis is present in the query. Not installing the .bc files will prevent inlining, but I don't think that's sufficient to prevent the problem in all cases. > > I think that a good way to deal with it will be to bump minimum required > > version of LLVM to 5 on Postgres build, with a notice in docs that will say > > that "you can build with lower version, but that will give you this and > > this bug". > > Is LLVM << 5 generally acceptable for use with PostgreSQL It is. Newer version wouldn't hurt though. > , or should we look into getting a new version to distribute along > with it on apt.postgresql.org? I'd rather like to avoid having to ship > a compiler... Well, you don't need to ship the compiler (clang), strictly speaking. But yea. > > It also may happen that a patch for LLVM can be applied to LLVM4 build in > > Debian and brought in as an update, but such a plan should not be a default > > one. > > That's actually a plan I like very much. Hopefully the patch is small > and backpatchable. It's not entirely trivial, and I suspect there's independent changes making it not apply cleanly: https://github.com/llvm-mirror/llvm/commit/ab3dba86f951a1bdfe01d3102e2850e607af791a Greetings, Andres Freund
Re: JIT breaks PostGIS
Re: Darafei "Komяpa" Praliaskouski 2018-07-23 > > It looks to me like it's a LLVM issue, specifically > > https://bugs.llvm.org/show_bug.cgi?id=34424 > > fixed in LLVM 5+. > > > > Thank you for your investigation. Thanks! > > It'll only be an issue for extensions that throw c++ style exceptions. I > > don't think that rises to the level of disallowing any LLVM version < > > 5.0. I suggest postgis adds an error check to its buildprocess that > > refuses to run if jit is enabled and a too old version is used? > > Unfortunately that's not an option. Is it possible to disable JIT per extension, say by removing all .bc files for it, or via a PGXS variable? Or does this bug still trigger even without the .bc files for PostGIS? > If Christoph decides to keep LLVM enabled for 11 on that platform, as he is > recommended upthread by Tom, that would mean that PostGIS can't be packaged > at all, and all the people who need it will have to stay on Postgres 10. We'll definitely need to find a proper fix, not shipping PostGIS is not an option. > If PostGIS decides not to implement the check, and instead tweaks test > runner to execute `set jit to off;` before tickets.sql, then Postgres 11 on > that platform will have a known way to segfault it, even without superuser > rights, as jit GUC is tweakable by anyone. Not a good option, ack. > I think that a good way to deal with it will be to bump minimum required > version of LLVM to 5 on Postgres build, with a notice in docs that will say > that "you can build with lower version, but that will give you this and > this bug". Is LLVM << 5 generally acceptable for use with PostgreSQL, or should we look into getting a new version to distribute along with it on apt.postgresql.org? I'd rather like to avoid having to ship a compiler... > It also may happen that a patch for LLVM can be applied to LLVM4 build in > Debian and brought in as an update, but such a plan should not be a default > one. That's actually a plan I like very much. Hopefully the patch is small and backpatchable. Christoph
Re: JIT breaks PostGIS
Hello, пн, 23 июл. 2018 г. в 8:13, Andres Freund : > Hi, > > On 2018-07-21 23:14:47 +0300, Darafei "Komяpa" Praliaskouski wrote: > > > > I suspect that a fix would require to bisect llvm/clang version which > stops > > showing this behavior and making it a new minimum for JIT, if this is > not a > > symptom of bigger (memory management?) problem. > > It looks to me like it's a LLVM issue, specifically > https://bugs.llvm.org/show_bug.cgi?id=34424 > fixed in LLVM 5+. > Thank you for your investigation. > It'll only be an issue for extensions that throw c++ style exceptions. I > don't think that rises to the level of disallowing any LLVM version < > 5.0. I suggest postgis adds an error check to its buildprocess that > refuses to run if jit is enabled and a too old version is used? > Unfortunately that's not an option. Debian Stretch amd64 is quite popular platform, and is supported by Postgres 10 / PostGIS 2.4. If Christoph decides to keep LLVM enabled for 11 on that platform, as he is recommended upthread by Tom, that would mean that PostGIS can't be packaged at all, and all the people who need it will have to stay on Postgres 10. If PostGIS decides not to implement the check, and instead tweaks test runner to execute `set jit to off;` before tickets.sql, then Postgres 11 on that platform will have a known way to segfault it, even without superuser rights, as jit GUC is tweakable by anyone. I think that a good way to deal with it will be to bump minimum required version of LLVM to 5 on Postgres build, with a notice in docs that will say that "you can build with lower version, but that will give you this and this bug". It also may happen that a patch for LLVM can be applied to LLVM4 build in Debian and brought in as an update, but such a plan should not be a default one. -- Darafei Praliaskouski Support me: http://patreon.com/komzpa
Re: JIT breaks PostGIS
Hi, On 2018-07-21 23:14:47 +0300, Darafei "Komяpa" Praliaskouski wrote: > Today I spent some time closing PostGIS tickets in preparation to Monday's > release. > > One of the blockers, https://trac.osgeo.org/postgis/ticket/4125, was filed > by Postgres APT repository maintainer Christoph Berg who noticed a test > suite failure on Debian Stretch with Postgres 11. > > Upon investigation we found: > - A build for Ubuntu Bionic installed on Debian Stretch passes the suite, > requiring llvm6; > - A build for Debian Stretch fails the suite on a call to external library > GEOS, showing no traces of JIT in the stacktrace; > - Setting jit=off lets the suite pass; > - The query called in clean session by itself does not crash Postgres. > Queries above it are required to reproduce the crash; > - The crash affects not only Stretch, but customly collected Postgres 11 / > clang 3.9 on Travis CI running Ubuntu Trusty: > https://github.com/postgis/postgis/pull/262. > > I suspect that a fix would require to bisect llvm/clang version which stops > showing this behavior and making it a new minimum for JIT, if this is not a > symptom of bigger (memory management?) problem. It looks to me like it's a LLVM issue, specifically https://bugs.llvm.org/show_bug.cgi?id=34424 fixed in LLVM 5+. It'll only be an issue for extensions that throw c++ style exceptions. I don't think that rises to the level of disallowing any LLVM version < 5.0. I suggest postgis adds an error check to its buildprocess that refuses to run if jit is enabled and a too old version is used? Greetings, Andres Freund
Re: JIT breaks PostGIS
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Christoph Berg writes: > > The question will be coming up eventually, though, and I think the > > options on the packaging side are: > > > 1) Disable jit completely > > 2) Compile --with-llvm, but disable jit in the config by default > > 3) Compile --with-llvm, but disable jit for older llvm versions > > 4) Enable jit everywhere where llvm >= 3.9 is available > > > Option 4 is what the Debian packages implement now, but it might make > > sense to go to 2 or 3 for PG11 (only). > > Well, right now JIT is certainly beta-quality code, so you ought > to expect bugs. We have an open item at > https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items > to decide whether to ship v11 with JIT enabled by default or not, > but I don't expect that decision will be taken until much closer > to release. Until then, I think you should be doing (4) so that > we can gather data to inform the eventual decision. +1. Thanks! Stephen signature.asc Description: PGP signature
Re: JIT breaks PostGIS
Christoph Berg writes: > The question will be coming up eventually, though, and I think the > options on the packaging side are: > 1) Disable jit completely > 2) Compile --with-llvm, but disable jit in the config by default > 3) Compile --with-llvm, but disable jit for older llvm versions > 4) Enable jit everywhere where llvm >= 3.9 is available > Option 4 is what the Debian packages implement now, but it might make > sense to go to 2 or 3 for PG11 (only). Well, right now JIT is certainly beta-quality code, so you ought to expect bugs. We have an open item at https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items to decide whether to ship v11 with JIT enabled by default or not, but I don't expect that decision will be taken until much closer to release. Until then, I think you should be doing (4) so that we can gather data to inform the eventual decision. regards, tom lane
Re: JIT breaks PostGIS
Re: Andres Freund 2018-07-22 <20180722185106.qxc5ie745tqda...@alap3.anarazel.de> > > Does that mean JIT is not ready for prime time yet and should be > > disabled in default installations? Or does it just mean that llvm 4.0 > > is old? > > I don't know yet, it's probably just some small bug. But let me debug it > first. Sure. The question will be coming up eventually, though, and I think the options on the packaging side are: 1) Disable jit completely 2) Compile --with-llvm, but disable jit in the config by default 3) Compile --with-llvm, but disable jit for older llvm versions 4) Enable jit everywhere where llvm >= 3.9 is available Option 4 is what the Debian packages implement now, but it might make sense to go to 2 or 3 for PG11 (only). Christoph
Re: JIT breaks PostGIS
On 2018-07-22 20:49:51 +0200, Christoph Berg wrote: > Re: To Andres Freund 2018-07-21 <20180721203644.gb10...@msg.df7cb.de> > > That said, I'm just rebuilding postgresql-11 on stretch to use > > llvm-4.0 instead of 3.9. > > This didn't change anything, it's still crashing on the same query > from tickets.sql. > > Does that mean JIT is not ready for prime time yet and should be > disabled in default installations? Or does it just mean that llvm 4.0 > is old? I don't know yet, it's probably just some small bug. But let me debug it first. Greetings, Andres Freund
Re: JIT breaks PostGIS
Re: To Andres Freund 2018-07-21 <20180721203644.gb10...@msg.df7cb.de> > That said, I'm just rebuilding postgresql-11 on stretch to use > llvm-4.0 instead of 3.9. This didn't change anything, it's still crashing on the same query from tickets.sql. Does that mean JIT is not ready for prime time yet and should be disabled in default installations? Or does it just mean that llvm 4.0 is old? Christoph
Re: JIT breaks PostGIS
Hi, Here's somewhat minimized example. https://gist.github.com/Komzpa/cc3762175328ff5d11de4b972352003d You can put this file into regress/jitbug.sql in PostGIS code tree and run after building postgis: perl regress/run_test.pl regress/jitbug.sql --expect perl regress/run_test.pl regress/jitbug.sql сб, 21 июл. 2018 г. в 23:39, Andres Freund : > Hi, > > On 2018-07-21 22:36:44 +0200, Christoph Berg wrote: > > Re: Andres Freund 2018-07-21 < > 20180721202543.ri5jyfclj6kb6...@alap3.anarazel.de> > > > Could you attempt to come up with a smaller reproducer? > > > > The original instructions in > > https://trac.osgeo.org/postgis/ticket/4125 are fairly easy to follow; > > there's also a postgresql-11-postgis-2.5{,-scripts} package (not > > mentioned in there) that exhibits the bug. > > Sure, but a more minimal example (than a 1kloc regression script, wiht > possible inter statement dependencies) still makes the debugging > easier... > > Greetings, > > Andres Freund > -- Darafei Praliaskouski Support me: http://patreon.com/komzpa
Re: JIT breaks PostGIS
Hi, On 2018-07-21 22:36:44 +0200, Christoph Berg wrote: > Re: Andres Freund 2018-07-21 > <20180721202543.ri5jyfclj6kb6...@alap3.anarazel.de> > > Could you attempt to come up with a smaller reproducer? > > The original instructions in > https://trac.osgeo.org/postgis/ticket/4125 are fairly easy to follow; > there's also a postgresql-11-postgis-2.5{,-scripts} package (not > mentioned in there) that exhibits the bug. Sure, but a more minimal example (than a 1kloc regression script, wiht possible inter statement dependencies) still makes the debugging easier... Greetings, Andres Freund
Re: JIT breaks PostGIS
Re: Andres Freund 2018-07-21 <20180721202543.ri5jyfclj6kb6...@alap3.anarazel.de> > Could you attempt to come up with a smaller reproducer? The original instructions in https://trac.osgeo.org/postgis/ticket/4125 are fairly easy to follow; there's also a postgresql-11-postgis-2.5{,-scripts} package (not mentioned in there) that exhibits the bug. (Add "stretch-pgdg-testing main 11" to sources.list.) That said, I'm just rebuilding postgresql-11 on stretch to use llvm-4.0 instead of 3.9. Christoph
Re: JIT breaks PostGIS
Hi, On 2018-07-21 23:14:47 +0300, Darafei "Komяpa" Praliaskouski wrote: > Today I spent some time closing PostGIS tickets in preparation to Monday's > release. > > One of the blockers, https://trac.osgeo.org/postgis/ticket/4125, was filed > by Postgres APT repository maintainer Christoph Berg who noticed a test > suite failure on Debian Stretch with Postgres 11. > > Upon investigation we found: > - A build for Ubuntu Bionic installed on Debian Stretch passes the suite, > requiring llvm6; > - A build for Debian Stretch fails the suite on a call to external library > GEOS, showing no traces of JIT in the stacktrace; > - Setting jit=off lets the suite pass; > - The query called in clean session by itself does not crash Postgres. > Queries above it are required to reproduce the crash; > - The crash affects not only Stretch, but customly collected Postgres 11 / > clang 3.9 on Travis CI running Ubuntu Trusty: > https://github.com/postgis/postgis/pull/262. > > I suspect that a fix would require to bisect llvm/clang version which stops > showing this behavior and making it a new minimum for JIT, if this is not a > symptom of bigger (memory management?) problem. Could you attempt to come up with a smaller reproducer? Greetings, Andres Freund