Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
I tried. It works. Thanks for the information.

P

On Mon, Jan 28, 2019, 7:28 PM Tom Lane wrote:

> pabloa98 writes:
> > I just migrated our databases from PostgreSQL version 9.6 to version 11.1.
> > We got a segmentation fault while running this query:
>
> > SELECT f_2110 as x FROM baseline_denull
> > ORDER BY eid ASC
> > limit 500
> > OFFSET 131000;
>
> > the table baseline_denull has 1765 columns, mainly numbers, like:
>
> Hm, that sounds like it matches this recent bug fix:
>
> Author: Andres Freund
> Branch: master [b23852766] 2018-11-27 10:07:03 -0800
> Branch: REL_11_STABLE [aee085bc0] 2018-11-27 10:07:43 -0800
>
>     Fix jit compilation bug on wide tables.
>
>     The function generated to perform JIT compiled tuple deforming failed
>     when HeapTupleHeader's t_hoff was bigger than a signed int8. I'd
>     failed to realize that LLVM's getelementptr would treat an int8 index
>     argument as signed, rather than unsigned. That means that a hoff
>     larger than 127 would result in a negative offset being applied. Fix
>     that by widening the index to 32bit.
>
>     Add a testcase with a wide table. Don't drop it, as it seems useful to
>     verify other tools deal properly with wide tables.
>
>     Thanks to Justin Pryzby for both reporting a bug and then reducing it
>     to a reproducible testcase!
>
>     Reported-By: Justin Pryzby
>     Author: Andres Freund
>     Discussion: https://postgr.es/m/20181115223959.gb10...@telsasoft.com
>     Backpatch: 11, just as jit compilation was
>
> This would result in failures on wide rows that contain some null
> entries. If your table is mostly-not-null, that would fit the
> observation that it only crashes on a few rows.
>
> Can you try REL_11_STABLE branch tip and see if it works for you?
>
>                         regards, tom lane
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote:
> The fix is easy enough, just adding a
>     v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), "");
> fixes the issue for me.

On Tue, Jan 29, 2019 at 12:38:38AM -0800, pabloa98 wrote:
> And perhaps should I modify this too?
> If that is the case, I am not sure what kind of modification we should do.

Andres committed the fix in November, and it's included in PostgreSQL 11.2,
which is scheduled to be released Thursday. So we'll both be able to
re-enable JIT on our wide tables again.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b23852766

Justin
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
I checked the table. It has 1265 columns. Sorry about the typo.

Pablo

On Tue, Jan 29, 2019 at 1:10 AM Andrew Gierth wrote:

> > "pabloa98" == pabloa98 writes:
>
>  pabloa98> I did not modify it.
>
> Then how did you create a table with more than 1600 columns? If I try
> and create a table with 1765 columns, I get:
>
> ERROR:  tables can have at most 1600 columns
>
> --
> Andrew (irc:RhodiumToad)
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
I appreciate your advice. I will check the number of columns in that table.

On Tue, Jan 29, 2019, 1:53 AM Andrew Gierth wrote:

> > "pabloa98" == pabloa98 writes:
>
>  pabloa98> I found this article:
>  pabloa98> https://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux
>
> Those instructions contain obvious errors.
>
>  pabloa98> It seems I should modify: uint8 t_hoff;
>  pabloa98> and replace it with something like: uint32 t_hoff; or uint64 t_hoff;
>
> At the very least, that ought to be uint16 t_hoff; since there is never
> any possibility of hoff being larger than 32k, since that's the largest
> allowed pagesize. However, if you modify that, it's then up to you to
> ensure that all the code that assumes it's a uint8 is found and fixed.
> I have no idea what else would break.
>
> --
> Andrew (irc:RhodiumToad)
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
> "pabloa98" == pabloa98 writes: pabloa98> I found this article: pabloa98> https://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux Those instructions contain obvious errors. pabloa98> It seems I should modify: uint8 t_hoff; pabloa98> and replace it with something like: uint32 t_hoff; or uint64 t_hoff; At the very least, that ought to be uint16 t_hoff; since there is never any possibility of hoff being larger than 32k since that's the largest allowed pagesize. However, if you modify that, it's then up to you to ensure that all the code that assumes it's a uint8 is found and fixed. I have no idea what else would break. -- Andrew (irc:RhodiumToad)
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
> "pabloa98" == pabloa98 writes: pabloa98> I did not modify it. Then how did you create a table with more than 1600 columns? If I try and create a table with 1765 columns, I get: ERROR: tables can have at most 1600 columns -- Andrew (irc:RhodiumToad)
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
I found this article:
https://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux

It seems I should modify:

    uint8 t_hoff;

and replace it with something like:

    uint32 t_hoff;

or:

    uint64 t_hoff;

And perhaps I should modify this too?

> The fix is easy enough, just adding a
>     v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), "");
> fixes the issue for me.

If that is the case, I am not sure what kind of modification we should do.

I feel I need to explain why we create these huge tables. Basically we want
to process big matrices for machine learning. Using tables with classic
columns lets us write very clear code. If we had to start using arrays as
columns, things would become complicated and unintuitive (besides, some
columns store vectors as arrays...). We could use JSONB (we do, but for
JSON documents). The problem is that storing large numbers of JSONB columns
creates performance issues compared with normal tables.

Since almost everybody is applying ML to different products, perhaps other
companies would be interested in a version of Postgres that could deal with
tables with thousands of columns? I did not find any Postgres package ready
to use like that, though.

Pablo

On Tue, Jan 29, 2019 at 12:11 AM pabloa98 wrote:

> I did not modify it.
>
> I guess I should make it bigger than 1765. Is 2400 or 3200 fine?
>
> My apologies if my questions look silly. I do not know about the internal
> format of the database.
>
> Pablo
>
> On Mon, Jan 28, 2019 at 11:58 PM Andrew Gierth wrote:
>
>> > "pabloa98" == pabloa98 writes:
>>
>>  pabloa98> the table baseline_denull has 1765 columns,
>>
>> Uhh...
>>
>> #define MaxHeapAttributeNumber        1600    /* 8 * 200 */
>>
>> Did you modify that?
>>
>> (The back of my envelope says that on 64bit, the largest usable t_hoff
>> would be 248, of which 23 is fixed overhead, leaving 225 as the max null
>> bitmap size, giving a hard limit of 1800 for MaxTupleAttributeNumber and
>> 1799 for MaxHeapAttributeNumber. And the concerns expressed in the
>> comments above those #defines would obviously apply.)
>>
>> --
>> Andrew (irc:RhodiumToad)
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
I did not modify it.

I guess I should make it bigger than 1765. Is 2400 or 3200 fine?

My apologies if my questions look silly. I do not know about the internal
format of the database.

Pablo

On Mon, Jan 28, 2019 at 11:58 PM Andrew Gierth wrote:

> > "pabloa98" == pabloa98 writes:
>
>  pabloa98> the table baseline_denull has 1765 columns,
>
> Uhh...
>
> #define MaxHeapAttributeNumber        1600    /* 8 * 200 */
>
> Did you modify that?
>
> (The back of my envelope says that on 64bit, the largest usable t_hoff
> would be 248, of which 23 is fixed overhead, leaving 225 as the max null
> bitmap size, giving a hard limit of 1800 for MaxTupleAttributeNumber and
> 1799 for MaxHeapAttributeNumber. And the concerns expressed in the
> comments above those #defines would obviously apply.)
>
> --
> Andrew (irc:RhodiumToad)
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
> "pabloa98" == pabloa98 writes: pabloa98> the table baseline_denull has 1765 columns, Uhh... #define MaxHeapAttributeNumber 1600/* 8 * 200 */ Did you modify that? (The back of my envelope says that on 64bit, the largest usable t_hoff would be 248, of which 23 is fixed overhead leaving 225 as the max null bitmap size, giving a hard limit of 1800 for MaxTupleAttributeNumber and 1799 for MaxHeapAttributeNumber. And the concerns expressed in the comments above those #defines would obviously apply.) -- Andrew (irc:RhodiumToad)
Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
pabloa98 writes:
> I just migrated our databases from PostgreSQL version 9.6 to version 11.1.
> We got a segmentation fault while running this query:

> SELECT f_2110 as x FROM baseline_denull
> ORDER BY eid ASC
> limit 500
> OFFSET 131000;

> the table baseline_denull has 1765 columns, mainly numbers, like:

Hm, that sounds like it matches this recent bug fix:

Author: Andres Freund
Branch: master [b23852766] 2018-11-27 10:07:03 -0800
Branch: REL_11_STABLE [aee085bc0] 2018-11-27 10:07:43 -0800

    Fix jit compilation bug on wide tables.

    The function generated to perform JIT compiled tuple deforming failed
    when HeapTupleHeader's t_hoff was bigger than a signed int8. I'd
    failed to realize that LLVM's getelementptr would treat an int8 index
    argument as signed, rather than unsigned. That means that a hoff
    larger than 127 would result in a negative offset being applied. Fix
    that by widening the index to 32bit.

    Add a testcase with a wide table. Don't drop it, as it seems useful to
    verify other tools deal properly with wide tables.

    Thanks to Justin Pryzby for both reporting a bug and then reducing it
    to a reproducible testcase!

    Reported-By: Justin Pryzby
    Author: Andres Freund
    Discussion: https://postgr.es/m/20181115223959.gb10...@telsasoft.com
    Backpatch: 11, just as jit compilation was

This would result in failures on wide rows that contain some null
entries. If your table is mostly-not-null, that would fit the
observation that it only crashes on a few rows.

Can you try REL_11_STABLE branch tip and see if it works for you?

                        regards, tom lane
postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?
Hello,

I just migrated our databases from PostgreSQL version 9.6 to version 11.1.
We got a segmentation fault while running this query:

    SELECT f_2110 as x FROM baseline_denull
    ORDER BY eid ASC
    limit 500
    OFFSET 131000;

It works in version 11.1 if offset + limit < 131000 approx (it is some
number around it). It works too if I disable jit (it was enabled). So this
works:

    set jit = 0;
    SELECT f_2110 as x FROM baseline_denull
    ORDER BY eid ASC
    limit 500
    OFFSET 131000;

It works all the time in version 9.6. The workaround seems to be to disable
JIT. Is this a configuration problem or a bug?

We are using a compiled version of Postgres because we have tables (like
this one) with thousands of columns. This server was compiled as follows,
in Ubuntu 16.04:

    sudo apt update
    sudo apt install --yes libcrypto++-utils libssl-dev libcrypto++-dev libsystemd-dev libpthread-stubs0-dev libpthread-workqueue-dev
    sudo apt install --yes docbook-xml docbook-xsl fop libxml2-utils xsltproc
    sudo apt install --yes gcc zlib1g-dev libreadline6-dev make
    sudo apt install --yes llvm-6.0 clang-6.0
    sudo apt install --yes build-essential
    sudo apt install --yes opensp
    sudo locale-gen en_US.UTF-8

Download the source code:

    mkdir -p ~/soft
    cd ~/soft
    wget https://ftp.postgresql.org/pub/source/v11.1/postgresql-11.1.tar.gz
    tar xvzf postgresql-11.1.tar.gz
    cd postgresql-11.1/

    ./configure --prefix=$HOME/soft/postgresql/postgresql-11 \
        --with-extra-version=ps.2.0 --with-llvm --with-openssl \
        --with-systemd --with-blocksize=32 --with-wal-blocksize=32 \
        --with-system-tzdata=/usr/share/zoneinfo

    make world
    make check
    # 11 tests fail. I assumed it is because the planner behaves
    # differently because of the change of blocksize.
    make install-world

    $HOME/soft/postgresql/postgresql-11/bin/initdb -D $HOME/soft/postgresql/postgresql-11/data/

Changes in ./data/postgresql.conf:

    listen_addresses = '*'
    max_connections = 300
    work_mem = 32MB
    maintenance_work_mem = 256MB
    shared_buffers = 1024MB
    log_timezone = 'US/Pacific'
    log_destination = 'csvlog'
    logging_collector = on
    log_filename = 'postgresql-%Y-%m-%d.log'
    log_rotation_size = 0
    log_min_duration_statement = 1000
    debug_print_parse = off
    debug_print_rewritten = off
    debug_print_plan = off
    log_temp_files = 1
    jit = on    # As a workaround, I turned it off... but I want it on.

The database is created as:

    CREATE DATABASE xxx
        WITH
        OWNER = user
        ENCODING = 'UTF8'
        LC_COLLATE = 'en_US.UTF-8'
        LC_CTYPE = 'en_US.UTF-8'
        TABLESPACE = pg_default
        CONNECTION LIMIT = -1;

The table baseline_denull has 1765 columns, mainly numbers, like:

    CREATE TABLE public.baseline_denull
    (
        eid integer,
        f_19 integer,
        f_21 integer,
        f_23 integer,
        f_31 integer,
        f_34 integer,
        f_35 integer,
        f_42 text COLLATE pg_catalog."default",
        f_43 text COLLATE pg_catalog."default",
        f_45 text COLLATE pg_catalog."default",
        f_46 integer,
        f_47 integer,
        f_48 double precision,
        f_49 double precision,
        f_50 double precision,
        f_51 double precision,
        f_52 integer,
        f_53 date,
        f_54 integer,
        f_68 integer,
        f_74 integer,
        f_77 double precision,
        f_78 double precision,
        f_84 integer[],
        f_87 integer[],
        f_92 integer[],
        f_93 integer[],
        f_94 integer[],
        f_95 integer[],
        f_96 integer[],
        f_102 integer[],
        f_120 integer,
        f_129 integer,
        etc.

and 1 index:

    CREATE INDEX baseline_denull_eid_idx
        ON public.baseline_denull USING btree (eid)
        TABLESPACE pg_default;

I have a core saved. It says:

    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Core was generated by `postgres: user xxx 172.17.0.64(36654) SELECT '.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x7f3c0c08c290 in ?? ()
    (gdb) bt
    #0  0x7f3c0c08c290 in ?? ()
    #1  0x in ?? ()
    (gdb) quit

How could I enable JIT again without getting a segmentation fault?

Regards,
Pablo