Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-02-12 Thread pabloa98
I tried. It works.
Thanks for the information.
P

On Mon, Jan 28, 2019, 7:28 PM Tom Lane  wrote:

> pabloa98  writes:
> > I just migrated our databases from PostgreSQL version 9.6 to version
> 11.1.
> > We got a segmentation fault while running this query:
>
> > SELECT f_2110 as x FROM baseline_denull
> > ORDER BY eid ASC
> > limit 500
> > OFFSET 131000;
>
> > the table baseline_denull has 1765 columns, mainly numbers, like:
>
> Hm, that sounds like it matches this recent bug fix:
>
> Author: Andres Freund 
> Branch: master [b23852766] 2018-11-27 10:07:03 -0800
> Branch: REL_11_STABLE [aee085bc0] 2018-11-27 10:07:43 -0800
>
> Fix jit compilation bug on wide tables.
>
> The function generated to perform JIT compiled tuple deforming failed
> when HeapTupleHeader's t_hoff was bigger than a signed int8. I'd
> failed to realize that LLVM's getelementptr would treat an int8 index
> argument as signed, rather than unsigned.  That means that a hoff
> larger than 127 would result in a negative offset being applied.  Fix
> that by widening the index to 32bit.
>
> Add a testcase with a wide table. Don't drop it, as it seems useful to
> verify other tools deal properly with wide tables.
>
> Thanks to Justin Pryzby for both reporting a bug and then reducing it
> to a reproducible testcase!
>
> Reported-By: Justin Pryzby
> Author: Andres Freund
> Discussion: https://postgr.es/m/20181115223959.gb10...@telsasoft.com
> Backpatch: 11, just as jit compilation was
>
>
> This would result in failures on wide rows that contain some null
> entries.  If your table is mostly-not-null, that would fit the
> observation that it only crashes on a few rows.
>
> Can you try REL_11_STABLE branch tip and see if it works for you?
>
> regards, tom lane
>


Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-02-12 Thread Justin Pryzby
On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote:
> The fix is easy enough, just adding a
> v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), "");
> fixes the issue for me.

On Tue, Jan 29, 2019 at 12:38:38AM -0800, pabloa98 wrote:
> And perhaps should I modify this too?
> If that is the case, I am not sure what kind of modification we should do.

Andres committed the fix in November, and it's included in PostgreSQL 11.2, which
is scheduled to be released Thursday.  So we'll both be able to re-enable JIT
on our wide tables again.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b23852766

Justin



Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-29 Thread pabloa98
I checked the table. It has 1265 columns. Sorry about the typo.

Pablo

On Tue, Jan 29, 2019 at 1:10 AM Andrew Gierth 
wrote:

> > "pabloa98" == pabloa98   writes:
>
>  pabloa98> I did not modify it.
>
> Then how did you create a table with more than 1600 columns? If I try
> and create a table with 1765 columns, I get:
>
> ERROR:  tables can have at most 1600 columns
>
> --
> Andrew (irc:RhodiumToad)
>


Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-29 Thread pabloa98
I appreciate your advice. I will check the number of columns in that table.



On Tue, Jan 29, 2019, 1:53 AM Andrew Gierth 
wrote:

> > "pabloa98" == pabloa98   writes:
>
>  pabloa98> I found this article:
>
>  pabloa98>
> https://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux
>
> Those instructions contain obvious errors.
>
>  pabloa98> It seems I should modify: uint8 t_hoff;
>  pabloa98> and replace it with something like: uint32 t_hoff; or uint64
> t_hoff;
>
> At the very least, that ought to be uint16 t_hoff; since there is never
> any possibility of hoff being larger than 32k since that's the largest
> allowed pagesize. However, if you modify that, it's then up to you to
> ensure that all the code that assumes it's a uint8 is found and fixed.
> I have no idea what else would break.
>
> --
> Andrew (irc:RhodiumToad)
>


Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-29 Thread Andrew Gierth
> "pabloa98" == pabloa98   writes:

 pabloa98> I found this article:

 pabloa98> 
https://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux

Those instructions contain obvious errors.

 pabloa98> It seems I should modify: uint8 t_hoff;
 pabloa98> and replace it with something like: uint32 t_hoff; or uint64 t_hoff;

At the very least, that ought to be uint16 t_hoff; since there is never
any possibility of hoff being larger than 32k since that's the largest
allowed pagesize. However, if you modify that, it's then up to you to
ensure that all the code that assumes it's a uint8 is found and fixed.
I have no idea what else would break.

-- 
Andrew (irc:RhodiumToad)



Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-29 Thread Andrew Gierth
> "pabloa98" == pabloa98   writes:

 pabloa98> I did not modify it.

Then how did you create a table with more than 1600 columns? If I try
and create a table with 1765 columns, I get:

ERROR:  tables can have at most 1600 columns

-- 
Andrew (irc:RhodiumToad)



Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-29 Thread pabloa98
I found this article:

https://manual.limesurvey.org/Instructions_for_increasing_the_maximum_number_of_columns_in_PostgreSQL_on_Linux

It seems I should modify: uint8 t_hoff;
and replace it with something like: uint32 t_hoff; or uint64 t_hoff;

And perhaps should I modify this too?

The fix is easy enough, just adding a
v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), "");
fixes the issue for me.
If that is the case, I am not sure what kind of modification we should do.


I feel I need to explain why we create these huge tables. Basically, we want
to process big matrices for machine learning.
Using tables with classic columns lets us write very clear code. If we had to
start using arrays as columns, things would become complicated and
unintuitive (besides, some columns already store vectors as arrays... ).

We could use JSONB (we do, but for JSON documents). The problem is that
storing large amounts of data in JSONB columns creates performance issues
(compared with normal tables).

Since almost everybody is applying ML to their products, perhaps there are
other companies interested in a version of Postgres that can deal with
tables with thousands of columns?
I did not find any Postgres package ready to use like that, though.

Pablo




On Tue, Jan 29, 2019 at 12:11 AM pabloa98  wrote:

> I did not modify it.
>
> I guess I should make it bigger than 1765. Is 2400 or 3200 fine?
>
> My apologies if my questions look silly. I do not know about the internal
> format of the database.
>
> Pablo
>
> On Mon, Jan 28, 2019 at 11:58 PM Andrew Gierth <
> and...@tao11.riddles.org.uk> wrote:
>
>> > "pabloa98" == pabloa98   writes:
>>
>>  pabloa98> the table baseline_denull has 1765 columns,
>>
>> Uhh...
>>
>> #define MaxHeapAttributeNumber  1600/* 8 * 200 */
>>
>> Did you modify that?
>>
>> (The back of my envelope says that on 64bit, the largest usable t_hoff
>> would be 248, of which 23 is fixed overhead leaving 225 as the max null
>> bitmap size, giving a hard limit of 1800 for MaxTupleAttributeNumber and
>> 1799 for MaxHeapAttributeNumber. And the concerns expressed in the
>> comments above those #defines would obviously apply.)
>>
>> --
>> Andrew (irc:RhodiumToad)
>>
>


Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-29 Thread pabloa98
I did not modify it.

I guess I should make it bigger than 1765. Is 2400 or 3200 fine?

My apologies if my questions look silly. I do not know about the internal
format of the database.

Pablo

On Mon, Jan 28, 2019 at 11:58 PM Andrew Gierth 
wrote:

> > "pabloa98" == pabloa98   writes:
>
>  pabloa98> the table baseline_denull has 1765 columns,
>
> Uhh...
>
> #define MaxHeapAttributeNumber  1600/* 8 * 200 */
>
> Did you modify that?
>
> (The back of my envelope says that on 64bit, the largest usable t_hoff
> would be 248, of which 23 is fixed overhead leaving 225 as the max null
> bitmap size, giving a hard limit of 1800 for MaxTupleAttributeNumber and
> 1799 for MaxHeapAttributeNumber. And the concerns expressed in the
> comments above those #defines would obviously apply.)
>
> --
> Andrew (irc:RhodiumToad)
>


Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-28 Thread Andrew Gierth
> "pabloa98" == pabloa98   writes:

 pabloa98> the table baseline_denull has 1765 columns,

Uhh...

#define MaxHeapAttributeNumber  1600/* 8 * 200 */

Did you modify that?

(The back of my envelope says that on 64bit, the largest usable t_hoff
would be 248, of which 23 is fixed overhead leaving 225 as the max null
bitmap size, giving a hard limit of 1800 for MaxTupleAttributeNumber and
1799 for MaxHeapAttributeNumber. And the concerns expressed in the
comments above those #defines would obviously apply.)

-- 
Andrew (irc:RhodiumToad)



Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-28 Thread Tom Lane
pabloa98  writes:
> I just migrated our databases from PostgreSQL version 9.6 to version 11.1.
> We got a segmentation fault while running this query:

> SELECT f_2110 as x FROM baseline_denull
> ORDER BY eid ASC
> limit 500
> OFFSET 131000;

> the table baseline_denull has 1765 columns, mainly numbers, like:

Hm, that sounds like it matches this recent bug fix:

Author: Andres Freund 
Branch: master [b23852766] 2018-11-27 10:07:03 -0800
Branch: REL_11_STABLE [aee085bc0] 2018-11-27 10:07:43 -0800

Fix jit compilation bug on wide tables.

The function generated to perform JIT compiled tuple deforming failed
when HeapTupleHeader's t_hoff was bigger than a signed int8. I'd
failed to realize that LLVM's getelementptr would treat an int8 index
argument as signed, rather than unsigned.  That means that a hoff
larger than 127 would result in a negative offset being applied.  Fix
that by widening the index to 32bit.

Add a testcase with a wide table. Don't drop it, as it seems useful to
verify other tools deal properly with wide tables.

Thanks to Justin Pryzby for both reporting a bug and then reducing it
to a reproducible testcase!

Reported-By: Justin Pryzby
Author: Andres Freund
Discussion: https://postgr.es/m/20181115223959.gb10...@telsasoft.com
Backpatch: 11, just as jit compilation was


This would result in failures on wide rows that contain some null
entries.  If your table is mostly-not-null, that would fit the
observation that it only crashes on a few rows.

Can you try REL_11_STABLE branch tip and see if it works for you?

regards, tom lane



postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-01-28 Thread pabloa98
Hello

I just migrated our databases from PostgreSQL version 9.6 to version 11.1.
We got a segmentation fault while running this query:

SELECT f_2110 as x FROM baseline_denull
ORDER BY eid ASC
limit 500
OFFSET 131000;

It works in version 11.1 if offset + limit < 131000 approximately (the exact
threshold is some number around there).

It works too if I disable jit (it was enabled). So this works:

set jit = 0;
SELECT f_2110 as x FROM baseline_denull
ORDER BY eid ASC
limit 500
OFFSET 131000;

It works all the time in version 9.6.


The workaround seems to be to disable JIT. Is this a configuration problem or
a bug?

We are using a compiled version of Postgres because we have tables (like
this one) with thousands of columns.

This server was compiled as follows:

In Ubuntu 16.04:

sudo apt update
sudo apt install --yes libcrypto++-utils libssl-dev libcrypto++-dev
libsystemd-dev libpthread-stubs0-dev libpthread-workqueue-dev
sudo apt install --yes docbook-xml docbook-xsl fop libxml2-utils xsltproc
sudo apt install --yes gcc zlib1g-dev libreadline6-dev make
sudo apt install --yes llvm-6.0 clang-6.0
sudo apt install --yes build-essential
sudo apt install --yes opensp
sudo locale-gen en_US.UTF-8

Download the source code:

mkdir -p ~/soft
cd ~/soft
wget https://ftp.postgresql.org/pub/source/v11.1/postgresql-11.1.tar.gz
tar xvzf postgresql-11.1.tar.gz
cd postgresql-11.1/

./configure --prefix=$HOME/soft/postgresql/postgresql-11
--with-extra-version=ps.2.0 --with-llvm --with-openssl --with-systemd
--with-blocksize=32 --with-wal-blocksize=32
--with-system-tzdata=/usr/share/zoneinfo


make world
make check   # 11 tests fail. I assumed it is because the planner behaves
differently because of the changed blocksize.

make install-world


$HOME/soft/postgresql/postgresql-11/bin/initdb -D
$HOME/soft/postgresql/postgresql-11/data/

Changes in ./data/postgresql.conf:

listen_addresses = '*'
max_connections = 300
work_mem = 32MB
maintenance_work_mem = 256MB
shared_buffers = 1024MB
log_timezone = 'US/Pacific'
log_destination = 'csvlog'
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_size = 0
log_min_duration_statement = 1000
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
log_temp_files = 1

jit = on  # As a workaround I turned it off... but I want it on.



The database is created as:

CREATE DATABASE xxx
WITH
OWNER = user
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
TABLESPACE = pg_default
CONNECTION LIMIT = -1;

the table baseline_denull has 1765 columns, mainly numbers, like:

CREATE TABLE public.baseline_denull
(
eid integer,
f_19 integer,
f_21 integer,
f_23 integer,
f_31 integer,
f_34 integer,
f_35 integer,
f_42 text COLLATE pg_catalog."default",
f_43 text COLLATE pg_catalog."default",
f_45 text COLLATE pg_catalog."default",
f_46 integer,
f_47 integer,
f_48 double precision,
f_49 double precision,
f_50 double precision,
f_51 double precision,
f_52 integer,
f_53 date,
f_54 integer,
f_68 integer,
f_74 integer,
f_77 double precision,
f_78 double precision,
f_84 integer[],
f_87 integer[],
f_92 integer[],
f_93 integer[],
f_94 integer[],
f_95 integer[],
f_96 integer[],
f_102 integer[],
f_120 integer,
f_129 integer,

etc

and 1 index:

CREATE INDEX baseline_denull_eid_idx
ON public.baseline_denull USING btree
(eid)
TABLESPACE pg_default;


I have a core saved. It says:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: user xxx 172.17.0.64(36654)
SELECT  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f3c0c08c290 in ?? ()
(gdb) bt
#0  0x7f3c0c08c290 in ?? ()
#1  0x in ?? ()
(gdb) quit


How could I enable JIT again without getting a segmentation fault?

Regards,

Pablo