Re: [HACKERS] Getting server crash after running sqlsmith
On Tue, May 23, 2017 at 9:45 AM, Tom Lane wrote:
> Robert Haas writes:
>> Just out of curiosity, what happens if you try it with the attached patch?
>
> Surely that's pretty unsafe?

Yes. I was just curious to see whether it would work. I think what we need to do is teach pqsignal() to block all of the necessary signals using sa_mask and then remove all of the explicit blocking/unblocking logic from the signal handlers themselves. IIUC, the point of sa_mask is that you want the operating system to handle the save/restore of the signal mask rather than doing it yourself in the handler, precisely because doing it in the handler creates windows at the beginning and end of the handler where the mask may not be what you want.

In the case of Linux and MacOS, at least, the default behavior (unless SA_NODEFER is set) is to automatically block the signal currently being handled, so there's likely no way to blow out the stack during the brief window before PG_SETMASK(&BlockSig) is called. You could receive some *other* signal during that window, but then that one would be blocked too, so I don't think you can stack up more frames this way than the number of distinct signal handlers you have.

However, the window at the end of the function - after PG_SETMASK(&UnBlockSig) has been invoked - can recurse arbitrarily deep. At that point we've unblocked the signal we're currently handling, so we're playing with fire.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Getting server crash after running sqlsmith
Robert Haas writes:
> Just out of curiosity, what happens if you try it with the attached patch?

Surely that's pretty unsafe?

regards, tom lane
Re: [HACKERS] Getting server crash after running sqlsmith
On 05/23/2017 06:25 PM, Robert Haas wrote:
> Just out of curiosity, what happens if you try it with the attached patch?

Thanks, the issue seems to be fixed after applying your patch.

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Re: [HACKERS] Getting server crash after running sqlsmith
On Tue, May 23, 2017 at 1:46 AM, tushar wrote:
> On 03/29/2017 12:06 AM, Tom Lane wrote:
>>
>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>> set that would cause this query to be run as a parallel query? Because
>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>> probably insane.
>
> Well, I am able to see a crash. Enable "logging_collector=on" in
> postgresql.conf file / restart the server and fire below sql query - 5 or 6
> times

Just out of curiosity, what happens if you try it with the attached patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

no-sigmask.patch
Description: Binary data
Re: [HACKERS] Getting server crash after running sqlsmith
On 03/29/2017 12:06 AM, Tom Lane wrote:
> Hm ... I don't see a crash here, but I wonder whether you have parameters
> set that would cause this query to be run as a parallel query? Because
> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
> probably insane.

Well, I am able to see a crash. Enable "logging_collector=on" in postgresql.conf file / restart the server and fire below sql query - 5 or 6 times

select
  80 as c0,
  pg_catalog.pg_backend_pid() as c1,
  68 as c2,
  subq_1.c0 as c3,
  subq_1.c0 as c4
from
  (select
        ref_0.specific_schema as c0
      from
        information_schema.role_routine_grants as ref_0,
        lateral (select
              ref_0.grantor as c0,
              50 as c1
            from
              information_schema.routines as ref_1
            where (63 = 86)
              or (pg_catalog.pg_advisory_lock(
                  cast(ref_1.result_cast_datetime_precision as integer),
                  cast(pg_catalog.bttidcmp(
                    cast(null as tid),
                    cast(null as tid)) as integer)) is NULL)
            limit 143) as subq_0
      where pg_catalog.pg_rotate_logfile() is NULL) as subq_1
where 50 <> 45;

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Re: [HACKERS] Getting server crash after running sqlsmith
On 03/29/2017 12:06 AM, Tom Lane wrote:
> Hm ... I don't see a crash here,

I am getting this issue only on Linux3.

> but I wonder whether you have parameters
> set that would cause this query to be run as a parallel query? Because
> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
> probably insane.

No, I have not changed any parameters except logging_collector=on in postgresql.conf file.

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
Re: [HACKERS] Getting server crash after running sqlsmith
On Wed, Mar 29, 2017 at 2:23 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane wrote:
>>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>>> set that would cause this query to be run as a parallel query? Because
>>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>>> probably insane.
>
>> /me blinks
>
>> Uh, what's insane about that? All it does is test a GUC (which is
>> surely parallel-safe) and call SendPostmasterSignal (which seems safe,
>> too).
>
> Well, if you don't like that theory, what's yours?

I can't reproduce this either. But here's a theory: this query signals the postmaster repeatedly and fast, and with just the right kind of difficulty scheduling/waking the postmaster to deliver the signal on an overloaded machine, maybe there is always a new SIGUSR1 and PMSIGNAL_ROTATE_LOGFILE waiting once the signal handler reaches PG_SETMASK(&UnBlockSig), at which point it immediately recurses into the signal handler until it blows the stack.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Getting server crash after running sqlsmith
On Tue, Mar 28, 2017 at 9:23 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane wrote:
>>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>>> set that would cause this query to be run as a parallel query? Because
>>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>>> probably insane.
>
>> /me blinks
>
>> Uh, what's insane about that? All it does is test a GUC (which is
>> surely parallel-safe) and call SendPostmasterSignal (which seems safe,
>> too).
>
> Well, if you don't like that theory, what's yours?

Gremlins?

The stack trace seems to show that the process is receiving SIGUSR1 at a very high rate. Every time sigusr1_handler() reaches PG_SETMASK(&UnBlockSig), it immediately gets a SIGUSR1 and jumps back into sigusr1_handler().

Now, this seems like a design flaw in sigusr1_handler(). Likely the operating system blocks SIGUSR1 on entry to the signal handler so that it's not possible for a high rate of signal delivery to blow out the stack, but we forcibly unblock it before returning, thus exposing ourselves to blowing out the stack. And we have, apparently, no stack depth check here nor any other way of preventing the infinite recursion.

I imagine the behavior here is platform-dependent, but I'd guess that

select pg_rotate_logfile() from generate_series(1,100) g

might reproduce this on affected platforms with or without parallel query in the mix. It looks like we've conveniently provided both a function that can be used to SIGUSR1 the heck out of the postmaster and a postmaster that is, at least on such platforms, vulnerable to crashing if you do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Getting server crash after running sqlsmith
Robert Haas writes:
> On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane wrote:
>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>> set that would cause this query to be run as a parallel query? Because
>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>> probably insane.

> /me blinks

> Uh, what's insane about that? All it does is test a GUC (which is
> surely parallel-safe) and call SendPostmasterSignal (which seems safe,
> too).

Well, if you don't like that theory, what's yours?

regards, tom lane
Re: [HACKERS] Getting server crash after running sqlsmith
On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane wrote:
> tushar writes:
>> After running sqlsmith against latest sources of PG v10, able to see a
>> crash -
>
> Hm ... I don't see a crash here, but I wonder whether you have parameters
> set that would cause this query to be run as a parallel query? Because
> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
> probably insane.

/me blinks

Uh, what's insane about that? All it does is test a GUC (which is surely parallel-safe) and call SendPostmasterSignal (which seems safe, too).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Getting server crash after running sqlsmith
tushar writes:
> After running sqlsmith against latest sources of PG v10, able to see a
> crash -

Hm ... I don't see a crash here, but I wonder whether you have parameters set that would cause this query to be run as a parallel query? Because pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems probably insane.

regards, tom lane
[HACKERS] Getting server crash after running sqlsmith
Hi,

After running sqlsmith against latest sources of PG v10, able to see a crash -

here is the standalone testcase -

Make sure 'logging_collector=on' in postgresql.conf file
Connect to psql terminal, run this query

postgres=# select
  80 as c0,
  pg_catalog.pg_backend_pid() as c1,
  68 as c2,
  subq_1.c0 as c3,
  subq_1.c0 as c4
from
  (select
        ref_0.specific_schema as c0
      from
        information_schema.role_routine_grants as ref_0,
        lateral (select
              ref_0.grantor as c0,
              50 as c1
            from
              information_schema.routines as ref_1
            where (63 = 86)
              or (pg_catalog.pg_advisory_lock(
                  cast(ref_1.result_cast_datetime_precision as integer),
                  cast(pg_catalog.bttidcmp(
                    cast(null as tid),
                    cast(null as tid)) as integer)) is NULL)
            limit 143) as subq_0
      where pg_catalog.pg_rotate_logfile() is NULL) as subq_1
where 50 <> 45;
 c0 | c1 | c2 | c3 | c4
----+----+----+----+----
(0 rows)

postgres=# select 1;
FATAL:  terminating connection due to unexpected postmaster exit
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

here is the stack trace -

[centos@tushar-centos bin]$ gdb -q -c mdata/core.4254 /home/centos/pg10_28march/postgresql/edbpsql/bin/postgres
Reading symbols from /home/centos/pg10_28march/postgresql/edbpsql/bin/postgres...done.
[New Thread 4254]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/5f/7d4ef6f6ba15505d3c42a7a09e2a7ca9ae5ba6
--
--
Loaded symbols for /lib/libkrb5support.so.0
Reading symbols from /lib/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libkeyutils.so.1
Reading symbols from /lib/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `/home/centos/pg10_28march/postgresql/edbpsql/bin/postgres -D mdata'.
Program terminated with signal 11, Segmentation fault.
#0  0x00a75424 in __kernel_vsyscall ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.i686 keyutils-libs-1.4-5.el6.i686 krb5-libs-1.10.3-57.el6.i686 libcom_err-1.41.12-22.el6.i686 libselinux-2.0.94-7.el6.i686 openssl-1.0.1e-48.el6_8.4.i686 zlib-1.2.3-29.el6.i686
(gdb) bt
#0  0x00a75424 in __kernel_vsyscall ()
#1  0x00aa1d7b in sigprocmask () from /lib/libc.so.6
#2  0x083d2d79 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5081
#3  <signal handler called>
#4  0x00a75424 in __kernel_vsyscall ()
#5  0x00aa1d7b in sigprocmask () from /lib/libc.so.6
#6  0x083d2d79 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5081
#7  <signal handler called>
#8  0x00a75424 in __kernel_vsyscall ()
#9  0x00aa1d7b in sigprocmask () from /lib/libc.so.6
#10 0x083d2d79 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5081
#11 <signal handler called>
#12 0x00a75424 in __kernel_vsyscall ()
--
--
#52380 0x00a75424 in __kernel_vsyscall ()
#52381 0x00aa1d7b in sigprocmask () from /lib/libc.so.6
#52382 0x083d2d79 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5081
#52383 <signal handler called>
#52384 0x00a75424 in __kernel_vsyscall ()
#52385 0x00b5208d in ___newselect_nocancel () from /lib/libc.so.6
#52386 0x083ce40e in ServerLoop () at postmaster.c:1693
#52387 0x083cdbcb in PostmasterMain (argc=3, argv=0x957ca10) at postmaster.c:1337
#52388 0x083236fc in main (argc=3, argv=0x957ca10) at main.c:228

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company