Re: [HACKERS] Getting server crash after running sqlsmith

2017-05-23 Thread Robert Haas
On Tue, May 23, 2017 at 9:45 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Just out of curiosity, what happens if you try it with the attached patch?
>
> Surely that's pretty unsafe?

Yes.  I was just curious to see whether it would work.  I think what
we need to do is teach pqsignal() to block all of the necessary
signals using sa_mask and then remove all of the explicit
blocking/unblocking logic from the signal handlers themselves.  IIUC,
the point of sa_mask is that you want the operating system
to handle the save/restore of the signal mask rather than doing it
yourself in the handler, precisely because doing it in the handler
creates windows at the beginning and end of the handler where the mask
may not be what you want.

In the case of Linux and MacOS, at least, the default behavior (unless
SA_NODEFER is set) is to automatically block the signal currently
being handled, so there's likely no way to blow out the stack during
the brief window before the handler's first PG_SETMASK() call.  You could
receive some *other* signal during that window, but then that one
would be blocked too, so I don't think you can stack up more frames this
way than the number of distinct signal handlers you have.  However,
the window at the end of the function - after the final PG_SETMASK()
has been invoked - can recurse arbitrarily deep.  At that point we've
unblocked the signal we're currently handling, so we're playing with
fire.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash after running sqlsmith

2017-05-23 Thread Tom Lane
Robert Haas  writes:
> Just out of curiosity, what happens if you try it with the attached patch?

Surely that's pretty unsafe?

regards, tom lane




Re: [HACKERS] Getting server crash after running sqlsmith

2017-05-23 Thread tushar

On 05/23/2017 06:25 PM, Robert Haas wrote:
> Just out of curiosity, what happens if you try it with the attached patch?

Thanks, the issue seems to be fixed after applying your patch.

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [HACKERS] Getting server crash after running sqlsmith

2017-05-23 Thread Robert Haas
On Tue, May 23, 2017 at 1:46 AM, tushar  wrote:
> On 03/29/2017 12:06 AM, Tom Lane wrote:
>>
>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>> set that would cause this query to be run as a parallel query?  Because
>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>> probably insane.
>
> Well, I am able to see a crash. Enable "logging_collector=on" in the
> postgresql.conf file, restart the server, and fire the SQL query below 5 or
> 6 times.

Just out of curiosity, what happens if you try it with the attached patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


no-sigmask.patch
Description: Binary data



Re: [HACKERS] Getting server crash after running sqlsmith

2017-05-22 Thread tushar

On 03/29/2017 12:06 AM, Tom Lane wrote:
> Hm ... I don't see a crash here, but I wonder whether you have parameters
> set that would cause this query to be run as a parallel query?  Because
> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
> probably insane.

Well, I am able to see a crash. Enable "logging_collector=on" in the
postgresql.conf file, restart the server, and fire the SQL query below 5
or 6 times:


select
  80 as c0,
  pg_catalog.pg_backend_pid() as c1,
  68 as c2,
  subq_1.c0 as c3,
  subq_1.c0 as c4
from
  (select
     ref_0.specific_schema as c0
   from
     information_schema.role_routine_grants as ref_0,
     lateral (select
                ref_0.grantor as c0,
                50 as c1
              from
                information_schema.routines as ref_1
              where (63 = 86)
                or (pg_catalog.pg_advisory_lock(
                      cast(ref_1.result_cast_datetime_precision as integer),
                      cast(pg_catalog.bttidcmp(
                             cast(null as tid),
                             cast(null as tid)) as integer)) is NULL)
              limit 143) as subq_0
   where pg_catalog.pg_rotate_logfile() is NULL) as subq_1
where 50 <> 45;

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [HACKERS] Getting server crash after running sqlsmith

2017-03-30 Thread tushar

On 03/29/2017 12:06 AM, Tom Lane wrote:
> Hm ... I don't see a crash here,

I am getting this issue only on Linux3.

> but I wonder whether you have parameters
> set that would cause this query to be run as a parallel query?  Because
> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
> probably insane.

No, I have not changed any parameters except logging_collector=on in the
postgresql.conf file.


--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [HACKERS] Getting server crash after running sqlsmith

2017-03-28 Thread Thomas Munro
On Wed, Mar 29, 2017 at 2:23 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane  wrote:
>>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>>> set that would cause this query to be run as a parallel query?  Because
>>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>>> probably insane.
>
>> /me blinks
>
>> Uh, what's insane about that?  All it does is test a GUC (which is
>> surely parallel-safe) and call SendPostmasterSignal (which seems safe,
>> too).
>
> Well, if you don't like that theory, what's yours?

I can't reproduce this either.  But here's a theory: this query
signals the postmaster rapidly and repeatedly, and with just the right
kind of difficulty scheduling/waking the postmaster to deliver the
signal on an overloaded machine, maybe there is always a new SIGUSR1
and PMSIGNAL_ROTATE_LOGFILE waiting once the signal handler reaches
the PG_SETMASK() call at its end, at which point it immediately
recurses into the signal handler until it blows the stack.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Getting server crash after running sqlsmith

2017-03-28 Thread Robert Haas
On Tue, Mar 28, 2017 at 9:23 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane  wrote:
>>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>>> set that would cause this query to be run as a parallel query?  Because
>>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>>> probably insane.
>
>> /me blinks
>
>> Uh, what's insane about that?  All it does is test a GUC (which is
>> surely parallel-safe) and call SendPostmasterSignal (which seems safe,
>> too).
>
> Well, if you don't like that theory, what's yours?

Gremlins?

The stack trace seems to show that the process is receiving SIGUSR1 at
a very high rate.  Every time sigusr1_handler() reaches
PG_SETMASK(), it immediately gets a SIGUSR1 and jumps back
into sigusr1_handler().  Now, this seems like a design flaw in
sigusr1_handler().  Likely the operating system blocks SIGUSR1 on
entry to the signal handler so that it's not possible for a high rate
of signal delivery to blow out the stack, but we forcibly unblock it
before returning, thus exposing ourselves to blowing out the stack.
And we have, apparently, no stack depth check here nor any other way
of preventing the infinite recursion.

I imagine the behavior here is platform-dependent, but I'd guess that
select pg_rotate_logfile() from generate_series(1,100) g might
reproduce this on affected platforms with or without parallel query in
the mix.  It looks like we've conveniently provided both a function
that can be used to SIGUSR1 the heck out of the postmaster and a
postmaster that is, at least on such platforms, vulnerable to crashing
if you do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Getting server crash after running sqlsmith

2017-03-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane  wrote:
>> Hm ... I don't see a crash here, but I wonder whether you have parameters
>> set that would cause this query to be run as a parallel query?  Because
>> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
>> probably insane.

> /me blinks

> Uh, what's insane about that?  All it does is test a GUC (which is
> surely parallel-safe) and call SendPostmasterSignal (which seems safe,
> too).

Well, if you don't like that theory, what's yours?

regards, tom lane




Re: [HACKERS] Getting server crash after running sqlsmith

2017-03-28 Thread Robert Haas
On Tue, Mar 28, 2017 at 2:36 PM, Tom Lane  wrote:
> tushar  writes:
>> After running sqlsmith against the latest sources of PG v10, I am able to see a
>> crash -
>
> Hm ... I don't see a crash here, but I wonder whether you have parameters
> set that would cause this query to be run as a parallel query?  Because
> pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
> probably insane.

/me blinks

Uh, what's insane about that?  All it does is test a GUC (which is
surely parallel-safe) and call SendPostmasterSignal (which seems safe,
too).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Getting server crash after running sqlsmith

2017-03-28 Thread Tom Lane
tushar  writes:
> After running sqlsmith against the latest sources of PG v10, I am able to see a
> crash -

Hm ... I don't see a crash here, but I wonder whether you have parameters
set that would cause this query to be run as a parallel query?  Because
pg_rotate_logfile() is marked as parallel-safe in pg_proc, which seems
probably insane.

regards, tom lane




[HACKERS] Getting server crash after running sqlsmith

2017-03-28 Thread tushar

Hi,

After running sqlsmith against the latest sources of PG v10, I am able to see a
crash.

Here is the standalone test case:

Make sure 'logging_collector=on' is set in the postgresql.conf file, then
connect to a psql terminal and run this query:
postgres=# select
  80 as c0,
  pg_catalog.pg_backend_pid() as c1,
  68 as c2,
  subq_1.c0 as c3,
  subq_1.c0 as c4
from
  (select
     ref_0.specific_schema as c0
   from
     information_schema.role_routine_grants as ref_0,
     lateral (select
                ref_0.grantor as c0,
                50 as c1
              from
                information_schema.routines as ref_1
              where (63 = 86)
                or (pg_catalog.pg_advisory_lock(
                      cast(ref_1.result_cast_datetime_precision as integer),
                      cast(pg_catalog.bttidcmp(
                             cast(null as tid),
                             cast(null as tid)) as integer)) is NULL)
              limit 143) as subq_0
   where pg_catalog.pg_rotate_logfile() is NULL) as subq_1
where 50 <> 45;
 c0 | c1 | c2 | c3 | c4
----+----+----+----+----
(0 rows)

postgres=# select 1;
FATAL:  terminating connection due to unexpected postmaster exit
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

Here is the stack trace:
[centos@tushar-centos bin]$ gdb -q -c mdata/core.4254 /home/centos/pg10_28march/postgresql/edbpsql/bin/postgres
Reading symbols from /home/centos/pg10_28march/postgresql/edbpsql/bin/postgres...done.

[New Thread 4254]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/5f/7d4ef6f6ba15505d3c42a7a09e2a7ca9ae5ba6

--
--
Loaded symbols for /lib/libkrb5support.so.0
Reading symbols from /lib/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libkeyutils.so.1
Reading symbols from /lib/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `/home/centos/pg10_28march/postgresql/edbpsql/bin/postgres -D mdata'.
Program terminated with signal 11, Segmentation fault.
#0  0x00a75424 in __kernel_vsyscall ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.i686 keyutils-libs-1.4-5.el6.i686 krb5-libs-1.10.3-57.el6.i686 libcom_err-1.41.12-22.el6.i686 libselinux-2.0.94-7.el6.i686 openssl-1.0.1e-48.el6_8.4.i686 zlib-1.2.3-29.el6.i686

(gdb) bt
#0  0x00a75424 in __kernel_vsyscall ()
#1  0x00aa1d7b in sigprocmask () from /lib/libc.so.6
#2  0x083d2d79 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5081
#3  <signal handler called>
#4  0x00a75424 in __kernel_vsyscall ()
#5  0x00aa1d7b in sigprocmask () from /lib/libc.so.6
#6  0x083d2d79 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5081
#7  <signal handler called>
#8  0x00a75424 in __kernel_vsyscall ()
#9  0x00aa1d7b in sigprocmask () from /lib/libc.so.6
#10 0x083d2d79 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5081
#11 <signal handler called>
#12 0x00a75424 in __kernel_vsyscall ()
--
--
#52380 0x00a75424 in __kernel_vsyscall ()
#52381 0x00aa1d7b in sigprocmask () from /lib/libc.so.6
#52382 0x083d2d79 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5081
#52383 <signal handler called>
#52384 0x00a75424 in __kernel_vsyscall ()
#52385 0x00b5208d in ___newselect_nocancel () from /lib/libc.so.6
#52386 0x083ce40e in ServerLoop () at postmaster.c:1693
#52387 0x083cdbcb in PostmasterMain (argc=3, argv=0x957ca10) at postmaster.c:1337
#52388 0x083236fc in main (argc=3, argv=0x957ca10) at main.c:228

--
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company