On Tue, Mar 1, 2016 at 12:34 AM, Rushabh Lathia <rushabh.lat...@gmail.com> wrote: > During the testing of parallel query (with force_parallel_mode = regress), > noticed random server crash with the below stack: > > #0 0x0000003fc84896d5 in memcpy () from /lib64/libc.so.6 > #1 0x0000000000a36867 in SerializeSnapshot (snapshot=0x1e49f40, > start_address=0x7f391e9ec728 <Address 0x7f391e9ec728 out of bounds>) at > snapmgr.c:1523 > #2 0x0000000000522a20 in InitializeParallelDSM (pcxt=0x1e49ce0) at > parallel.c:330 > #3 0x00000000006dd256 in ExecInitParallelPlan (planstate=0x1f012b0, > estate=0x1f00be8, nworkers=1) at execParallel.c:398 > #4 0x00000000006f8abb in ExecGather (node=0x1f00d00) at nodeGather.c:160 > #5 0x00000000006de42e in ExecProcNode (node=0x1f00d00) at > execProcnode.c:516 > #6 0x00000000006da4fd in ExecutePlan (estate=0x1f00be8, > planstate=0x1f00d00, use_parallel_mode=1 '\001', operation=CMD_SELECT, > sendTuples=1 '\001', numberTuples=0, > direction=ForwardScanDirection, dest=0x1e5e118) at execMain.c:1633 > > So started looking into SerializeSnapshot() and with code reading I found > that > we ignore copying the SubXID array if it has overflowed, unless the snapshot > was taken during recovey, and for this we mark the > serialized_snapshot->subxcnt > to 0. But later while copying the SubXID array we check then condition based > on > snapshot->subxcnt. We should check serialized_snapshot->subxcnt rather then > snapshot->subxcnt. > > I tried hard to come up with individual test but somehow I was unable to > create testcase. > > PFA patch to fix the issue.
Thanks, that looks right to me. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers