Re: [HACKERS] Parallel Append implementation
On 08/01/2018 03:14 PM, Robert Haas wrote:
> Committed to master and v11. Thanks for the review.

Thanks!
Re: [HACKERS] Parallel Append implementation
On Mon, Jul 30, 2018 at 8:02 PM, Thomas Munro wrote:
> On Tue, Jul 31, 2018 at 5:05 AM, Robert Haas wrote:
>> New version attached.
>
> Looks good to me.

Committed to master and v11. Thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Append implementation
On Tue, Jul 31, 2018 at 5:05 AM, Robert Haas wrote:
> New version attached.

Looks good to me.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Sun, Jul 29, 2018 at 5:49 PM, Thomas Munro wrote:
> On Thu, May 10, 2018 at 7:08 AM, Robert Haas wrote:
>> [parallel-append-doc-v2.patch]
>
> +plans just as they can in any other plan. However, in a parallel plan,
> +it is also possible that the planner may choose to substitute a
> +Parallel Append node.
>
> Maybe drop "it is also possible that "? It seems a bit unnecessary
> and sounds a bit odd followed by "may ", but maybe it's just me.

Changed.

> +Also, unlike a regular Append node, which can only have
> +partial children when used within a parallel plan, Parallel
> +Append node can have both partial and non-partial child plans.
>
> Missing "a" before "Parallel".

Fixed.

> +Non-partial children will be scanned by only a single worker, since
>
> Are we using "worker" in a more general sense that possibly includes
> the leader?

Hmm, yes, other text on this page does that too. Ho hum. Tried to be
more careful about this.

New version attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

parallel-append-doc-v3.patch
Description: Binary data
Re: [HACKERS] Parallel Append implementation
On Thu, May 10, 2018 at 7:08 AM, Robert Haas wrote:
> [parallel-append-doc-v2.patch]

+plans just as they can in any other plan. However, in a parallel plan,
+it is also possible that the planner may choose to substitute a
+Parallel Append node.

Maybe drop "it is also possible that "? It seems a bit unnecessary
and sounds a bit odd followed by "may ", but maybe it's just me.

+Also, unlike a regular Append node, which can only have
+partial children when used within a parallel plan, Parallel
+Append node can have both partial and non-partial child plans.

Missing "a" before "Parallel".

+Non-partial children will be scanned by only a single worker, since

Are we using "worker" in a more general sense that possibly includes
the leader? Hmm, yes, other text on this page does that too. Ho hum.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Tue, May 8, 2018 at 5:05 PM, Thomas Munro wrote:
> +scanning them more than once would preduce duplicate results. Plans that
>
> s/preduce/produce/

Fixed, thanks.

> +Append or MergeAppend plan node.
> vs.
> +Append of regular Index Scan plans; each
>
> I think we should standardise on Foo Bar, FooBar or foo bar when
> discussing executor nodes on this page.

Well, EXPLAIN prints MergeAppend but Index Scan, and I think we should
follow that precedent here. As for vs. , I think the reason I ended up
using in the section on scans was because I thought that Parallel Seq
Scan might be confusing (what's a "seq"?), so I tried to fudge my way
around that by referring to it as an abstract idea rather than the
exact EXPLAIN output. You then copied that style in the join section,
and, well, like you say, now we have a sort of hodgepodge of styles.
Maybe that's a problem for another patch, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

parallel-append-doc-v2.patch
Description: Binary data
Re: [HACKERS] Parallel Append implementation
On Wed, May 9, 2018 at 1:15 AM, Robert Haas wrote:
> On Tue, May 8, 2018 at 12:10 AM, Thomas Munro wrote:
>> It's not a scan, it's not a join and it's not an aggregation so I
>> think it needs to be in a new section at the same level as those
>> others. It's a different kind of thing.
>
> I'm a little skeptical about that idea because I'm not sure it's
> really in the same category as far as importance is concerned, but I
> don't have a better idea. Here's a patch. I'm worried this is too
> much technical jargon, but I don't know how to explain it any more
> simply.

+scanning them more than once would preduce duplicate results. Plans that

s/preduce/produce/

+Append or MergeAppend plan node.
vs.
+Append of regular Index Scan plans; each

I think we should standardise on Foo Bar, FooBar or foo bar when
discussing executor nodes on this page.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Tue, May 8, 2018 at 12:10 AM, Thomas Munro wrote:
> It's not a scan, it's not a join and it's not an aggregation so I
> think it needs to be in a new section at the same level as those
> others. It's a different kind of thing.

I'm a little skeptical about that idea because I'm not sure it's
really in the same category as far as importance is concerned, but I
don't have a better idea. Here's a patch. I'm worried this is too
much technical jargon, but I don't know how to explain it any more
simply.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

parallel-append-doc.patch
Description: Binary data
Re: [HACKERS] Parallel Append implementation
On Tue, May 8, 2018 at 5:23 AM, Robert Haas wrote:
> On Sat, Apr 7, 2018 at 10:21 AM, Adrien Nayrat wrote:
>> I notice Parallel append is not listed on Parallel Plans documentation :
>> https://www.postgresql.org/docs/devel/static/parallel-plans.html
>
> I agree it might be nice to mention this somewhere on this page, but
> I'm not exactly sure where it would make logical sense to put it.

It's not a scan, it's not a join and it's not an aggregation so I
think it needs to be in a new section at the same level as those
others. It's a different kind of thing.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Parallel Append implementation
On Sat, Apr 7, 2018 at 10:21 AM, Adrien Nayrat wrote:
> I notice Parallel append is not listed on Parallel Plans documentation :
> https://www.postgresql.org/docs/devel/static/parallel-plans.html

I agree it might be nice to mention this somewhere on this page, but
I'm not exactly sure where it would make logical sense to put it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Append implementation
Hello,

I notice Parallel append is not listed on Parallel Plans documentation :
https://www.postgresql.org/docs/devel/static/parallel-plans.html

If you agree I can add it to Open Items.

Thanks,

--
Adrien NAYRAT

signature.asc
Description: OpenPGP digital signature
Re: [HACKERS] Parallel Append implementation
On 6 December 2017 at 04:01, Robert Haas wrote:
> On Tue, Nov 28, 2017 at 6:02 AM, amul sul wrote:
>> Here are the changes I did on v21 patch to handle crash reported by
>> Rajkumar[1]:
>>
>> diff --git a/src/backend/executor/nodeAppend.c
>> b/src/backend/executor/nodeAppend.c
>> index e3b17cf0e2..e0ee918808 100644
>> --- a/src/backend/executor/nodeAppend.c
>> +++ b/src/backend/executor/nodeAppend.c
>> @@ -479,9 +479,12 @@ choose_next_subplan_for_worker(AppendState *node)
>>  		pstate->pa_next_plan = append->first_partial_plan;
>>  	else
>>  		pstate->pa_next_plan++;
>> -	if (pstate->pa_next_plan == node->as_whichplan)
>> +
>> +	if (pstate->pa_next_plan == node->as_whichplan ||
>> +		(pstate->pa_next_plan == append->first_partial_plan &&
>> +		 append->first_partial_plan >= node->as_nplans))
>>  	{
>> -		/* We've tried everything! */
>> +		/* We've tried everything or there were no partial plans */
>>  		pstate->pa_next_plan = INVALID_SUBPLAN_INDEX;
>>  		LWLockRelease(&pstate->pa_lock);
>>  		return false;
>
> I changed this around a little, added a test case, and committed this.

Thanks, Robert! The crash reported on pgsql-committers is being
discussed on that list itself.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Parallel Append implementation
On Tue, Nov 28, 2017 at 6:02 AM, amul sul wrote:
> Here are the changes I did on v21 patch to handle crash reported by
> Rajkumar[1]:
>
> diff --git a/src/backend/executor/nodeAppend.c
> b/src/backend/executor/nodeAppend.c
> index e3b17cf0e2..e0ee918808 100644
> --- a/src/backend/executor/nodeAppend.c
> +++ b/src/backend/executor/nodeAppend.c
> @@ -479,9 +479,12 @@ choose_next_subplan_for_worker(AppendState *node)
>  		pstate->pa_next_plan = append->first_partial_plan;
>  	else
>  		pstate->pa_next_plan++;
> -	if (pstate->pa_next_plan == node->as_whichplan)
> +
> +	if (pstate->pa_next_plan == node->as_whichplan ||
> +		(pstate->pa_next_plan == append->first_partial_plan &&
> +		 append->first_partial_plan >= node->as_nplans))
>  	{
> -		/* We've tried everything! */
> +		/* We've tried everything or there were no partial plans */
>  		pstate->pa_next_plan = INVALID_SUBPLAN_INDEX;
>  		LWLockRelease(&pstate->pa_lock);
>  		return false;

I changed this around a little, added a test case, and committed this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Append implementation
On Tue, Nov 28, 2017 at 8:02 PM, amul sul wrote:
> Apart from this I have added a few asserts to keep an eye on
> node->as_whichplan value in the attached patch, thanks.

This is still hot, moved to next CF.

--
Michael
Re: [HACKERS] Parallel Append implementation
On Mon, Nov 27, 2017 at 10:21 PM, amul sul wrote:
> Thanks a lot Rajkumar for this test. I am able to reproduce this crash by
> enabling partition-wise join.
>
> The reason for this crash is the same as the previous one[1], i.e. the
> node->as_whichplan value. This time the append->first_partial_plan value
> looks suspicious. With the following change to the v21 patch, I am able
> to reproduce this crash as an assert failure when
> enable_partition_wise_join = ON; otherwise it works fine.
>
> diff --git a/src/backend/executor/nodeAppend.c
> b/src/backend/executor/nodeAppend.c
> index e3b17cf0e2..4b337ac633 100644
> --- a/src/backend/executor/nodeAppend.c
> +++ b/src/backend/executor/nodeAppend.c
> @@ -458,6 +458,7 @@ choose_next_subplan_for_worker(AppendState *node)
>
>  	/* Backward scan is not supported by parallel-aware plans */
>  	Assert(ScanDirectionIsForward(node->ps.state->es_direction));
> +	Assert(append->first_partial_plan < node->as_nplans);
>
>  	LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
>
> Will look into this more, tomorrow.

I haven't yet found the actual reason why there weren't any partial plans
(i.e. append->first_partial_plan and node->as_nplans have the same value)
when partition-wise join is enabled. I think in this case we could simply
return false from choose_next_subplan_for_worker() when there are no
partial plans and we are done with all the non-partial plans, although I
may be wrong because I have yet to understand this patch fully.
Re: [HACKERS] Parallel Append implementation
Here are the changes I did on v21 patch to handle crash reported by
Rajkumar[1]:

diff --git a/src/backend/executor/nodeAppend.c
b/src/backend/executor/nodeAppend.c
index e3b17cf0e2..e0ee918808 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -479,9 +479,12 @@ choose_next_subplan_for_worker(AppendState *node)
 		pstate->pa_next_plan = append->first_partial_plan;
 	else
 		pstate->pa_next_plan++;
-	if (pstate->pa_next_plan == node->as_whichplan)
+
+	if (pstate->pa_next_plan == node->as_whichplan ||
+		(pstate->pa_next_plan == append->first_partial_plan &&
+		 append->first_partial_plan >= node->as_nplans))
 	{
-		/* We've tried everything! */
+		/* We've tried everything or there were no partial plans */
 		pstate->pa_next_plan = INVALID_SUBPLAN_INDEX;
 		LWLockRelease(&pstate->pa_lock);
 		return false;

Apart from this I have added a few asserts to keep an eye on
node->as_whichplan value in the attached patch, thanks.

1] http://postgr.es/m/CAKcux6nyDxOyE4PA8O%3DQgF-ugZp_y1G2U%2Burmf76-%3Df2knDsWA%40mail.gmail.com

Regards,
Amul

ParallelAppend_v22.patch
Description: Binary data
Re: [HACKERS] Parallel Append implementation
Thanks a lot Rajkumar for this test. I am able to reproduce this crash by
enabling partition-wise join.

The reason for this crash is the same as the previous one[1], i.e. the
node->as_whichplan value. This time the append->first_partial_plan value
looks suspicious. With the following change to the v21 patch, I am able
to reproduce this crash as an assert failure when
enable_partition_wise_join = ON; otherwise it works fine.

diff --git a/src/backend/executor/nodeAppend.c
b/src/backend/executor/nodeAppend.c
index e3b17cf0e2..4b337ac633 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -458,6 +458,7 @@ choose_next_subplan_for_worker(AppendState *node)

 	/* Backward scan is not supported by parallel-aware plans */
 	Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+	Assert(append->first_partial_plan < node->as_nplans);

 	LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);

Will look into this more, tomorrow.

1. http://postgr.es/m/CAAJ_b97kLNW8Z9nvc_JUUG5wVQUXvG=f37WsX8ALF0A=kah...@mail.gmail.com

Regards,
Amul

On Fri, Nov 24, 2017 at 5:00 PM, Rajkumar Raghuwanshi wrote:
> On Thu, Nov 23, 2017 at 2:22 PM, amul sul wrote:
>> Looks like it is the same crash that v20 claimed to have fixed; indeed
>> I missed adding the fix[1] in the v20 patch, sorry about that. The
>> attached updated patch includes the aforementioned fix.
>
> Hi,
>
> I have applied the latest v21 patch, and it crashed when
> partition-wise join was enabled. The same query works fine with and
> without partition-wise join enabled on PG head. Please take a look.
>
> SET enable_partition_wise_join TO true;
>
> CREATE TABLE pt1 (a int, b int, c text, d int) PARTITION BY LIST(c);
> CREATE TABLE pt1_p1 PARTITION OF pt1 FOR VALUES IN ('0000', '0001', '0002', '0003');
> CREATE TABLE pt1_p2 PARTITION OF pt1 FOR VALUES IN ('0004', '0005', '0006', '0007');
> CREATE TABLE pt1_p3 PARTITION OF pt1 FOR VALUES IN ('0008', '0009', '0010', '0011');
> INSERT INTO pt1 SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 9) i;
> ANALYZE pt1;
>
> CREATE TABLE pt2 (a int, b int, c text, d int) PARTITION BY LIST(c);
> CREATE TABLE pt2_p1 PARTITION OF pt2 FOR VALUES IN ('0000', '0001', '0002', '0003');
> CREATE TABLE pt2_p2 PARTITION OF pt2 FOR VALUES IN ('0004', '0005', '0006', '0007');
> CREATE TABLE pt2_p3 PARTITION OF pt2 FOR VALUES IN ('0008', '0009', '0010', '0011');
> INSERT INTO pt2 SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 9) i;
> ANALYZE pt2;
>
> EXPLAIN ANALYZE
> SELECT t1.c, sum(t2.a), COUNT(*) FROM pt1 t1 FULL JOIN pt2 t2 ON t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
> 	This probably means the server terminated abnormally
> 	before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
> stack-trace is given below.
>
> Core was generated by `postgres: parallel worker for PID 73935'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at ../../../src/include/executor/executor.h:238
> 238		if (node->chgParam != NULL)	/* something changed? */
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-23.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  0x006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at ../../../src/include/executor/executor.h:238
> #1  0x006dc72e in ExecAppend (pstate=0x26cd6e0) at nodeAppend.c:207
> #2  0x006d1e7c in ExecProcNodeInstr (node=0x26cd6e0) at execProcnode.c:446
> #3  0x006dcee5 in ExecProcNode (node=0x26cd6e0) at ../../../src/include/executor/executor.h:241
> #4  0x006dd38c in fetch_input_tuple (aggstate=0x26cd7f8) at nodeAgg.c:699
> #5  0x006e02eb in agg_fill_hash_table (aggstate=0x26cd7f8) at nodeAgg.c:2536
> #6  0x006dfb2b in ExecAgg (pstate=0x26cd7f8) at nodeAgg.c:2148
> #7  0x006d1e7c in ExecProcNodeInstr (node=0x26cd7f8) at execProcnode.c:446
> #8  0x006d1e4d in ExecProcNodeFirst (node=0x26cd7f8) at execProcnode.c:430
> #9  0x006c9439 in ExecProcNode (node=0x26cd7f8) at ../../../src/include/executor/executor.h:241
> #10 0x006cbd73 in ExecutePlan (estate=0x26ccda0, planstate=0x26cd7f8,
> use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001',
> numberTuples=0, direction=Forwar
Re: [HACKERS] Parallel Append implementation
On Thu, Nov 23, 2017 at 2:22 PM, amul sul wrote:
> Looks like it is the same crash that v20 claimed to have fixed; indeed
> I missed adding the fix[1] in the v20 patch, sorry about that. The
> attached updated patch includes the aforementioned fix.

Hi,

I have applied the latest v21 patch, and it crashed when partition-wise
join was enabled. The same query works fine with and without
partition-wise join enabled on PG head. Please take a look.

SET enable_partition_wise_join TO true;

CREATE TABLE pt1 (a int, b int, c text, d int) PARTITION BY LIST(c);
CREATE TABLE pt1_p1 PARTITION OF pt1 FOR VALUES IN ('0000', '0001', '0002', '0003');
CREATE TABLE pt1_p2 PARTITION OF pt1 FOR VALUES IN ('0004', '0005', '0006', '0007');
CREATE TABLE pt1_p3 PARTITION OF pt1 FOR VALUES IN ('0008', '0009', '0010', '0011');
INSERT INTO pt1 SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 9) i;
ANALYZE pt1;

CREATE TABLE pt2 (a int, b int, c text, d int) PARTITION BY LIST(c);
CREATE TABLE pt2_p1 PARTITION OF pt2 FOR VALUES IN ('0000', '0001', '0002', '0003');
CREATE TABLE pt2_p2 PARTITION OF pt2 FOR VALUES IN ('0004', '0005', '0006', '0007');
CREATE TABLE pt2_p3 PARTITION OF pt2 FOR VALUES IN ('0008', '0009', '0010', '0011');
INSERT INTO pt2 SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 9) i;
ANALYZE pt2;

EXPLAIN ANALYZE
SELECT t1.c, sum(t2.a), COUNT(*) FROM pt1 t1 FULL JOIN pt2 t2 ON t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost.
Attempting reset: Failed.
!>

stack-trace is given below.

Core was generated by `postgres: parallel worker for PID 73935'.
Program terminated with signal 11, Segmentation fault.
#0  0x006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at ../../../src/include/executor/executor.h:238
238		if (node->chgParam != NULL)	/* something changed? */
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-23.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at ../../../src/include/executor/executor.h:238
#1  0x006dc72e in ExecAppend (pstate=0x26cd6e0) at nodeAppend.c:207
#2  0x006d1e7c in ExecProcNodeInstr (node=0x26cd6e0) at execProcnode.c:446
#3  0x006dcee5 in ExecProcNode (node=0x26cd6e0) at ../../../src/include/executor/executor.h:241
#4  0x006dd38c in fetch_input_tuple (aggstate=0x26cd7f8) at nodeAgg.c:699
#5  0x006e02eb in agg_fill_hash_table (aggstate=0x26cd7f8) at nodeAgg.c:2536
#6  0x006dfb2b in ExecAgg (pstate=0x26cd7f8) at nodeAgg.c:2148
#7  0x006d1e7c in ExecProcNodeInstr (node=0x26cd7f8) at execProcnode.c:446
#8  0x006d1e4d in ExecProcNodeFirst (node=0x26cd7f8) at execProcnode.c:430
#9  0x006c9439 in ExecProcNode (node=0x26cd7f8) at ../../../src/include/executor/executor.h:241
#10 0x006cbd73 in ExecutePlan (estate=0x26ccda0, planstate=0x26cd7f8,
use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001',
numberTuples=0, direction=ForwardScanDirection, dest=0x26b2ce0,
execute_once=1 '\001') at execMain.c:1718
#11 0x006c9a12 in standard_ExecutorRun (queryDesc=0x26d7fa0,
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
execMain.c:361
#12 0x006c982e in ExecutorRun (queryDesc=0x26d7fa0,
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
execMain.c:304
#13 0x006d096c in ParallelQueryMain (seg=0x26322a8, toc=0x7fda24d46000) at execParallel.c:1271
#14 0x0053272d in ParallelWorkerMain (main_arg=1203628635) at parallel.c:1149
#15 0x007e8c99 in StartBackgroundWorker () at bgworker.c:841
#16 0x007fc029 in do_start_bgworker (rw=0x2656d00) at postmaster.c:5741
#17 0x007fc36b in maybe_start_bgworkers () at postmaster.c:5945
#18 0x007fb3fa in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5134
#19 <signal handler called>
#20 0x003dd26e1603 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#21 0x007f6bee in ServerLoop () at postmaster.c:1721
#22 0x007f63dd in PostmasterMain (argc=3, argv=0x2630180) at postmaster.c:1365
#23 0x0072cb40 in main (argc=3, argv=0x2630180) at main.c:228

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] Parallel Append implementation
Looks like it is the same crash that v20 claimed to have fixed; indeed
I missed adding the fix[1] in the v20 patch, sorry about that. The
attached updated patch includes the aforementioned fix.

1] http://postgr.es/m/CAAJ_b97kLNW8Z9nvc_JUUG5wVQUXvG=f37WsX8ALF0A=kah...@mail.gmail.com

Regards,
Amul

On Thu, Nov 23, 2017 at 1:50 PM, Rajkumar Raghuwanshi wrote:
> On Thu, Nov 23, 2017 at 9:45 AM, amul sul wrote:
>> Attaching an updated version of "ParallelAppend_v19_rebased" that
>> includes this fix.
>
> Hi,
>
> I have applied the attached patch and got a crash with the query below.
> Please take a look.
>
> CREATE TABLE tbl (a int, b int, c text, d int) PARTITION BY LIST(c);
> CREATE TABLE tbl_p1 PARTITION OF tbl FOR VALUES IN ('0000', '0001', '0002', '0003');
> CREATE TABLE tbl_p2 PARTITION OF tbl FOR VALUES IN ('0004', '0005', '0006', '0007');
> CREATE TABLE tbl_p3 PARTITION OF tbl FOR VALUES IN ('0008', '0009', '0010', '0011');
> INSERT INTO tbl SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 999) i;
> ANALYZE tbl;
>
> EXPLAIN ANALYZE SELECT c, sum(a), avg(b), COUNT(*) FROM tbl GROUP BY c
> HAVING avg(d) < 15 ORDER BY 1, 2, 3;
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
> 	This probably means the server terminated abnormally
> 	before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
> stack-trace is given below.
>
> Reading symbols from /lib64/libnss_files.so.2...Reading symbols from
> /usr/lib/debug/lib64/libnss_files-2.12.so.debug...done.
> done.
> Loaded symbols for /lib64/libnss_files.so.2
> Core was generated by `postgres: parallel worker for PID 104999'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at ../../../src/include/executor/executor.h:238
> 238		if (node->chgParam != NULL)	/* something changed? */
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-23.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  0x006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at ../../../src/include/executor/executor.h:238
> #1  0x006dc72e in ExecAppend (pstate=0x1947ed0) at nodeAppend.c:207
> #2  0x006d1e7c in ExecProcNodeInstr (node=0x1947ed0) at execProcnode.c:446
> #3  0x006dcef1 in ExecProcNode (node=0x1947ed0) at ../../../src/include/executor/executor.h:241
> #4  0x006dd398 in fetch_input_tuple (aggstate=0x1947fe8) at nodeAgg.c:699
> #5  0x006e02f7 in agg_fill_hash_table (aggstate=0x1947fe8) at nodeAgg.c:2536
> #6  0x006dfb37 in ExecAgg (pstate=0x1947fe8) at nodeAgg.c:2148
> #7  0x006d1e7c in ExecProcNodeInstr (node=0x1947fe8) at execProcnode.c:446
> #8  0x006d1e4d in ExecProcNodeFirst (node=0x1947fe8) at execProcnode.c:430
> #9  0x006c9439 in ExecProcNode (node=0x1947fe8) at ../../../src/include/executor/executor.h:241
> #10 0x006cbd73 in ExecutePlan (estate=0x1947590, planstate=0x1947fe8,
> use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001',
> numberTuples=0, direction=ForwardScanDirection, dest=0x192acb0,
> execute_once=1 '\001') at execMain.c:1718
> #11 0x006c9a12 in standard_ExecutorRun (queryDesc=0x194ffc0,
> direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
> execMain.c:361
> #12 0x006c982e in ExecutorRun (queryDesc=0x194ffc0,
> direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
> execMain.c:304
> #13 0x006d096c in ParallelQueryMain (seg=0x18aa2a8, toc=0x7f899a227000) at execParallel.c:1271
> #14 0x0053272d in ParallelWorkerMain (main_arg=1218206688) at parallel.c:1149
> #15 0x007e8ca5 in StartBackgroundWorker () at bgworker.c:841
> #16 0x007fc035 in do_start_bgworker (rw=0x18ced00) at postmaster.c:5741
> #17 0x007fc377 in maybe_start_bgworkers () at postmaster.c:5945
> #18 0x007fb406 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5134
> #19 <signal handler called>
> #20 0x003dd26e1603 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
> #21 0x007f6bfa in ServerLoop () at postmaster.c:1721
> #22 0x007f63e9 in PostmasterMain (argc=3, argv=0x18a8180) at postmaster.c:1365
> #23 0x0072cb4c in main (argc=3, argv=0x18a8180) at main.c:228
> (gdb)
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation

ParallelAppend_v21.patch
Description: Binary data
Re: [HACKERS] Parallel Append implementation
On Thu, Nov 23, 2017 at 9:45 AM, amul sul wrote:
> Attaching an updated version of "ParallelAppend_v19_rebased" that
> includes this fix.

Hi,

I have applied the attached patch and got a crash with the query below.
Please take a look.

CREATE TABLE tbl (a int, b int, c text, d int) PARTITION BY LIST(c);
CREATE TABLE tbl_p1 PARTITION OF tbl FOR VALUES IN ('0000', '0001', '0002', '0003');
CREATE TABLE tbl_p2 PARTITION OF tbl FOR VALUES IN ('0004', '0005', '0006', '0007');
CREATE TABLE tbl_p3 PARTITION OF tbl FOR VALUES IN ('0008', '0009', '0010', '0011');
INSERT INTO tbl SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 999) i;
ANALYZE tbl;

EXPLAIN ANALYZE SELECT c, sum(a), avg(b), COUNT(*) FROM tbl GROUP BY c
HAVING avg(d) < 15 ORDER BY 1, 2, 3;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

stack-trace is given below.

Reading symbols from /lib64/libnss_files.so.2...Reading symbols from
/usr/lib/debug/lib64/libnss_files-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `postgres: parallel worker for PID 104999'.
Program terminated with signal 11, Segmentation fault.
#0  0x006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at ../../../src/include/executor/executor.h:238
238		if (node->chgParam != NULL)	/* something changed? */
Missing separate debuginfos, use: debuginfo-install
keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
libcom_err-1.41.12-23.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at ../../../src/include/executor/executor.h:238
#1  0x006dc72e in ExecAppend (pstate=0x1947ed0) at nodeAppend.c:207
#2  0x006d1e7c in ExecProcNodeInstr (node=0x1947ed0) at execProcnode.c:446
#3  0x006dcef1 in ExecProcNode (node=0x1947ed0) at ../../../src/include/executor/executor.h:241
#4  0x006dd398 in fetch_input_tuple (aggstate=0x1947fe8) at nodeAgg.c:699
#5  0x006e02f7 in agg_fill_hash_table (aggstate=0x1947fe8) at nodeAgg.c:2536
#6  0x006dfb37 in ExecAgg (pstate=0x1947fe8) at nodeAgg.c:2148
#7  0x006d1e7c in ExecProcNodeInstr (node=0x1947fe8) at execProcnode.c:446
#8  0x006d1e4d in ExecProcNodeFirst (node=0x1947fe8) at execProcnode.c:430
#9  0x006c9439 in ExecProcNode (node=0x1947fe8) at ../../../src/include/executor/executor.h:241
#10 0x006cbd73 in ExecutePlan (estate=0x1947590, planstate=0x1947fe8,
use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 '\001',
numberTuples=0, direction=ForwardScanDirection, dest=0x192acb0,
execute_once=1 '\001') at execMain.c:1718
#11 0x006c9a12 in standard_ExecutorRun (queryDesc=0x194ffc0,
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
execMain.c:361
#12 0x006c982e in ExecutorRun (queryDesc=0x194ffc0,
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
execMain.c:304
#13 0x006d096c in ParallelQueryMain (seg=0x18aa2a8, toc=0x7f899a227000) at execParallel.c:1271
#14 0x0053272d in ParallelWorkerMain (main_arg=1218206688) at parallel.c:1149
#15 0x007e8ca5 in StartBackgroundWorker () at bgworker.c:841
#16 0x007fc035 in do_start_bgworker (rw=0x18ced00) at postmaster.c:5741
#17 0x007fc377 in maybe_start_bgworkers () at postmaster.c:5945
#18 0x007fb406 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5134
#19 <signal handler called>
#20 0x003dd26e1603 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#21 0x007f6bfa in ServerLoop () at postmaster.c:1721
#22 0x007f63e9 in PostmasterMain (argc=3, argv=0x18a8180) at postmaster.c:1365
#23 0x0072cb4c in main (argc=3, argv=0x18a8180) at main.c:228
(gdb)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] Parallel Append implementation
On Wed, Nov 22, 2017 at 1:44 AM, Robert Haas wrote:
> On Tue, Nov 21, 2017 at 6:57 AM, amul sul wrote:
>> The following change on the v19 patch does the fix for me:
>>
>> --- a/src/backend/executor/nodeAppend.c
>> +++ b/src/backend/executor/nodeAppend.c
>> @@ -489,11 +489,9 @@ choose_next_subplan_for_worker(AppendState *node)
>>  	}
>>
>>  	/* Pick the plan we found, and advance pa_next_plan one more time. */
>> -	node->as_whichplan = pstate->pa_next_plan;
>> +	node->as_whichplan = pstate->pa_next_plan++;
>>  	if (pstate->pa_next_plan == node->as_nplans)
>>  		pstate->pa_next_plan = append->first_partial_plan;
>> -	else
>> -		pstate->pa_next_plan++;
>>
>>  	/* If non-partial, immediately mark as finished. */
>>  	if (node->as_whichplan < append->first_partial_plan)
>>
>> Attached patch does the same changes to Amit's
>> ParallelAppend_v19_rebased.patch.
>
> Yes, that looks like a correct fix. Thanks.

Attaching an updated version of "ParallelAppend_v19_rebased" that
includes this fix.

Regards,
Amul

ParallelAppend_v20.patch
Description: Binary data
Re: [HACKERS] Parallel Append implementation
On Tue, Nov 21, 2017 at 6:57 AM, amul sul wrote:
> The following change on the v19 patch does the fix for me:
>
> --- a/src/backend/executor/nodeAppend.c
> +++ b/src/backend/executor/nodeAppend.c
> @@ -489,11 +489,9 @@ choose_next_subplan_for_worker(AppendState *node)
>  	}
>
>  	/* Pick the plan we found, and advance pa_next_plan one more time. */
> -	node->as_whichplan = pstate->pa_next_plan;
> +	node->as_whichplan = pstate->pa_next_plan++;
>  	if (pstate->pa_next_plan == node->as_nplans)
>  		pstate->pa_next_plan = append->first_partial_plan;
> -	else
> -		pstate->pa_next_plan++;
>
>  	/* If non-partial, immediately mark as finished. */
>  	if (node->as_whichplan < append->first_partial_plan)
>
> Attached patch does the same changes to Amit's
> ParallelAppend_v19_rebased.patch.

Yes, that looks like a correct fix. Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel Append implementation
On Tue, Nov 21, 2017 at 2:22 PM, Amit Khandekar wrote:
> On 21 November 2017 at 12:44, Rafia Sabih wrote:
>> On Mon, Nov 13, 2017 at 12:54 PM, Amit Khandekar wrote:
>>> Thanks a lot Robert for the patch. I will have a look. Quickly tried
>>> to test some aggregate queries with a partitioned pgbench_accounts
>>> table, and it is crashing. Will get back with the fix, and any other
>>> review comments.
>>>
>>> Thanks
>>> -Amit Khandekar
>>
>> I was trying to measure the performance of this patch at commit
>> 11e264517dff7a911d9e6494de86049cab42cde3 and TPC-H scale factor 20
>> with the following parameter settings:
>> work_mem = 1 GB
>> shared_buffers = 10GB
>> effective_cache_size = 10GB
>> max_parallel_workers_per_gather = 4
>> enable_partitionwise_join = on
>>
>> The details of the partitioning scheme are as follows:
>> tables partitioned = lineitem on l_orderkey and orders on o_orderkey
>> number of partitions in each table = 10
>>
>> As per the explain outputs, Parallel Append (PA) was used in the
>> following queries: 1, 3, 4, 5, 6, 7, 8, 10, 12, 14, 15, 18, and 21.
>> Unfortunately, executing any of these queries crashes with the
>> following information in the core dump of each of the workers:
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x10600984 in pg_atomic_read_u32_impl (ptr=0x3ec29294)
>>     at ../../../../src/include/port/atomics/generic.h:48
>> 48          return ptr->value;
>>
>> In case this is a different issue from the one you pointed out
>> upthread, you may want to have a look at this as well.
>> Please let me know if you need any more information in this regard.
>
> Right, for me the crash had occurred with a similar stack, although
> the real crash happened in one of the workers. Attached is the script
> file pgbench_partitioned.sql to create a schema with which I had
> reproduced the crash.
>
> The query that crashed:
> select sum(aid), avg(aid) from pgbench_accounts;
>
> Set max_parallel_workers_per_gather to at least 5.
>
> Also attached is the v19 patch rebased.

I've spent a little time debugging this crash. The crash happens in ExecAppend() because the subnode in the node->appendplans array is referenced using an incorrect (out-of-bounds) array index in the following code:

/*
 * figure out which subplan we are currently processing
 */
subnode = node->appendplans[node->as_whichplan];

This incorrect value of node->as_whichplan gets assigned in choose_next_subplan_for_worker(). Making the following change to the v19 patch fixes it for me:

--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -489,11 +489,9 @@ choose_next_subplan_for_worker(AppendState *node)
}

/* Pick the plan we found, and advance pa_next_plan one more time. */
- node->as_whichplan = pstate->pa_next_plan;
+ node->as_whichplan = pstate->pa_next_plan++;
if (pstate->pa_next_plan == node->as_nplans)
pstate->pa_next_plan = append->first_partial_plan;
- else
- pstate->pa_next_plan++;

/* If non-partial, immediately mark as finished. */
if (node->as_whichplan < append->first_partial_plan)

The attached patch makes the same change to Amit's ParallelAppend_v19_rebased.patch.

Regards,
Amul

fix_crash.patch
Description: Binary data
Re: [HACKERS] Parallel Append implementation
On 21 November 2017 at 12:44, Rafia Sabih wrote:
> On Mon, Nov 13, 2017 at 12:54 PM, Amit Khandekar wrote:
>> Thanks a lot Robert for the patch. I will have a look. Quickly tried
>> to test some aggregate queries with a partitioned pgbench_accounts
>> table, and it is crashing. Will get back with the fix, and any other
>> review comments.
>>
>> Thanks
>> -Amit Khandekar
>
> I was trying to measure the performance of this patch at commit
> 11e264517dff7a911d9e6494de86049cab42cde3 and TPC-H scale factor 20
> with the following parameter settings:
> work_mem = 1 GB
> shared_buffers = 10GB
> effective_cache_size = 10GB
> max_parallel_workers_per_gather = 4
> enable_partitionwise_join = on
>
> The details of the partitioning scheme are as follows:
> tables partitioned = lineitem on l_orderkey and orders on o_orderkey
> number of partitions in each table = 10
>
> As per the explain outputs, Parallel Append (PA) was used in the
> following queries: 1, 3, 4, 5, 6, 7, 8, 10, 12, 14, 15, 18, and 21.
> Unfortunately, executing any of these queries crashes with the
> following information in the core dump of each of the workers:
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x10600984 in pg_atomic_read_u32_impl (ptr=0x3ec29294)
>     at ../../../../src/include/port/atomics/generic.h:48
> 48          return ptr->value;
>
> In case this is a different issue from the one you pointed out
> upthread, you may want to have a look at this as well.
> Please let me know if you need any more information in this regard.

Right, for me the crash had occurred with a similar stack, although the real crash happened in one of the workers. Attached is the script file pgbench_partitioned.sql to create a schema with which I reproduced the crash.

The query that crashed:
select sum(aid), avg(aid) from pgbench_accounts;

Set max_parallel_workers_per_gather to at least 5.

Also attached is the v19 patch rebased.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

pgbench_partitioned.sql
Description: Binary data
ParallelAppend_v19_rebased.patch
Description: Binary data
Re: [HACKERS] Parallel Append implementation
On Mon, Nov 13, 2017 at 12:54 PM, Amit Khandekar wrote:
> Thanks a lot Robert for the patch. I will have a look. Quickly tried
> to test some aggregate queries with a partitioned pgbench_accounts
> table, and it is crashing. Will get back with the fix, and any other
> review comments.
>
> Thanks
> -Amit Khandekar

I was trying to measure the performance of this patch at commit 11e264517dff7a911d9e6494de86049cab42cde3 and TPC-H scale factor 20 with the following parameter settings:
work_mem = 1 GB
shared_buffers = 10GB
effective_cache_size = 10GB
max_parallel_workers_per_gather = 4
enable_partitionwise_join = on

The details of the partitioning scheme are as follows:
tables partitioned = lineitem on l_orderkey and orders on o_orderkey
number of partitions in each table = 10

As per the explain outputs, Parallel Append (PA) was used in the following queries: 1, 3, 4, 5, 6, 7, 8, 10, 12, 14, 15, 18, and 21. Unfortunately, executing any of these queries crashes with the following information in the core dump of each of the workers:

Program terminated with signal 11, Segmentation fault.
#0 0x10600984 in pg_atomic_read_u32_impl (ptr=0x3ec29294)
    at ../../../../src/include/port/atomics/generic.h:48
48          return ptr->value;

In case this is a different issue from the one you pointed out upthread, you may want to have a look at this as well. Please let me know if you need any more information in this regard.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/