Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 22, 2021 at 3:57 PM Greg Nancarrow wrote:
> On Mon, Mar 22, 2021 at 6:28 PM houzj.f...@fujitsu.com wrote:
> > > Let me know if these changes seem OK to you.
> >
> > Yes, these changes look good to me.
>
> Posting an updated set of patches with these changes...

I have marked this as Returned with Feedback. There is a lot of work to do
for this patch as per the feedback given on pgsql-committers [1].

[1] - https://www.postgresql.org/message-id/E1lMiB9-0001c3-SY%40gemulon.postgresql.org

--
With Regards,
Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
> > I noticed that some comments may need to be updated since we introduced
> > parallel insert in this patch.
> >
> > 1) src/backend/executor/execMain.c
> >  * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
> >  * would require (a) storing the combocid hash in shared memory, rather
> >  * than synchronizing it just once at the start of parallelism, and (b) an
> >  * alternative to heap_update()'s reliance on xmax for mutual exclusion.
> >  * INSERT may have no such troubles, but we forbid it to simplify the
> >  * checks.
> >
> > As we will allow INSERT in parallel mode, we'd better change the comment
> > here.
>
> Thanks, it does need to be updated for parallel INSERT.
> I was thinking of the following change:
>
> - * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
> - * would require (a) storing the combocid hash in shared memory, rather
> - * than synchronizing it just once at the start of parallelism, and (b) an
> - * alternative to heap_update()'s reliance on xmax for mutual exclusion.
> - * INSERT may have no such troubles, but we forbid it to simplify the
> - * checks.
> + * Except for INSERT, don't allow writes in parallel mode.  Supporting
> + * UPDATE and DELETE would require (a) storing the combocid hash in shared
> + * memory, rather than synchronizing it just once at the start of
> + * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
> + * for mutual exclusion.
>
> > 2) src/backend/storage/lmgr/README
> > dangers are modest.  The leader and worker share the same transaction,
> > snapshot, and combo CID hash, and neither can perform any DDL or,
> > indeed, write any data at all.  Thus, for either to read a table
> > locked exclusively by
> >
> > The same as 1), parallel insert is the exception.
>
> I agree, it needs to be updated too, to account for parallel INSERT now
> being supported.
>
> -write any data at all. ...
> +write any data at all (with the exception of parallel insert). ...
>
> > 3) src/backend/storage/lmgr/README
> > mutual exclusion method for such cases.  Currently, the parallel mode
> > is strictly read-only, but now we have the infrastructure to allow
> > parallel inserts and parallel copy.
> >
> > Maybe we can say:
> > +mutual exclusion method for such cases.  Currently, we only allowed
> > +parallel inserts, but we already have the infrastructure to allow
> > +parallel copy.
>
> Yes, agree, something like:
>
> -mutual exclusion method for such cases.  Currently, the parallel mode is
> -strictly read-only, but now we have the infrastructure to allow parallel
> -inserts and parallel copy.
> +mutual exclusion method for such cases.  Currently, only parallel insert
> +is allowed, but we have the infrastructure to allow parallel copy.
>
> Let me know if these changes seem OK to you.

Yes, these changes look good to me.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 22, 2021 at 2:30 PM houzj.f...@fujitsu.com wrote:
>
> I noticed that some comments may need to be updated since we introduced
> parallel insert in this patch.
>
> 1) src/backend/executor/execMain.c
>  * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
>  * would require (a) storing the combocid hash in shared memory, rather
>  * than synchronizing it just once at the start of parallelism, and (b) an
>  * alternative to heap_update()'s reliance on xmax for mutual exclusion.
>  * INSERT may have no such troubles, but we forbid it to simplify the
>  * checks.
>
> As we will allow INSERT in parallel mode, we'd better change the comment here.
>

Thanks, it does need to be updated for parallel INSERT.
I was thinking of the following change:

- * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
- * would require (a) storing the combocid hash in shared memory, rather
- * than synchronizing it just once at the start of parallelism, and (b) an
- * alternative to heap_update()'s reliance on xmax for mutual exclusion.
- * INSERT may have no such troubles, but we forbid it to simplify the
- * checks.
+ * Except for INSERT, don't allow writes in parallel mode.  Supporting
+ * UPDATE and DELETE would require (a) storing the combocid hash in shared
+ * memory, rather than synchronizing it just once at the start of
+ * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
+ * for mutual exclusion.

> 2) src/backend/storage/lmgr/README
> dangers are modest.  The leader and worker share the same transaction,
> snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
> write any data at all.  Thus, for either to read a table locked exclusively by
>
> The same as 1), parallel insert is the exception.
>

I agree, it needs to be updated too, to account for parallel INSERT now
being supported.

-write any data at all. ...
+write any data at all (with the exception of parallel insert). ...

> 3) src/backend/storage/lmgr/README
> mutual exclusion method for such cases.  Currently, the parallel mode is
> strictly read-only, but now we have the infrastructure to allow parallel
> inserts and parallel copy.
>
> Maybe we can say:
> +mutual exclusion method for such cases.  Currently, we only allowed parallel
> +inserts, but we already have the infrastructure to allow parallel copy.
>

Yes, agree, something like:

-mutual exclusion method for such cases.  Currently, the parallel mode is
-strictly read-only, but now we have the infrastructure to allow parallel
-inserts and parallel copy.
+mutual exclusion method for such cases.  Currently, only parallel insert is
+allowed, but we have the infrastructure to allow parallel copy.

Let me know if these changes seem OK to you.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Parallel INSERT (INTO ... SELECT ...)
> Since the "Parallel SELECT for INSERT" patch and related GUC/reloption
> patch have been committed, I have now rebased and attached the rest of
> the original patchset, which includes:
> - Additional tests for "Parallel SELECT for INSERT"
> - Parallel INSERT functionality
> - Test and documentation updates for Parallel INSERT

Hi,

I noticed that some comments may need to be updated since we introduced
parallel insert in this patch.

1) src/backend/executor/execMain.c

 * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
 * would require (a) storing the combocid hash in shared memory, rather
 * than synchronizing it just once at the start of parallelism, and (b) an
 * alternative to heap_update()'s reliance on xmax for mutual exclusion.
 * INSERT may have no such troubles, but we forbid it to simplify the
 * checks.

As we will allow INSERT in parallel mode, we'd better change the comment here.

2) src/backend/storage/lmgr/README

dangers are modest.  The leader and worker share the same transaction,
snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
write any data at all.  Thus, for either to read a table locked exclusively by

The same as 1), parallel insert is the exception.

3) src/backend/storage/lmgr/README

mutual exclusion method for such cases.  Currently, the parallel mode is
strictly read-only, but now we have the infrastructure to allow parallel
inserts and parallel copy.

Maybe we can say:

+mutual exclusion method for such cases.  Currently, we only allowed parallel
+inserts, but we already have the infrastructure to allow parallel copy.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 18, 2021 at 9:04 AM houzj.f...@fujitsu.com wrote: > > > > If a table parameter value is set and the > > > equivalent toast. parameter is not, the TOAST > > > table > > > will use the table's parameter value. > > > -Specifying these parameters for partitioned tables is not supported, > > > -but you may specify them for individual leaf partitions. > > > +These parameters, with the exception of > > > +parallel_insert_enabled, > > > +are not supported on partitioned tables, but may be specified for > > > +individual leaf partitions. > > > > > > > > > > Your patch looks good to me. While checking this, I notice a typo in the > > previous patch: > > - planner parameter parallel_workers. > > + planner parameter parallel_workers and > > + parallel_insert_enabled. > > > > Here, it should be /planner parameter/planner parameters. > > Thanks amit and justin for pointing this out ! > The changes looks good to me. > Pushed! -- With Regards, Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
> > If a table parameter value is set and the
> > equivalent toast. parameter is not, the TOAST table
> > will use the table's parameter value.
> > -Specifying these parameters for partitioned tables is not supported,
> > -but you may specify them for individual leaf partitions.
> > +These parameters, with the exception of
> > +parallel_insert_enabled,
> > +are not supported on partitioned tables, but may be specified for
> > +individual leaf partitions.
>
> Your patch looks good to me. While checking this, I notice a typo in the
> previous patch:
> - planner parameter parallel_workers.
> + planner parameter parallel_workers and
> + parallel_insert_enabled.
>
> Here, it should be /planner parameter/planner parameters.

Thanks, Amit and Justin, for pointing this out!
The changes look good to me.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 18, 2021 at 8:30 AM Justin Pryzby wrote:
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index ff1b642722..d5d356f2de 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -1338,8 +1338,10 @@ WITH ( MODULUS numeric_literal, REM
>  If a table parameter value is set and the
>  equivalent toast. parameter is not, the TOAST table
>  will use the table's parameter value.
> -Specifying these parameters for partitioned tables is not supported,
> -but you may specify them for individual leaf partitions.
> +These parameters, with the exception of
> +parallel_insert_enabled,
> +are not supported on partitioned tables, but may be specified for
> +individual leaf partitions.

Your patch looks good to me. While checking this, I notice a typo in the
previous patch:

- planner parameter parallel_workers.
+ planner parameter parallel_workers and
+ parallel_insert_enabled.

Here, it should be /planner parameter/planner parameters.

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index ff1b642722..d5d356f2de 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1338,8 +1338,10 @@ WITH ( MODULUS numeric_literal, REM
 If a table parameter value is set and the
 equivalent toast. parameter is not, the TOAST table
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of
+parallel_insert_enabled,
+are not supported on partitioned tables, but may be specified for
+individual leaf partitions.

From 55c7326e56f9b49710a564e91c46212a17f12b24 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Wed, 17 Mar 2021 21:47:27 -0500
Subject: [PATCH] Fix for parallel_insert_enabled

---
 doc/src/sgml/ref/create_table.sgml | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index ff1b642722..d5d356f2de 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1338,8 +1338,10 @@ WITH ( MODULUS numeric_literal, REM
 If a table parameter value is set and the
 equivalent toast. parameter is not, the TOAST table
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of
+parallel_insert_enabled,
+are not supported on partitioned tables, but may be specified for
+individual leaf partitions.
--
2.17.0
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 04:05:09PM +0900, Amit Langote wrote:
> On Fri, Mar 12, 2021 at 6:10 AM Justin Pryzby wrote:
> > Note also this CF entry
> > https://commitfest.postgresql.org/32/2987/
> > | Allow setting parallel_workers on partitioned tables

I'm rebasing that other patch on top of master (with this patch), and I
noticed that it adds this bit, and this patch should have done something
similar:

--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1340,4 +1340,5 @@ WITH ( MODULUS numeric_literal, REM
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of parallel_workers,
+are not supported on partitioned tables, but you may specify them for
+individual leaf partitions.

--
Justin
RE: Parallel INSERT (INTO ... SELECT ...)
> > Attaching new version patch with this change.
>
> Thanks, the patch looks good to me. I have made some minor cosmetic
> changes in the attached. I am planning to push this by tomorrow unless
> you or others have any more comments or suggestions for this patch.

Thanks for the review. I have no more comments.
BTW, I have done the final test on the patch and all of the tests passed.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 15, 2021 at 11:34 AM Justin Pryzby wrote:
>
> On Mon, Mar 15, 2021 at 11:25:26AM +0530, Amit Kapila wrote:
> > On Fri, Mar 12, 2021 at 3:01 PM houzj.f...@fujitsu.com wrote:
> > >
> > > Attaching new version patch with this change.
> >
> > Thanks, the patch looks good to me. I have made some minor cosmetic
> > changes in the attached. I am planning to push this by tomorrow unless
> > you or others have any more comments or suggestions for this patch.
>
> I still wonder if it should be possible for the GUC to be off, and then
> selectively enable parallel inserts on specific tables.
>

I think that could be inconvenient for users because in most tables such a
restriction won't need to be applied.

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 15, 2021 at 11:25:26AM +0530, Amit Kapila wrote:
> On Fri, Mar 12, 2021 at 3:01 PM houzj.f...@fujitsu.com wrote:
> >
> > Attaching new version patch with this change.
>
> Thanks, the patch looks good to me. I have made some minor cosmetic
> changes in the attached. I am planning to push this by tomorrow unless
> you or others have any more comments or suggestions for this patch.

I still wonder if it should be possible for the GUC to be off, and then
selectively enable parallel inserts on specific tables.  In that case, the
GUC should be called something other than enable_*, or maybe it should be
an enum with values like "off" and "allowed"/enabled/defer/???

--
Justin
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 3:01 PM houzj.f...@fujitsu.com wrote:
>
> Attaching new version patch with this change.
>

Thanks, the patch looks good to me. I have made some minor cosmetic changes
in the attached. I am planning to push this by tomorrow unless you or
others have any more comments or suggestions for this patch.

--
With Regards,
Amit Kapila.

v28-0001-Add-a-new-GUC-and-a-reloption-to-enable-inserts-.patch
Description: Binary data
Re: Parallel INSERT (INTO ... SELECT ...)
On Sat, Mar 13, 2021 at 10:08 AM Tom Lane wrote:
>
> Greg Nancarrow writes:
> > On Fri, Mar 12, 2021 at 5:00 AM Tom Lane wrote:
> >> BTW, having special logic for FK triggers in
> >> target_rel_trigger_max_parallel_hazard seems quite loony to me.
> >> Why isn't that handled by setting appropriate proparallel values
> >> for those trigger functions?
>
> > ... and also attached a patch to update the code for this issue.
>
> Hm, what I was expecting to see is that RI_FKey_check_ins and
> RI_FKey_check_upd get marked as proparallel = 'r' and the remainder
> get marked as proparallel = 'u'.
>

Oh, I think Greg's patch has just marked the functions for which the
current code has parallel-safety checks, and I also thought that would be
sufficient.

> Remember that the default for
> builtin functions is proparallel = 's', but surely that's wrong
> for triggers that can propagate updates to other tables?
>

Okay, probably the others can be marked as unsafe. I think we have not
considered any others except the FK-related triggers, which we knew are
hazardous for enabling inserts in parallel mode. The others seem to be
related to update/delete, so we have not done anything about them, but
maybe it is better to mark them as 'unsafe' now, and later, when we support
update/delete in parallel mode, we can see if any of those can be executed
in parallel mode. But OTOH, we can keep them as they are if they don't
impact the operations we currently support in parallel mode.

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
Greg Nancarrow writes:
> On Fri, Mar 12, 2021 at 5:00 AM Tom Lane wrote:
>> BTW, having special logic for FK triggers in
>> target_rel_trigger_max_parallel_hazard seems quite loony to me.
>> Why isn't that handled by setting appropriate proparallel values
>> for those trigger functions?

> ... and also attached a patch to update the code for this issue.

Hm, what I was expecting to see is that RI_FKey_check_ins and
RI_FKey_check_upd get marked as proparallel = 'r' and the remainder
get marked as proparallel = 'u'.  Remember that the default for
builtin functions is proparallel = 's', but surely that's wrong
for triggers that can propagate updates to other tables?

			regards, tom lane
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 9:33 AM houzj.f...@fujitsu.com wrote:
>
> > On Thu, Mar 11, 2021 at 01:01:42PM +0000, houzj.f...@fujitsu.com wrote:
> > > > I guess to have the finer granularity we'd have to go with
> > > > enable_parallel_insert, which then would mean possibly having to
> > > > later add enable_parallel_update, should parallel update have
> > > > similar potential overhead in the parallel-safety checks (which to
> > > > me, looks like it could, and parallel delete may not ...)
> > > >
> > > > It's a shame there is no "set" type for GUC options.
> > > > e.g.
> > > > enable_parallel_dml='insert,update'
> > > > Maybe that's going too far.
> >
> > Isn't that just GUC_LIST_INPUT ?
> > I'm not sure why it'd be going too far ?
> >
> > The GUC-setting assign hook can parse the enable_parallel_dml_list
> > value set by the user, and set an internal int/bits enable_parallel_dml
> > variable with some define/enum values like:
> >
> > GUC_PARALLEL_DML_INSERT 0x01
> > GUC_PARALLEL_DML_DELETE 0x02
> > GUC_PARALLEL_DML_UPDATE 0x04
>
> I think this idea works, but we still need to consider the reloption.
> After looking into the reloptions code, I think postgres does not have a
> list-like type for reloptions. And I think it's better that the guc and
> reloption are consistent.
>

I also think it is better to be consistent here.

> Besides, a list-type guc option that only supports one valid value
> 'insert' seems a little weird to me (we only support parallel insert for
> now).
>
> So, I tend to keep the current style of guc option.
>

+1. I feel at this stage it might not be prudent to predict the overhead
for parallel updates or deletes, especially when there doesn't appear to be
an easy way to provide a futuristic guc/reloption and we don't have any
patch on the table which can prove or disprove that theory. The only thing
we can say is that even if parallel updates/deletes have overhead, it might
not be due to similar reasons.

Also, I guess parallel copy might need somewhat similar parallel-safety
checking w.r.t. partitioned tables, and I feel the current proposed
guc/reloption can be used for that as well, as it is quite a similar
operation.

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 1:33 PM houzj.f...@fujitsu.com wrote:
>
> > > The problem is that target_rel_trigger_max_parallel_hazard and its
> > > caller think they can use a relcache TriggerDesc field across other
> > > cache accesses, which they can't because the relcache doesn't
> > > guarantee that that won't move.
> > >
> > > One approach would be to add logic to RelationClearRelation similar to
> > > what it does for tupdescs, rules, etc, to avoid moving them when their
> > > contents haven't changed.  But given that we've not needed that for
> > > the past several decades, I'm disinclined to add the overhead.  I
> > > think this code ought to be adjusted to not make its own copy of the
> > > trigdesc pointer, but instead fetch it out of the relcache struct each
> > > time it is accessed.  There's no real reason why
> > > target_rel_trigger_max_parallel_hazard shouldn't be passed the
> > > (stable) Relation pointer instead of just the trigdesc pointer.
> >
> > I have attached a patch to fix the issue, based on your suggestion
> > (tested with CLOBBER_CACHE_ALWAYS defined).
> >
> > > BTW, having special logic for FK triggers in
> > > target_rel_trigger_max_parallel_hazard seems quite loony to me.
> > > Why isn't that handled by setting appropriate proparallel values for
> > > those trigger functions?
> >
> > ... and also attached a patch to update the code for this issue.
> >
> > (2nd patch relies on application of the 1st patch)
> >
> > Thanks again for pointing out these problems.
>
> I have tested the triggerdesc bugfix patch with the CLOBBER_CACHE_ALWAYS
> flag. It passed the tests that failed in the buildfarm (foreign_key,
> foreign_data).
>

Thanks for the patch and review. It looks good to me as well and passes the
tests (foreign_key, foreign_data) with the CLOBBER_CACHE_ALWAYS flag. I'll
review Greg's second patch.

--
With Regards,
Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
> On Fri, Mar 12, 2021 at 6:10 AM Justin Pryzby wrote:
> > Note also this CF entry
> > https://commitfest.postgresql.org/32/2987/
> > | Allow setting parallel_workers on partitioned tables
>
> +/*
> + * PartitionedOptions
> + *    Contents of rd_options for partitioned tables
> + */
> +typedef struct PartitionedOptions
> +{
> +    int32    vl_len_;    /* varlena header (do not touch directly!) */
> +    bool    parallel_insert_enabled;    /* enables planner's use of parallel insert */
> +} PartitionedOptions;
>
> houzj, could you please consider naming the struct
> PartitionedTableRdOptions as the patch for adding parallel_workers option
> to partitioned tables does?

Thanks for the reminder. I agree that "PartitionedTableRdOptions" is better.
Attaching a new version patch with this change.

Best regards,
houzj

v27-0003-Parallel-SELECT-for-INSERT-INTO-.-SELECT-advanced-tests.patch
Description: v27-0003-Parallel-SELECT-for-INSERT-INTO-.-SELECT-advanced-tests.patch

v27-0002-Add-new-GUC-option-enable_parallel_insert-boolean.patch
Description: v27-0002-Add-new-GUC-option-enable_parallel_insert-boolean.patch
RE: Parallel INSERT (INTO ... SELECT ...)
> > The problem is that target_rel_trigger_max_parallel_hazard and its
> > caller think they can use a relcache TriggerDesc field across other
> > cache accesses, which they can't because the relcache doesn't
> > guarantee that that won't move.
> >
> > One approach would be to add logic to RelationClearRelation similar to
> > what it does for tupdescs, rules, etc, to avoid moving them when their
> > contents haven't changed.  But given that we've not needed that for
> > the past several decades, I'm disinclined to add the overhead.  I
> > think this code ought to be adjusted to not make its own copy of the
> > trigdesc pointer, but instead fetch it out of the relcache struct each
> > time it is accessed.  There's no real reason why
> > target_rel_trigger_max_parallel_hazard shouldn't be passed the
> > (stable) Relation pointer instead of just the trigdesc pointer.
>
> I have attached a patch to fix the issue, based on your suggestion
> (tested with CLOBBER_CACHE_ALWAYS defined).
>
> > BTW, having special logic for FK triggers in
> > target_rel_trigger_max_parallel_hazard seems quite loony to me.
> > Why isn't that handled by setting appropriate proparallel values for
> > those trigger functions?
>
> ... and also attached a patch to update the code for this issue.
>
> (2nd patch relies on application of the 1st patch)
>
> Thanks again for pointing out these problems.

I have tested the triggerdesc bugfix patch with the CLOBBER_CACHE_ALWAYS
flag. It passed the tests that failed in the buildfarm (foreign_key,
foreign_data). And the patch LGTM.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 5:00 AM Tom Lane wrote:
>
> The problem is that target_rel_trigger_max_parallel_hazard and its caller
> think they can use a relcache TriggerDesc field across other cache
> accesses, which they can't because the relcache doesn't guarantee that
> that won't move.
>
> One approach would be to add logic to RelationClearRelation similar to
> what it does for tupdescs, rules, etc, to avoid moving them when their
> contents haven't changed.  But given that we've not needed that for the
> past several decades, I'm disinclined to add the overhead.  I think this
> code ought to be adjusted to not make its own copy of the trigdesc
> pointer, but instead fetch it out of the relcache struct each time it is
> accessed.  There's no real reason why
> target_rel_trigger_max_parallel_hazard shouldn't be passed the (stable)
> Relation pointer instead of just the trigdesc pointer.
>

I have attached a patch to fix the issue, based on your suggestion (tested
with CLOBBER_CACHE_ALWAYS defined).

> BTW, having special logic for FK triggers in
> target_rel_trigger_max_parallel_hazard seems quite loony to me.
> Why isn't that handled by setting appropriate proparallel values
> for those trigger functions?
>

... and also attached a patch to update the code for this issue.

(2nd patch relies on application of the 1st patch)

Thanks again for pointing out these problems.

Regards,
Greg Nancarrow
Fujitsu Australia

v1-0001-Fix-TriggerDesc-relcache-bug-introduced-by-recent-commit.patch
Description: Binary data

v1-0001-Better-implement-FK-trigger-parallel-safety-checking.patch
Description: Binary data
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 6:10 AM Justin Pryzby wrote:
> Note also this CF entry
> https://commitfest.postgresql.org/32/2987/
> | Allow setting parallel_workers on partitioned tables

+/*
+ * PartitionedOptions
+ *    Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedOptions
+{
+    int32    vl_len_;    /* varlena header (do not touch directly!) */
+    bool    parallel_insert_enabled;    /* enables planner's use of parallel insert */
+} PartitionedOptions;

houzj, could you please consider naming the struct PartitionedTableRdOptions,
as the patch for adding the parallel_workers option to partitioned tables
does?

--
Amit Langote
EDB: http://www.enterprisedb.com
RE: Parallel INSERT (INTO ... SELECT ...)
> On Thu, Mar 11, 2021 at 01:01:42PM +0000, houzj.f...@fujitsu.com wrote:
> > > I guess to have the finer granularity we'd have to go with
> > > enable_parallel_insert, which then would mean possibly having to
> > > later add enable_parallel_update, should parallel update have
> > > similar potential overhead in the parallel-safety checks (which to
> > > me, looks like it could, and parallel delete may not ...)
> > >
> > > It's a shame there is no "set" type for GUC options.
> > > e.g.
> > > enable_parallel_dml='insert,update'
> > > Maybe that's going too far.
>
> Isn't that just GUC_LIST_INPUT ?
> I'm not sure why it'd be going too far ?
>
> The GUC-setting assign hook can parse the enable_parallel_dml_list value
> set by the user, and set an internal int/bits enable_parallel_dml
> variable with some define/enum values like:
>
> GUC_PARALLEL_DML_INSERT 0x01
> GUC_PARALLEL_DML_DELETE 0x02
> GUC_PARALLEL_DML_UPDATE 0x04
>
> The namespace.c assign hook is a good prototype for this.  The parsed,
> integer GUC can probably be a static variable in clauses.c.
>
> Then, the planner can check if:
> | commandType == CMD_INSERT &&
> |     (enable_parallel_dml & GUC_PARALLEL_DML_INSERT) != 0
> [...]
>
> + this table. When enabled (and provided that
> + is also
> + true),

I think this idea works, but we still need to consider the reloption.
After looking into the reloptions code, I think postgres does not have a
list-like type for reloptions, and I think it's better that the guc and
reloption are consistent.

Besides, a list-type guc option that only supports one valid value
'insert' seems a little weird to me (we only support parallel insert for
now).

So, I tend to keep the current style of guc option. If it turns out that
we do need the same option to restrict update/delete, we can improve this
in the future.

What do you think?

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 5:00 AM Tom Lane wrote:
>
> The buildfarm has revealed that this patch doesn't work under
> CLOBBER_CACHE_ALWAYS:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=husky&dt=2021-03-10%2021%3A09%3A32
>
> I initially thought that that was a problem with c3ffe34863, but after
> reproducing it here I get this stack trace:
>
> #0  target_rel_trigger_max_parallel_hazard (context=0x7fff9cbfc220, trigdesc=0x7fdd7dfa9718) at clauses.c:971
> #1  target_rel_max_parallel_hazard_recurse (command_type=CMD_INSERT, context=0x7fff9cbfc220, rel=0x7fdd7df819e0) at clauses.c:929
> #2  target_rel_max_parallel_hazard_recurse (rel=0x7fdd7df819e0, command_type=, context=0x7fff9cbfc220) at clauses.c:893
> #3  0x0075d26e in target_rel_max_parallel_hazard (context=0x7fff9cbfc220) at clauses.c:884
> #4  max_parallel_hazard_walker (node=node@entry=0x146e590, context=context@entry=0x7fff9cbfc220) at clauses.c:831
> #5  0x00700812 in range_table_entry_walker (rte=0x146e590, walker=0x75cf00, context=0x7fff9cbfc220, flags=16) at nodeFuncs.c:2467
> #6  0x00700927 in range_table_walker (rtable=0x11fdd70, walker=walker@entry=0x75cf00, context=context@entry=0x7fff9cbfc220, flags=16) at nodeFuncs.c:2446
> #7  0x00700ada in query_tree_walker (flags=, context=0x7fff9cbfc220, walker=0x75cf00, query=0x11fdc58) at nodeFuncs.c:2423
> #8  query_tree_walker (query=query@entry=0x700927, walker=walker@entry=0x75cf00, context=context@entry=0x11fdc58, flags=) at nodeFuncs.c:2336
> #9  0x0075d138 in max_parallel_hazard_walker (node=node@entry=0x11fdc58, context=0x11fdc58, context@entry=0x7fff9cbfc220) at clauses.c:853
> #10 0x0075dc98 in max_parallel_hazard (parse=parse@entry=0x11fdc58, glob=glob@entry=0x11fdb40) at clauses.c:585
> #11 0x0074cd22 in standard_planner (parse=0x11fdc58, query_string=, cursorOptions=256, boundParams=) at planner.c:345
> #12 0x00814947 in pg_plan_query (querytree=0x11fdc58, query_string=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n  select 2048, x from generate_series(1,10) x;", cursorOptions=256, boundParams=0x0) at postgres.c:809
> #13 0x00814a43 in pg_plan_queries (querytrees=0x14725d0, query_string=query_string@entry=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n  select 2048, x from generate_series(1,10) x;", cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:900
> #14 0x00814d35 in exec_simple_query (query_string=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n  select 2048, x from generate_series(1,10) x;") at postgres.c:1092
>
> The problem is that target_rel_trigger_max_parallel_hazard and its caller
> think they can use a relcache TriggerDesc field across other cache
> accesses, which they can't because the relcache doesn't guarantee that
> that won't move.
>
> One approach would be to add logic to RelationClearRelation similar to
> what it does for tupdescs, rules, etc, to avoid moving them when their
> contents haven't changed.  But given that we've not needed that for the
> past several decades, I'm disinclined to add the overhead.  I think this
> code ought to be adjusted to not make its own copy of the trigdesc
> pointer, but instead fetch it out of the relcache struct each time it is
> accessed.  There's no real reason why
> target_rel_trigger_max_parallel_hazard shouldn't be passed the (stable)
> Relation pointer instead of just the trigdesc pointer.
>
> BTW, having special logic for FK triggers in
> target_rel_trigger_max_parallel_hazard seems quite loony to me.
> Why isn't that handled by setting appropriate proparallel values
> for those trigger functions?
>

Thanks Tom for your investigation, detailed analysis and suggested code
fixes. Will work on getting these issues corrected ASAP (and, er, removing
the looniness ...).

Regards,
Greg Nancarrow
Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 11, 2021 at 01:01:42PM +, houzj.f...@fujitsu.com wrote: > > I guess to have the finer granularity we'd have to go with > > enable_parallel_insert, > > which then would mean possibly having to later add enable_parallel_update, > > should parallel update have similar potential overhead in the > > parallel-safety > > checks (which to me, looks like it could, and parallel delete may not ...) > > > > It's a shame there is no "set" type for GUC options. > > e.g. > > enable_parallel_dml='insert,update' > > Maybe that's going too far. Isn't that just GUC_LIST_INPUT? I'm not sure why it'd be going too far? The GUC-setting assign hook can parse the enable_parallel_dml_list value set by the user, and set an internal int/bits enable_parallel_dml variable with some define/enum values like:

GUC_PARALLEL_DML_INSERT 0x01
GUC_PARALLEL_DML_DELETE 0x02
GUC_PARALLEL_DML_UPDATE 0x04

The namespace.c assign hook is a good prototype for this. The parsed, integer GUC can probably be a static variable in clauses.c. Then, the planner can check if:

|commandType == CMD_INSERT &&
|    (enable_parallel_dml & GUC_PARALLEL_DML_INSERT) != 0

[...] + this table. When enabled (and provided that + is also true), It seems like this usefully allows the GUC to be enabled, and the reloption to be disabled. But if the GUC is disabled, then it's impossible to enable for a single table. That seems unfortunate. I think part of the issue is the naming. If the GUC is called "enable_*", then setting it to "off" should disable it entirely, for consistency with other GUCs. So maybe it needs another name, like parallel_dml='insert'. I think maybe "all" should be an accepted value. Note also this CF entry https://commitfest.postgresql.org/32/2987/ | Allow setting parallel_workers on partitioned tables -- Justin
Re: Parallel INSERT (INTO ... SELECT ...)
The buildfarm has revealed that this patch doesn't work under CLOBBER_CACHE_ALWAYS: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=husky&dt=2021-03-10%2021%3A09%3A32 I initially thought that that was a problem with c3ffe34863, but after reproducing it here I get this stack trace:

#0  target_rel_trigger_max_parallel_hazard (context=0x7fff9cbfc220, trigdesc=0x7fdd7dfa9718) at clauses.c:971
#1  target_rel_max_parallel_hazard_recurse (command_type=CMD_INSERT, context=0x7fff9cbfc220, rel=0x7fdd7df819e0) at clauses.c:929
#2  target_rel_max_parallel_hazard_recurse (rel=0x7fdd7df819e0, command_type=<optimized out>, context=0x7fff9cbfc220) at clauses.c:893
#3  0x0075d26e in target_rel_max_parallel_hazard (context=0x7fff9cbfc220) at clauses.c:884
#4  max_parallel_hazard_walker (node=node@entry=0x146e590, context=context@entry=0x7fff9cbfc220) at clauses.c:831
#5  0x00700812 in range_table_entry_walker (rte=0x146e590, walker=0x75cf00 <max_parallel_hazard_walker>, context=0x7fff9cbfc220, flags=16) at nodeFuncs.c:2467
#6  0x00700927 in range_table_walker (rtable=0x11fdd70, walker=walker@entry=0x75cf00 <max_parallel_hazard_walker>, context=context@entry=0x7fff9cbfc220, flags=16) at nodeFuncs.c:2446
#7  0x00700ada in query_tree_walker (flags=<optimized out>, context=0x7fff9cbfc220, walker=0x75cf00 <max_parallel_hazard_walker>, query=0x11fdc58) at nodeFuncs.c:2423
#8  query_tree_walker (query=query@entry=0x700927 <range_table_walker>, walker=walker@entry=0x75cf00 <max_parallel_hazard_walker>, context=context@entry=0x11fdc58, flags=<optimized out>) at nodeFuncs.c:2336
#9  0x0075d138 in max_parallel_hazard_walker (node=node@entry=0x11fdc58, context=0x11fdc58, context@entry=0x7fff9cbfc220) at clauses.c:853
#10 0x0075dc98 in max_parallel_hazard (parse=parse@entry=0x11fdc58, glob=glob@entry=0x11fdb40) at clauses.c:585
#11 0x0074cd22 in standard_planner (parse=0x11fdc58, query_string=<optimized out>, cursorOptions=256, boundParams=<optimized out>) at planner.c:345
#12 0x00814947 in pg_plan_query (querytree=0x11fdc58, query_string=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n select 2048, x from generate_series(1,10) x;", cursorOptions=256, boundParams=0x0) at postgres.c:809
#13 0x00814a43 in pg_plan_queries (querytrees=0x14725d0, query_string=query_string@entry=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n select 2048, x from generate_series(1,10) x;", cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:900
#14 0x00814d35 in exec_simple_query (query_string=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n select 2048, x from generate_series(1,10) x;") at postgres.c:1092

The problem is that target_rel_trigger_max_parallel_hazard and its caller think they can use a relcache TriggerDesc field across other cache accesses, which they can't because the relcache doesn't guarantee that that won't move. One approach would be to add logic to RelationClearRelation similar to what it does for tupdescs, rules, etc, to avoid moving them when their contents haven't changed. But given that we've not needed that for the past several decades, I'm disinclined to add the overhead. I think this code ought to be adjusted to not make its own copy of the trigdesc pointer, but instead fetch it out of the relcache struct each time it is accessed. There's no real reason why target_rel_trigger_max_parallel_hazard shouldn't be passed the (stable) Relation pointer instead of just the trigdesc pointer. BTW, having special logic for FK triggers in target_rel_trigger_max_parallel_hazard seems quite loony to me. Why isn't that handled by setting appropriate proparallel values for those trigger functions? regards, tom lane
RE: Parallel INSERT (INTO ... SELECT ...)
> I guess to have the finer granularity we'd have to go with > enable_parallel_insert, > which then would mean possibly having to later add enable_parallel_update, > should parallel update have similar potential overhead in the parallel-safety > checks (which to me, looks like it could, and parallel delete may not ...) > > It's a shame there is no "set" type for GUC options. > e.g. > enable_parallel_dml='insert,update' > Maybe that's going too far. > > > 2. Should we keep the default value of GUC to on or off? It is > > currently off. I am fine keeping it off for this release and we can > > always turn it on in the later releases if required. Having said that, > > I see the value in keeping it on because in many cases Insert ... > > Select will be used for large data and there we will see a benefit of > > parallelism and users facing trouble (who have a very large number of > > partitions with less data to query) can still disable the parallel > > insert for that particular table. Also, the other benefit of keeping > > it on till at least the beta period is that this functionality will > > get tested and if we found reports of regression then we can turn it > > off for this release as well. > > > > I'd agree to keeping it on by default (and it can be selectively turned off > for a > particular table using the reloption, if needed). Thanks, attaching new version patch keeping the default value of guc option to ON. Best regards, houzj v26-0003-Parallel-SELECT-for-INSERT-INTO-.-SELECT-advanced-tests.patch Description: v26-0003-Parallel-SELECT-for-INSERT-INTO-.-SELECT-advanced-tests.patch v26-0002-Add-new-GUC-option-enable_parallel_insert-boolean.patch Description: v26-0002-Add-new-GUC-option-enable_parallel_insert-boolean.patch
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 10, 2021 at 8:18 PM Amit Kapila wrote: > > Now, coming back to Hou-San's patch to introduce a GUC and reloption > for this feature, I think both of those make sense to me because when > the feature is enabled via GUC, one might want to disable it for > partitioned tables? Do we agree on that part or someone thinks > otherwise? > Having both makes sense to me. > The other points to bikeshed could be: > 1. The name of GUC and reloption. The two proposals at hand are > enable_parallel_dml and enable_parallel_insert. I would prefer the > second (enable_parallel_insert) because updates/deletes might not have > a similar overhead. > I guess to have the finer granularity we'd have to go with enable_parallel_insert, which then would mean possibly having to later add enable_parallel_update, should parallel update have similar potential overhead in the parallel-safety checks (which to me, looks like it could, and parallel delete may not ...) It's a shame there is no "set" type for GUC options. e.g. enable_parallel_dml='insert,update' Maybe that's going too far. > 2. Should we keep the default value of GUC to on or off? It is > currently off. I am fine keeping it off for this release and we can > always turn it on in the later releases if required. Having said that, > I see the value in keeping it on because in many cases Insert ... > Select will be used for large data and there we will see a benefit of > parallelism and users facing trouble (who have a very large number of > partitions with less data to query) can still disable the parallel > insert for that particular table. Also, the other benefit of keeping > it on till at least the beta period is that this functionality will > get tested and if we found reports of regression then we can turn it > off for this release as well. > I'd agree to keeping it on by default (and it can be selectively turned off for a particular table using the reloption, if needed). Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 10, 2021 at 6:18 PM Amit Kapila wrote: > On Mon, Mar 8, 2021 at 7:19 PM Amit Langote wrote: > > I just read through v25 and didn't find anything to complain about. > > Thanks a lot, pushed now! Amit L., your inputs are valuable for this work. Glad I could be of help. Really appreciate that you credited me as author for whatever small part I contributed. > Now, coming back to Hou-San's patch to introduce a GUC and reloption > for this feature, I think both of those make sense to me because when > the feature is enabled via GUC, one might want to disable it for > partitioned tables? Do we agree on that part or someone thinks > otherwise? Agree to have both. > The other points to bikeshed could be: > 1. The name of GUC and reloption. The two proposals at hand are > enable_parallel_dml and enable_parallel_insert. I would prefer the > second (enable_parallel_insert) because updates/deletes might not have > a similar overhead. Sounds reasonable. > 2. Should we keep the default value of GUC to on or off? It is > currently off. I am fine keeping it off for this release and we can > always turn it on in the later releases if required. Having said that, > I see the value in keeping it on because in many cases Insert ... > Select will be used for large data and there we will see a benefit of > parallelism and users facing trouble (who have a very large number of > partitions with less data to query) can still disable the parallel > insert for that particular table. Also, the other benefit of keeping > it on till at least the beta period is that this functionality will > get tested and if we found reports of regression then we can turn it > off for this release as well. This makes sense too. -- Amit Langote EDB: http://www.enterprisedb.com
RE: Parallel INSERT (INTO ... SELECT ...)
From: Amit Kapila > Now, coming back to Hou-San's patch to introduce a GUC and reloption > for this feature, I think both of those make sense to me because when > the feature is enabled via GUC, one might want to disable it for > partitioned tables? Do we agree on that part or someone thinks > otherwise? > > The other points to bikeshed could be: > 1. The name of GUC and reloption. The two proposals at hand are > enable_parallel_dml and enable_parallel_insert. I would prefer the > second (enable_parallel_insert) because updates/deletes might not have > a similar overhead. > > 2. Should we keep the default value of GUC to on or off? It is > currently off. I am fine keeping it off for this release and we can > always turn it on in the later releases if required. Having said that, > I see the value in keeping it on because in many cases Insert ... > Select will be used for large data and there we will see a benefit of > parallelism and users facing trouble (who have a very large number of > partitions with less data to query) can still disable the parallel > insert for that particular table. Also, the other benefit of keeping > it on till at least the beta period is that this functionality will > get tested and if we found reports of regression then we can turn it > off for this release as well. > > Thoughts? TBH, I'm a bit unsure, but the names with _insert would be OK. The reason why Oracle has ENABLE PARALLEL DML clause and the parallel DML is disabled by default is probably due to the following critical restriction (and they have other less severe restrictions in parallel execution.) Our implementation does not have things like this. "Serial or parallel statements that attempt to access a table that has been modified in parallel within the same transaction are rejected with an error message." 
"A transaction can contain multiple parallel DML statements that modify different tables, but after a parallel DML statement modifies a table, no subsequent serial or parallel statement (DML or query) can access the same table again in that transaction." OTOH, I'm a bit concerned about whether we would have a common reason to disable parallel INSERT, UPDATE and DELETE. Oracle states space usage difference as follows. I wonder if something similar would apply to our parallel UPDATE/DELETE, particularly UPDATE. At least, we already know parallel INSERT results in larger tables and indexes compared to serial execution. "Parallel UPDATE uses the existing free space in the object, while direct-path INSERT gets new extents for the data." "Space usage characteristics may be different in parallel than serial execution because multiple concurrent child transactions modify the object." Even if that's the case, we can add _update parameters, although we feel sorry to cause users trouble to have to be aware of multiple parameters. Or, when it has turned out that we need _update and/or _delete parameters, we can opt to rename _insert to _dml, and keep the _insert parameter as an old, hidden synonym. Regards Takayuki Tsunakawa
RE: Parallel INSERT (INTO ... SELECT ...)
> > 2. Should we keep the default value of GUC to on or off? It is > > currently off. I am fine keeping it off for this release and we can > > always turn it on in the later releases if required. Having said that, > > I see the value in keeping it on because in many cases Insert ... > > Select will be used for large data and there we will see a benefit of > > parallelism and users facing trouble (who have a very large number of > > partitions with less data to query) can still disable the parallel > > insert for that particular table. Also, the other benefit of keeping > > it on till at least the beta period is that this functionality will > > get tested and if we found reports of regression then we can turn it > > off for this release as well. > > > > Thoughts? > > IMHO, we should keep it on because in most of the cases it is going to give > benefit to the user, and if for some specific scenario where a table has a > lot of > partitions then it can be turned off using the reloption. And, if for some users > most > of the tables are like that where they always have a lot of partitions then > the user > can turn it off using the GUC. > I think your suggestion makes sense. If no one has other opinions, I will soon post a new version with the default set to enable_parallel_insert=on. Best regards, houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 10, 2021 at 2:48 PM Amit Kapila wrote: > > On Mon, Mar 8, 2021 at 7:19 PM Amit Langote wrote: > > > > Hi Amit > > > > On Mon, Mar 8, 2021 at 10:18 PM Amit Kapila wrote: > > > On Mon, Mar 8, 2021 at 3:54 PM Greg Nancarrow wrote: > > > > I've attached an updated set of patches with the suggested locking > > > > changes. > > > > (Thanks Greg.) > > > > > Amit L, others, do let me know if you have still more comments on > > > 0001* patch or if you want to review it further? > > > > I just read through v25 and didn't find anything to complain about. > > > > Thanks a lot, pushed now! Amit L., your inputs are valuable for this work. > > Now, coming back to Hou-San's patch to introduce a GUC and reloption > for this feature, I think both of those make sense to me because when > the feature is enabled via GUC, one might want to disable it for > partitioned tables? Do we agree on that part or someone thinks > otherwise? I think it makes sense to have both. > > The other points to bikeshed could be: > 1. The name of GUC and reloption. The two proposals at hand are > enable_parallel_dml and enable_parallel_insert. I would prefer the > second (enable_parallel_insert) because updates/deletes might not have > a similar overhead. I agree enable_parallel_insert makes more sense. > 2. Should we keep the default value of GUC to on or off? It is > currently off. I am fine keeping it off for this release and we can > always turn it on in the later releases if required. Having said that, > I see the value in keeping it on because in many cases Insert ... > Select will be used for large data and there we will see a benefit of > parallelism and users facing trouble (who have a very large number of > partitions with less data to query) can still disable the parallel > insert for that particular table. 
Also, the other benefit of keeping > it on till at least the beta period is that this functionality will > get tested and if we found reports of regression then we can turn it > off for this release as well. > > Thoughts? IMHO, we should keep it on because in most of the cases it is going to give benefit to the user, and if for some specific scenario where a table has a lot of partitions then it can be turned off using the reloption. And, if for some users most of the tables are like that where they always have a lot of partitions then the user can turn it off using the GUC. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 8, 2021 at 7:19 PM Amit Langote wrote: > > Hi Amit > > On Mon, Mar 8, 2021 at 10:18 PM Amit Kapila wrote: > > On Mon, Mar 8, 2021 at 3:54 PM Greg Nancarrow wrote: > > > I've attached an updated set of patches with the suggested locking > > > changes. > > (Thanks Greg.) > > > Amit L, others, do let me know if you have still more comments on > > 0001* patch or if you want to review it further? > > I just read through v25 and didn't find anything to complain about. > Thanks a lot, pushed now! Amit L., your inputs are valuable for this work. Now, coming back to Hou-San's patch to introduce a GUC and reloption for this feature, I think both of those make sense to me because when the feature is enabled via GUC, one might want to disable it for partitioned tables? Do we agree on that part or someone thinks otherwise? The other points to bikeshed could be: 1. The name of GUC and reloption. The two proposals at hand are enable_parallel_dml and enable_parallel_insert. I would prefer the second (enable_parallel_insert) because updates/deletes might not have a similar overhead. 2. Should we keep the default value of GUC to on or off? It is currently off. I am fine keeping it off for this release and we can always turn it on in the later releases if required. Having said that, I see the value in keeping it on because in many cases Insert ... Select will be used for large data and there we will see a benefit of parallelism and users facing trouble (who have a very large number of partitions with less data to query) can still disable the parallel insert for that particular table. Also, the other benefit of keeping it on till at least the beta period is that this functionality will get tested and if we found reports of regression then we can turn it off for this release as well. Thoughts? -- With Regards, Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
> > > > I've attached an updated set of patches with the suggested locking changes. > > > > Amit L, others, do let me know if you have still more comments on > 0001* patch or if you want to review it further? I took a look into the latest 0001 patch, and it looks good to me. Best regards, houzj
Re: Parallel INSERT (INTO ... SELECT ...)
Hi Amit On Mon, Mar 8, 2021 at 10:18 PM Amit Kapila wrote: > On Mon, Mar 8, 2021 at 3:54 PM Greg Nancarrow wrote: > > I've attached an updated set of patches with the suggested locking changes. (Thanks Greg.) > Amit L, others, do let me know if you have still more comments on > 0001* patch or if you want to review it further? I just read through v25 and didn't find anything to complain about. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 8, 2021 at 3:54 PM Greg Nancarrow wrote: > > I've attached an updated set of patches with the suggested locking changes. > Amit L, others, do let me know if you have still more comments on 0001* patch or if you want to review it further? -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 8, 2021 at 6:25 PM Amit Langote wrote: > > A couple of things that look odd in v24-0001 (sorry if there were like > that from the beginning): > > +static bool > +target_rel_max_parallel_hazard(max_parallel_hazard_context *context) > +{ > + boolmax_hazard_found; > + > + RelationtargetRel; > + > + /* > +* The target table is already locked by the caller (this is done in the > +* parse/analyze phase), and remains locked until end-of-transaction. > +*/ > + targetRel = table_open(context->target_rte->relid, > + context->target_rte->rellockmode); > > The comment seems to imply that the lock need not be retaken here, but > the code does. Maybe it's fine to pass NoLock here, but use > rellockmode when locking partitions, because they would not have been > locked by the parser/analyzer. Actually, it was originally NoLock, but I think in your suggested changes (v15_delta.diff) to better integrate the extra parallel-safety checks into max_parallel_hazard(), you changed "NoLock" to "context->targetRTE->rellockmode".. Having said that, since the table is already locked, I think that using "context->target_rte->rellockmode" in this case just results in the lock reference count being incremented, so I doubt it really makes any difference, since it's locked until end-of-transaction. I'll revert it back to NoLock. >Which brings me to: > > +static bool > +target_rel_partitions_max_parallel_hazard(Relation rel, > + max_parallel_hazard_context > *context) > +{ > ... > + for (i = 0; i < pdesc->nparts; i++) > + { > + boolmax_hazard_found; > + Relationpart_rel; > + > + /* > +* The partition needs to be locked, and remain locked until > +* end-of-transaction to ensure its parallel-safety state is not > +* hereafter altered. > +*/ > + part_rel = table_open(pdesc->oids[i], AccessShareLock); > > Not that I can prove there to be any actual hazard, but we tend to > avoid taking locks with different strengths in the same query as would > occur with this piece of code. 
We're locking the partition with > AccessShareLock here, but the executor will lock the partition with > the stronger RowExclusiveLock mode before trying to insert into it. > Better to use the stronger mode to begin with or just use the target > partitioned table's RTE's rellockmode which would be RowExclusiveLock. > You can see that that's what AcquireExecutorLocksOnPartitions() will > do in the generic plan case. > OK, I see what you mean, best to use the target partitioned table's RTE's rellockmode in this case then. I'll update the patch accordingly. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 8, 2021 at 12:55 PM Amit Langote wrote: > > Hi Amit, Greg, > > Sorry, I hadn't noticed last week that some questions were addressed to me. > > On Sat, Mar 6, 2021 at 7:19 PM Amit Kapila wrote: > > Thanks, your changes look good to me. I went ahead and changed the > > patch to track the partitionOids once rather than in two separate > > lists and use that list in PlanCacheRelCallback to invalidate the > > plans. > > Not mixing the partition OIDs into relationOids seems fine to me. I > had considered that option too, but somehow forgot to mention it here. > Okay, thanks for the confirmation. > A couple of things that look odd in v24-0001 (sorry if there were like > that from the beginning): > > +static bool > +target_rel_max_parallel_hazard(max_parallel_hazard_context *context) > +{ > + boolmax_hazard_found; > + > + RelationtargetRel; > + > + /* > +* The target table is already locked by the caller (this is done in the > +* parse/analyze phase), and remains locked until end-of-transaction. > +*/ > + targetRel = table_open(context->target_rte->relid, > + context->target_rte->rellockmode); > > The comment seems to imply that the lock need not be retaken here, but > the code does. Maybe it's fine to pass NoLock here, but use > rellockmode when locking partitions, because they would not have been > locked by the parser/analyzer. Which brings me to: > > +static bool > +target_rel_partitions_max_parallel_hazard(Relation rel, > + max_parallel_hazard_context > *context) > +{ > ... > + for (i = 0; i < pdesc->nparts; i++) > + { > + boolmax_hazard_found; > + Relationpart_rel; > + > + /* > +* The partition needs to be locked, and remain locked until > +* end-of-transaction to ensure its parallel-safety state is not > +* hereafter altered. 
> +*/ > + part_rel = table_open(pdesc->oids[i], AccessShareLock); > > Not that I can prove there to be any actual hazard, but we tend to > avoid taking locks with different strengths in the same query as would > occur with this piece of code. We're locking the partition with > AccessShareLock here, but the executor will lock the partition with > the stronger RowExclusiveLock mode before trying to insert into it. > Better to use the stronger mode to begin with or just use the target > partitioned table's RTE's rellockmode which would be RowExclusiveLock. > You can see that that's what AcquireExecutorLocksOnPartitions() will > do in the generic plan case. > Both of your comments make sense to me. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
Hi Amit, Greg, Sorry, I hadn't noticed last week that some questions were addressed to me. On Sat, Mar 6, 2021 at 7:19 PM Amit Kapila wrote: > Thanks, your changes look good to me. I went ahead and changed the > patch to track the partitionOids once rather than in two separate > lists and use that list in PlanCacheRelCallback to invalidate the > plans. Not mixing the partition OIDs into relationOids seems fine to me. I had considered that option too, but somehow forgot to mention it here. A couple of things that look odd in v24-0001 (sorry if there were like that from the beginning): +static bool +target_rel_max_parallel_hazard(max_parallel_hazard_context *context) +{ + boolmax_hazard_found; + + RelationtargetRel; + + /* +* The target table is already locked by the caller (this is done in the +* parse/analyze phase), and remains locked until end-of-transaction. +*/ + targetRel = table_open(context->target_rte->relid, + context->target_rte->rellockmode); The comment seems to imply that the lock need not be retaken here, but the code does. Maybe it's fine to pass NoLock here, but use rellockmode when locking partitions, because they would not have been locked by the parser/analyzer. Which brings me to: +static bool +target_rel_partitions_max_parallel_hazard(Relation rel, + max_parallel_hazard_context *context) +{ ... + for (i = 0; i < pdesc->nparts; i++) + { + boolmax_hazard_found; + Relationpart_rel; + + /* +* The partition needs to be locked, and remain locked until +* end-of-transaction to ensure its parallel-safety state is not +* hereafter altered. +*/ + part_rel = table_open(pdesc->oids[i], AccessShareLock); Not that I can prove there to be any actual hazard, but we tend to avoid taking locks with different strengths in the same query as would occur with this piece of code. We're locking the partition with AccessShareLock here, but the executor will lock the partition with the stronger RowExclusiveLock mode before trying to insert into it. 
Better to use the stronger mode to begin with or just use the target partitioned table's RTE's rellockmode which would be RowExclusiveLock. You can see that that's what AcquireExecutorLocksOnPartitions() will do in the generic plan case. -- Amit Langote EDB: http://www.enterprisedb.com
RE: Parallel INSERT (INTO ... SELECT ...)
From: Amit Kapila > For now, I have left 0005 and 0006 patches, we can come back to those once we > are done with the first set of patches. The first patch looks good to me and I > think we can commit it and then bikeshed about GUC/reloption patch. Agreed, it looks good to me, too. Regards Takayuki Tsunakawa
Re: Parallel INSERT (INTO ... SELECT ...)
On Sat, Mar 6, 2021 at 9:19 PM Amit Kapila wrote: > > On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow wrote: > > > > For the time being at least, I am posting an updated set of patches, > > as I found that the additional parallel-safety checks on DOMAIN check > > constraints to be somewhat inefficient and could also be better > > integrated into max_parallel_hazard(). I also updated the basic tests > > with a test case for this. > > > > Thanks, your changes look good to me. I went ahead and changed the > patch to track the partitionOids once rather than in two separate > lists and use that list in PlanCacheRelCallback to invalidate the > plans. I made few other changes: > a. don't need to retain the lock on indexes as we can't change index > expressions > b. added/updated comments at few places in the code. > c. made a few other cosmetic changes > d. ran pgindent > e. slightly modify the commit message. > f. combine the code and test case patch > > For now, I have left 0005 and 0006 patches, we can come back to those > once we are done with the first set of patches. The first patch looks > good to me and I think we can commit it and then bikeshed about > GUC/reloption patch. > I've checked and tested the changes, and the resultant patches look OK to me, so I'm happy for you to commit the first patch. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
For cfffe83ba82021a1819a656e7ec5c28fb3a99152, if a bool was written (true | false), READ_INT_FIELD calls atoi() where atoi("true") returns 0 and atoi("false") returns 0 as well. I am not sure if the new release containing these commits had a higher cat version compared to the previous release. If the new releases did have a higher cat version, I guess there was no issue, by chance. Cheers On Sat, Mar 6, 2021 at 8:12 PM Amit Kapila wrote: > On Sun, Mar 7, 2021 at 8:24 AM Zhihong Yu wrote: > > > > I was looking at src/backend/nodes/readfuncs.c > > > > READ_NODE_FIELD(relationOids); > > + READ_NODE_FIELD(partitionOids); > > > > READ_NODE_FIELD would call nodeRead() for partitionOids. However, such > field may not exist. > > Since there is no 'if (strncmp(":partitionOids", token, length) == 0) {' > check, I was curious whether CATALOG_VERSION_NO should be bumped. > > > > Won't that be true only if we are reading the stored planned stmt from > disk? But is it possible in any way? The last two commits in this > function (cfffe83b and 45639a05) also didn't bump the catversion. > > -- > With Regards, > Amit Kapila. >
Re: Parallel INSERT (INTO ... SELECT ...)
On Sun, Mar 7, 2021 at 8:24 AM Zhihong Yu wrote: > > I was looking at src/backend/nodes/readfuncs.c > > READ_NODE_FIELD(relationOids); > + READ_NODE_FIELD(partitionOids); > > READ_NODE_FIELD would call nodeRead() for partitionOids. However, such field > may not exist. > Since there is no 'if (strncmp(":partitionOids", token, length) == 0) {' > check, I was curious whether CATALOG_VERSION_NO should be bumped. > Won't that be true only if we are reading the stored planned stmt from disk? But is it possible in any way? The last two commits in this function (cfffe83b and 45639a05) also didn't bump the catversion. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
I was looking at src/backend/nodes/readfuncs.c READ_NODE_FIELD(relationOids); + READ_NODE_FIELD(partitionOids); READ_NODE_FIELD would call nodeRead() for partitionOids. However, such field may not exist. Since there is no 'if (strncmp(":partitionOids", token, length) == 0) {' check, I was curious whether CATALOG_VERSION_NO should be bumped. Cheers On Sat, Mar 6, 2021 at 6:31 PM Amit Kapila wrote: > On Sat, Mar 6, 2021 at 9:13 PM Zhihong Yu wrote: > > > > Hi, > > Does CATALOG_VERSION_NO need to be bumped (introduction of partitionOids > field) ? > > > > Good question. I usually update CATALOG_VERSION_NO when the patch > changes any of the system catalogs. This is what is also mentioned in > catversion.h. See the following text in that file: "The catalog > version number is used to flag incompatible changes in the PostgreSQL > system catalogs. Whenever anyone changes the format of a system > catalog relation, or adds, deletes, or modifies standard > catalog entries in such a way that an updated backend wouldn't work > with an old database (or vice versa), the catalog version number > should be changed.". > > I might be missing something here but why you think that due to > partitionOids field (in plannedstmt or at another place) we need to > update CATALOG_VERSION_NO? > > Anyway, thanks for bringing this up. > > -- > With Regards, > Amit Kapila. >
Re: Parallel INSERT (INTO ... SELECT ...)
On Sat, Mar 6, 2021 at 9:13 PM Zhihong Yu wrote: > > Hi, > Does CATALOG_VERSION_NO need to be bumped (introduction of partitionOids > field) ? > Good question. I usually update CATALOG_VERSION_NO when the patch changes any of the system catalogs. This is what is also mentioned in catversion.h. See the following text in that file: "The catalog version number is used to flag incompatible changes in the PostgreSQL system catalogs. Whenever anyone changes the format of a system catalog relation, or adds, deletes, or modifies standard catalog entries in such a way that an updated backend wouldn't work with an old database (or vice versa), the catalog version number should be changed.". I might be missing something here but why you think that due to partitionOids field (in plannedstmt or at another place) we need to update CATALOG_VERSION_NO? Anyway, thanks for bringing this up. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
Hi, Does CATALOG_VERSION_NO need to be bumped (introduction of partitionOids field) ? Cheers On Sat, Mar 6, 2021 at 2:19 AM Amit Kapila wrote: > On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow wrote: > > > > For the time being at least, I am posting an updated set of patches, > > as I found that the additional parallel-safety checks on DOMAIN check > > constraints to be somewhat inefficient and could also be better > > integrated into max_parallel_hazard(). I also updated the basic tests > > with a test case for this. > > > > Thanks, your changes look good to me. I went ahead and changed the > patch to track the partitionOids once rather than in two separate > lists and use that list in PlanCacheRelCallback to invalidate the > plans. I made few other changes: > a. don't need to retain the lock on indexes as we can't change index > expressions > b. added/updated comments at few places in the code. > c. made a few other cosmetic changes > d. ran pgindent > e. slightly modify the commit message. > f. combine the code and test case patch > > For now, I have left 0005 and 0006 patches, we can come back to those > once we are done with the first set of patches. The first patch looks > good to me and I think we can commit it and then bikeshed about > GUC/reloption patch. > > > > -- > With Regards, > Amit Kapila. >
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow wrote: > > On Fri, Mar 5, 2021 at 9:35 PM Amit Kapila wrote: > > > > On Fri, Mar 5, 2021 at 8:24 AM Greg Nancarrow wrote: > > > > > > > In patch v21-0003-Add-new-parallel-dml-GUC-and-table-options, we are > > introducing GUC (enable_parallel_dml) and table option > > (parallel_dml_enabled) for this feature. I am a bit worried about > > using *_dml in the names because it is quite possible that for > > parallel updates and parallel deletes we might not need any such GUC. > > The reason we mainly need here is due to checking of parallel-safety > > of partitioned tables and updates/deletes handle partitioned tables > > differently than inserts so those might not be that costly. It is > > possible that they are costly due to a different reason but not sure > > mapping those to one GUC or table option is a good idea. Can we > > consider using *_insert instead? I think GUC having _insert can be > > probably used for a parallel copy (from) as well which I think will > > have a similar overhead. > > > > I'll need to think about that one. > Okay, one more minor comment on the same patch: + /* + * Check if parallel_dml_enabled is enabled for the target table, + * if not, skip the safety checks. + * + * (Note: if the target table is partitioned, the parallel_dml_enabled + * option setting of the partitions are ignored). + */ + rte = rt_fetch(parse->resultRelation, parse->rtable); + rel = table_open(rte->relid, NoLock); I think here the patch is using NoLock based on the assumption that the relation is already locked in parse/analyze phase, so it's better to have a comment like the one we have in target_rel_max_parallel_hazard. > I may be wrong, but I would have thought at least updates would have > similar parallel-safety checking requirements to inserts and would > have similar potential cost issues. 
> For updates and deletes, we go via the inheritance planner, where we already do some work related to partitions, so I think we might not need to open all the partitions for parallel-safety checks as we do for inserts. I am sure some part of the code for inserts, like what we have in max_parallel_hazard, will be used for updates as well, but there will probably be some changes for checking parallel-safety of partitioned relations. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 5, 2021 at 8:24 AM Greg Nancarrow wrote: > In patch v21-0003-Add-new-parallel-dml-GUC-and-table-options, we are introducing GUC (enable_parallel_dml) and table option (parallel_dml_enabled) for this feature. I am a bit worried about using *_dml in the names because it is quite possible that for parallel updates and parallel deletes we might not need any such GUC. The reason we mainly need here is due to checking of parallel-safety of partitioned tables and updates/deletes handle partitioned tables differently than inserts so those might not be that costly. It is possible that they are costly due to a different reason but not sure mapping those to one GUC or table option is a good idea. Can we consider using *_insert instead? I think GUC having _insert can be probably used for a parallel copy (from) as well which I think will have a similar overhead. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 5, 2021 at 5:07 AM Greg Nancarrow wrote: > > On Thu, Mar 4, 2021 at 11:05 PM Amit Kapila wrote: > > > > On Thu, Mar 4, 2021 at 5:24 PM Greg Nancarrow wrote: > > > > > > On Thu, Mar 4, 2021 at 10:07 PM Amit Kapila > > > wrote: > > > > > > > > > > >Also, in > > > > standard_planner, we should add these partitionOids only for > > > > parallel-mode. > > > > > > > > > > It is doing that in v20 patch (what makes you think it isn't?). > > > > > > > The below code snippet: > > + /* For AcquireExecutorLocks(). */ > > + if (IsModifySupportedInParallelMode(parse->commandType)) > > + result->partitionOids = glob->partitionOids; > > > > I understand that you have a check for the parallel mode in > > AcquireExecutorLocks but why can't we have it before adding that to > > planned statement > > > > OK, I think I'm on the same wavelength now (sorry, I didn't realise > you're talking about PlannedStmt). > > What I believe you're suggesting is in the planner where > partitionOids are "added" to the returned PlannedStmt, they should > only be added if glob->parallelModeNeeded is true:. > > i.e. > > /* For AcquireExecutorLocks(). */ > if (glob->partitionOids != NIL && glob->parallelModeNeeded) > result->partitionOids = glob->partitionOids; > > (seems reasonable to me, as then it will match the condition for which > glob->partitionOids are added to glob->relationOids) > > then in plancache.c the check on parallelModeNeeded can be removed: > > /* Lock partitions ahead of modifying them in parallel mode. */ > if (rti == resultRelation && > plannedstmt->partitionOids != NIL) > AcquireExecutorLocksOnPartitions(plannedstmt->partitionOids, > rte->rellockmode, acquire); > > Let me know if this matches what you were thinking. > Yes, that is what I was thinking. But I have another question as well regarding tracking these partitions at two places (once in plannedstmt->relationOids and the second time in plannedstmt->partitionOids). 
I think it is better if you can prepare a patch addressing all the comments raised till now, leaving this last question aside. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 11:05 PM Amit Kapila wrote: > > On Thu, Mar 4, 2021 at 5:24 PM Greg Nancarrow wrote: > > > > On Thu, Mar 4, 2021 at 10:07 PM Amit Kapila wrote: > > > > > > > >Also, in > > > standard_planner, we should add these partitionOids only for > > > parallel-mode. > > > > > > > It is doing that in v20 patch (what makes you think it isn't?). > > > > The below code snippet: > + /* For AcquireExecutorLocks(). */ > + if (IsModifySupportedInParallelMode(parse->commandType)) > + result->partitionOids = glob->partitionOids; > > I understand that you have a check for the parallel mode in > AcquireExecutorLocks but why can't we have it before adding that to > planned statement > OK, I think I'm on the same wavelength now (sorry, I didn't realise you're talking about PlannedStmt). What I believe you're suggesting is that in the planner, where partitionOids are "added" to the returned PlannedStmt, they should only be added if glob->parallelModeNeeded is true, i.e. /* For AcquireExecutorLocks(). */ if (glob->partitionOids != NIL && glob->parallelModeNeeded) result->partitionOids = glob->partitionOids; (seems reasonable to me, as then it will match the condition for which glob->partitionOids are added to glob->relationOids) then in plancache.c the check on parallelModeNeeded can be removed: /* Lock partitions ahead of modifying them in parallel mode. */ if (rti == resultRelation && plannedstmt->partitionOids != NIL) AcquireExecutorLocksOnPartitions(plannedstmt->partitionOids, rte->rellockmode, acquire); Let me know if this matches what you were thinking. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 5:24 PM Greg Nancarrow wrote: > > On Thu, Mar 4, 2021 at 10:07 PM Amit Kapila wrote: > > > > >Also, in > > standard_planner, we should add these partitionOids only for > > parallel-mode. > > > > It is doing that in v20 patch (what makes you think it isn't?). > The below code snippet: + /* For AcquireExecutorLocks(). */ + if (IsModifySupportedInParallelMode(parse->commandType)) + result->partitionOids = glob->partitionOids; I understand that you have a check for the parallel mode in AcquireExecutorLocks but why can't we have it before adding that to planned statement -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 10:07 PM Amit Kapila wrote: > > On Fri, Feb 26, 2021 at 10:37 AM Amit Langote wrote: > > > > I realized that there is a race condition that will allow a concurrent > > backend to invalidate a parallel plan (for insert into a partitioned > > table) at a point in time when it's too late for plancache.c to detect > > it. It has to do with how plancache.c locks the relations affected by > > a cached query and its cached plan. Specifically, > > AcquireExecutorLocks(), which locks the relations that need to be > > locked before the plan could be considered safe to execute, does not > > notice the partitions that would have been checked for parallel safety > > when the plan was made. Given that AcquireExecutorLocks() only loops > > over PlannedStmt.rtable and locks exactly the relations found there, > > partitions are not locked. This means that a concurrent backend can, > > for example, add an unsafe trigger to a partition before a parallel > > worker inserts into it only to fail when it does. > > > > Steps to reproduce (tried with v19 set of patches): > > > > drop table if exists rp, foo; > > create table rp (a int) partition by range (a); > > create table rp1 partition of rp for values from (minvalue) to (0); > > create table rp2 partition of rp for values from (0) to (maxvalue); > > create table foo (a) as select generate_series(1, 100); > > set plan_cache_mode to force_generic_plan; > > set enable_parallel_dml to on; > > deallocate all; > > prepare q as insert into rp select * from foo where a%2 = 0; > > explain analyze execute q; > > > > At this point, attach a debugger to this backend and set a breakpoint > > in AcquireExecutorLocks() and execute q again: > > > > -- will execute the cached plan > > explain analyze execute q; > > > > Breakpoint will be hit. Continue till return from the function and > > stop. 
Start another backend and execute this: > > > > -- will go through, because no lock still taken on the partition > > create or replace function make_table () returns trigger language > > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > > unsafe; > > create trigger ai_rp2 after insert on rp2 for each row execute > > function make_table(); > > > > Back to the debugger, hit continue to let the plan execute. You > > should see this error: > > > > ERROR: cannot start commands during a parallel operation > > CONTEXT: SQL statement "create table bar()" > > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > > > The attached patch fixes this, > > > > One thing I am a bit worried about this fix is that after this for > parallel-mode, we will maintain partitionOids at two places, one in > plannedstmt->relationOids and the other in plannedstmt->partitionOids. > I guess you have to do that because, in AcquireExecutorLocks, we can't > find which relationOids are corresponding to partitionOids, am I > right? If so, why not just maintain them in plannedstmt->partitionOids > and then make PlanCacheRelCallback consider it? Maybe Amit Langote can kindly comment on this, as it would involve updates to his prior partition-related fixes. >Also, in > standard_planner, we should add these partitionOids only for > parallel-mode. > It is doing that in v20 patch (what makes you think it isn't?). Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 4:37 PM Amit Kapila wrote: > > On Fri, Feb 26, 2021 at 10:37 AM Amit Langote wrote: > > > > I realized that there is a race condition that will allow a concurrent > > backend to invalidate a parallel plan (for insert into a partitioned > > table) at a point in time when it's too late for plancache.c to detect > > it. It has to do with how plancache.c locks the relations affected by > > a cached query and its cached plan. Specifically, > > AcquireExecutorLocks(), which locks the relations that need to be > > locked before the plan could be considered safe to execute, does not > > notice the partitions that would have been checked for parallel safety > > when the plan was made. Given that AcquireExecutorLocks() only loops > > over PlannedStmt.rtable and locks exactly the relations found there, > > partitions are not locked. This means that a concurrent backend can, > > for example, add an unsafe trigger to a partition before a parallel > > worker inserts into it only to fail when it does. > > > > Steps to reproduce (tried with v19 set of patches): > > > > drop table if exists rp, foo; > > create table rp (a int) partition by range (a); > > create table rp1 partition of rp for values from (minvalue) to (0); > > create table rp2 partition of rp for values from (0) to (maxvalue); > > create table foo (a) as select generate_series(1, 100); > > set plan_cache_mode to force_generic_plan; > > set enable_parallel_dml to on; > > deallocate all; > > prepare q as insert into rp select * from foo where a%2 = 0; > > explain analyze execute q; > > > > At this point, attach a debugger to this backend and set a breakpoint > > in AcquireExecutorLocks() and execute q again: > > > > -- will execute the cached plan > > explain analyze execute q; > > > > Breakpoint will be hit. Continue till return from the function and > > stop. 
Start another backend and execute this: > > > > -- will go through, because no lock still taken on the partition > > create or replace function make_table () returns trigger language > > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > > unsafe; > > create trigger ai_rp2 after insert on rp2 for each row execute > > function make_table(); > > > > Back to the debugger, hit continue to let the plan execute. You > > should see this error: > > > > ERROR: cannot start commands during a parallel operation > > CONTEXT: SQL statement "create table bar()" > > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > > > The attached patch fixes this, > > > > One thing I am a bit worried about this fix is that after this for > parallel-mode, we will maintain partitionOids at two places, one in > plannedstmt->relationOids and the other in plannedstmt->partitionOids. > I guess you have to do that because, in AcquireExecutorLocks, we can't > find which relationOids are corresponding to partitionOids, am I > right? If so, why not just maintain them in plannedstmt->partitionOids > and then make PlanCacheRelCallback consider it? Also, in > standard_planner, we should add these partitionOids only for > parallel-mode. > One more point I was thinking about is whether we need to worry about locking indexes during prepared query execution (similar to what we do for AcquireExecutorLocks). I think we shouldn't be bothered to lock those or even retain lock during parallel-safety checks because one cannot change index expression or predicate. Is my understanding correct or am I missing something and we should be worried about them as well. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 26, 2021 at 10:37 AM Amit Langote wrote: > > I realized that there is a race condition that will allow a concurrent > backend to invalidate a parallel plan (for insert into a partitioned > table) at a point in time when it's too late for plancache.c to detect > it. It has to do with how plancache.c locks the relations affected by > a cached query and its cached plan. Specifically, > AcquireExecutorLocks(), which locks the relations that need to be > locked before the plan could be considered safe to execute, does not > notice the partitions that would have been checked for parallel safety > when the plan was made. Given that AcquireExecutorLocks() only loops > over PlannedStmt.rtable and locks exactly the relations found there, > partitions are not locked. This means that a concurrent backend can, > for example, add an unsafe trigger to a partition before a parallel > worker inserts into it only to fail when it does. > > Steps to reproduce (tried with v19 set of patches): > > drop table if exists rp, foo; > create table rp (a int) partition by range (a); > create table rp1 partition of rp for values from (minvalue) to (0); > create table rp2 partition of rp for values from (0) to (maxvalue); > create table foo (a) as select generate_series(1, 100); > set plan_cache_mode to force_generic_plan; > set enable_parallel_dml to on; > deallocate all; > prepare q as insert into rp select * from foo where a%2 = 0; > explain analyze execute q; > > At this point, attach a debugger to this backend and set a breakpoint > in AcquireExecutorLocks() and execute q again: > > -- will execute the cached plan > explain analyze execute q; > > Breakpoint will be hit. Continue till return from the function and > stop. 
Start another backend and execute this: > > -- will go through, because no lock still taken on the partition > create or replace function make_table () returns trigger language > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > unsafe; > create trigger ai_rp2 after insert on rp2 for each row execute > function make_table(); > > Back to the debugger, hit continue to let the plan execute. You > should see this error: > > ERROR: cannot start commands during a parallel operation > CONTEXT: SQL statement "create table bar()" > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > The attached patch fixes this, > One thing I am a bit worried about this fix is that after this for parallel-mode, we will maintain partitionOids at two places, one in plannedstmt->relationOids and the other in plannedstmt->partitionOids. I guess you have to do that because, in AcquireExecutorLocks, we can't find which relationOids are corresponding to partitionOids, am I right? If so, why not just maintain them in plannedstmt->partitionOids and then make PlanCacheRelCallback consider it? Also, in standard_planner, we should add these partitionOids only for parallel-mode. Thoughts? -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 9:03 AM Amit Kapila wrote: > > I think for Update/Delete, we might not do parallel-safety checks by > calling target_rel_max_parallel_hazard_recurse especially because > partitions are handled differently for Updates and Deletes (see > inheritance_planner()). I think what Dilip is telling doesn't sound > unreasonable to me. So, even, if we want to extend it later by making > some checks specific to Inserts/Updates, we can do it at that time. > The comments you have at that place are sufficient to tell that in the > future we can use those checks for Updates as well. They will need > some adjustment if we remove that check but the intent is clear. +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 7:16 AM Greg Nancarrow wrote: > > On Thu, Mar 4, 2021 at 1:07 AM Dilip Kumar wrote: > > > > On Wed, Mar 3, 2021 at 5:50 PM Greg Nancarrow wrote: > > > > > I agree that assert is only for debug build, but once we add and > > assert that means we are sure that it should only be called for insert > > and if it is called for anything else then it is a programming error > > from the caller's side. So after the assert, adding if check for the > > same condition doesn't look like a good idea. That means we think > > that the code can hit assert in the debug mode so we need an extra > > protection in the release mode. > > > > The if-check isn't there for "extra protection". > It's to help with future changes; inside that if-block is code only > applicable to INSERT (and to UPDATE - sorry, before I said DELETE), as > the code-comment indicates, whereas the rest of the function is > generic to all command types. I don't see any problem with having this > if-block here, to help in this way, when other command types are > added. If that is the case, then this check should also be added along with those future patches; I mean, when we allow UPDATE, that check will make sense, and at that time we will have to get rid of the assert as well, so the complete function stays in sync. But right now this check looks a bit odd. That is my opinion, but otherwise I don't have any strong objection to that check. > > > > > > > 2. > > But the cost_modifytable is setting the number of rows to 0 in > > ModifyTablePath whereas the cost_gather will multiply the rows from > > the GatherPath. I can not see the rows from GatherPath is ever set to > > 0. > > > > OK, I see the problem now. > It works the way I described, but currently there's a problem with > where it's getting the rows for the GatherPath, so there's a > disconnect.
> When generating the GatherPaths, it's currently always taking the > rel's (possibly estimated) row-count, rather than using the rows from > the cheapest_partial_path (the subpath: ModifyTablePath). See > generate_gather_paths(). > So when generate_useful_gather_paths() is called from the planner, for > the added partial paths for Parallel INSERT, it should be passing > "true" for the "override_rows" parameter, not "false". > > So I think that in the 0004 patch, the if-block: > > + if (parallel_modify_partial_path_added) > + { > + final_rel->rows = current_rel->rows;/* ??? why > hasn't this been > + > * set above somewhere */ > + generate_useful_gather_paths(root, final_rel, false); > + } > + > > can be reduced to: > > + if (parallel_modify_partial_path_added) > + generate_useful_gather_paths(root, final_rel, true); > + Okay. I will check this part after you provide an updated version. Thanks. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 7:16 AM Greg Nancarrow wrote: > > On Thu, Mar 4, 2021 at 1:07 AM Dilip Kumar wrote: > > > > On Wed, Mar 3, 2021 at 5:50 PM Greg Nancarrow wrote: > > > > > > Asserts are normally only enabled in a debug-build, so for a > > > release-build that Assert has no effect. > > > The Assert is being used as a sanity-check that the function is only > > > currently getting called for INSERT, because that's all it currently > > > supports. > > > > I agree that assert is only for debug build, but once we add and > > assert that means we are sure that it should only be called for insert > > and if it is called for anything else then it is a programming error > > from the caller's side. So after the assert, adding if check for the > > same condition doesn't look like a good idea. That means we think > > that the code can hit assert in the debug mode so we need an extra > > protection in the release mode. > > > > The if-check isn't there for "extra protection". > It's to help with future changes; inside that if-block is code only > applicable to INSERT (and to UPDATE - sorry, before I said DELETE), as > the code-comment indicates, whereas the rest of the function is > generic to all command types. I don't see any problem with having this > if-block here, to help in this way, when other command types are > added. > I think for Update/Delete, we might not do parallel-safety checks by calling target_rel_max_parallel_hazard_recurse especially because partitions are handled differently for Updates and Deletes (see inheritance_planner()). I think what Dilip is telling doesn't sound unreasonable to me. So, even, if we want to extend it later by making some checks specific to Inserts/Updates, we can do it at that time. The comments you have at that place are sufficient to tell that in the future we can use those checks for Updates as well. They will need some adjustment if we remove that check but the intent is clear. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 1:07 AM Dilip Kumar wrote: > > On Wed, Mar 3, 2021 at 5:50 PM Greg Nancarrow wrote: > > > > Asserts are normally only enabled in a debug-build, so for a > > release-build that Assert has no effect. > > The Assert is being used as a sanity-check that the function is only > > currently getting called for INSERT, because that's all it currently > > supports. > > I agree that assert is only for debug build, but once we add and > assert that means we are sure that it should only be called for insert > and if it is called for anything else then it is a programming error > from the caller's side. So after the assert, adding if check for the > same condition doesn't look like a good idea. That means we think > that the code can hit assert in the debug mode so we need an extra > protection in the release mode. > The if-check isn't there for "extra protection". It's to help with future changes; inside that if-block is code only applicable to INSERT (and to UPDATE - sorry, before I said DELETE), as the code-comment indicates, whereas the rest of the function is generic to all command types. I don't see any problem with having this if-block here, to help in this way, when other command types are added. > > > > > > 2. > > > In patch 0004, We are still charging the parallel_tuple_cost for each > > > tuple, are we planning to do something about this? I mean after this > > > patch tuple will not be transferred through the tuple queue, so we > > > should not add that cost. > > > > > > > I believe that for Parallel INSERT, cost_modifytable() will set > > path->path.rows to 0 (unless there is a RETURNING list), so, for > > example, in cost_gather(), it will not add to the run_cost as > > "run_cost += parallel_tuple_cost * path->path.rows;" > > > > But the cost_modifytable is setting the number of rows to 0 in > ModifyTablePath whereas the cost_gather will multiply the rows from > the GatherPath. I can not see the rows from GatherPath is ever set to > 0. 
> OK, I see the problem now. It works the way I described, but currently there's a problem with where it's getting the rows for the GatherPath, so there's a disconnect. When generating the GatherPaths, it's currently always taking the rel's (possibly estimated) row-count, rather than using the rows from the cheapest_partial_path (the subpath: ModifyTablePath). See generate_gather_paths(). So when generate_useful_gather_paths() is called from the planner, for the added partial paths for Parallel INSERT, it should be passing "true" for the "override_rows" parameter, not "false". So I think that in the 0004 patch, the if-block: + if (parallel_modify_partial_path_added) + { + final_rel->rows = current_rel->rows;/* ??? why hasn't this been + * set above somewhere */ + generate_useful_gather_paths(root, final_rel, false); + } + can be reduced to: + if (parallel_modify_partial_path_added) + generate_useful_gather_paths(root, final_rel, true); + Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 3, 2021 at 5:50 PM Greg Nancarrow wrote: > > Asserts are normally only enabled in a debug-build, so for a > release-build that Assert has no effect. > The Assert is being used as a sanity-check that the function is only > currently getting called for INSERT, because that's all it currently > supports. I agree that the assert is only for debug builds, but once we add an assert, that means we are sure it should only be called for INSERT, and if it is called for anything else, then it is a programming error on the caller's side. So after the assert, adding an if check for the same condition doesn't look like a good idea; that would mean we think the code can hit the assert in debug mode, so we need extra protection in release mode. > > > 2. > > In patch 0004, We are still charging the parallel_tuple_cost for each > > tuple, are we planning to do something about this? I mean after this > > patch tuple will not be transferred through the tuple queue, so we > > should not add that cost. > > > > I believe that for Parallel INSERT, cost_modifytable() will set > path->path.rows to 0 (unless there is a RETURNING list), so, for > example, in cost_gather(), it will not add to the run_cost as > "run_cost += parallel_tuple_cost * path->path.rows;" > But cost_modifytable is setting the number of rows to 0 in ModifyTablePath, whereas cost_gather will multiply in the rows from the GatherPath; I cannot see that the rows from GatherPath are ever set to 0. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 3, 2021 at 10:45 PM Dilip Kumar wrote: > > On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > > > Posting an updated set of patches that includes Amit Langote's patch > > to the partition tracking scheme... > > (the alternative of adding partitions to the range table needs further > > investigation) > > I was reviewing your latest patch and I have a few comments. > > In patch 0001 > 1. > +static bool > +target_rel_max_parallel_hazard_recurse(Relation rel, > + CmdType command_type, > + max_parallel_hazard_context *context) > +{ > + TupleDesc tupdesc; > + int attnum; > + > + /* Currently only CMD_INSERT is supported */ > + Assert(command_type == CMD_INSERT); > ……. > + /* > + * Column default expressions and check constraints are only applicable to > + * INSERT and UPDATE, but since only parallel INSERT is currently supported, > + * only command_type==CMD_INSERT is checked here. > + */ > + if (command_type == CMD_INSERT) > > If we have an assert at the beginning of the function, then why do we > want to put the if check here? > Asserts are normally only enabled in a debug-build, so for a release-build that Assert has no effect. The Assert is being used as a sanity-check that the function is only currently getting called for INSERT, because that's all it currently supports. Further code below specifically checks for INSERT because the block contains code that is specific to INSERT (and actually DELETE too, but that is not currently supported - code is commented accordingly). In future, this function will support DELETE and UPDATE, and the Assert serves to alert the developer ASAP that it needs updating to support those. > 2. > In patch 0004, We are still charging the parallel_tuple_cost for each > tuple, are we planning to do something about this? I mean after this > patch tuple will not be transferred through the tuple queue, so we > should not add that cost. 
> I believe that for Parallel INSERT, cost_modifytable() will set path->path.rows to 0 (unless there is a RETURNING list), so, for example, in cost_gather(), it will not add to the run_cost as "run_cost += parallel_tuple_cost * path->path.rows;" Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > Posting an updated set of patches that includes Amit Langote's patch > to the partition tracking scheme... > (the alternative of adding partitions to the range table needs further > investigation) I was reviewing your latest patch and I have a few comments. In patch 0001 1. +static bool +target_rel_max_parallel_hazard_recurse(Relation rel, + CmdType command_type, + max_parallel_hazard_context *context) +{ + TupleDesc tupdesc; + int attnum; + + /* Currently only CMD_INSERT is supported */ + Assert(command_type == CMD_INSERT); ……. + /* + * Column default expressions and check constraints are only applicable to + * INSERT and UPDATE, but since only parallel INSERT is currently supported, + * only command_type==CMD_INSERT is checked here. + */ + if (command_type == CMD_INSERT) If we have an assert at the beginning of the function, then why do we want to put the if check here? 2. In patch 0004, We are still charging the parallel_tuple_cost for each tuple, are we planning to do something about this? I mean after this patch tuple will not be transferred through the tuple queue, so we should not add that cost. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 3, 2021 at 12:52 PM Greg Nancarrow wrote: > > On Tue, Mar 2, 2021 at 11:19 PM Amit Kapila wrote: > > > > On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > > 2. > > /* > > + * Prepare for entering parallel mode by assigning a > > + * FullTransactionId, to be included in the transaction state that is > > + * serialized in the parallel DSM. > > + */ > > + (void) GetCurrentTransactionId(); > > + } > > > > Similar to the previous comment this also seems to indicate that we > > require TransactionId for workers. If that is not the case then this > > comment also needs to be modified. > > > > I'll update to comment to remove the part about the serialization (as > this always happens, not a function of the patch) and mention it is > needed to avoid attempting to assign a FullTransactionId in > parallel-mode. > Okay, but just use TransactionId in comments if there is a need, we still don't use FullTransactionId for the heap. > > 4. > > * Assess whether it's feasible to use parallel mode for this query. We > > * can't do this in a standalone backend, or if the command will try to > > - * modify any data, or if this is a cursor operation, or if GUCs are set > > - * to values that don't permit parallelism, or if parallel-unsafe > > - * functions are present in the query tree. > > + * modify any data using a CTE, or if this is a cursor operation, or if > > + * GUCs are set to values that don't permit parallelism, or if > > + * parallel-unsafe functions are present in the query tree. > > > > This comment change is not required because this is quite similar to > > what we do for CTAS. Your further comment changes in this context are > > sufficient. > > An INSERT modifies data, so according to the original comment, then > it's not feasible to use parallel mode, because the command tries to > modify data ("We can't do this in a standalone backend, or if the > command will try to modify any data ..."). 
> Except now we need to use parallel-mode for "INSERT with parallel > SELECT", and INSERT is a command that modifies data. > So isn't the comment change actually needed? > If we want to change, we can say "if the command will try to modify any data except for inserts ..." or something like that, but saying only about CTEs is not correct, because then what about updates and deletes? > > > > > 5. > > + (IsModifySupportedInParallelMode(parse->commandType) && > > + is_parallel_possible_for_modify(parse))) && > > > > I think it would be better if we move the check > > IsModifySupportedInParallelMode inside > > is_parallel_possible_for_modify. > > The reason it is done outside of is_parallel_possible_for_modify() is > to avoid the function overhead of calling > is_parallel_possible_for_modify() for SELECT statements, only to > return without doing anything. Note also that > IsModifySupportedInParallelMode() is an inline function. > I don't see any reason to be worried about that here. It is more important for the code and checks to look simpler. > >Also, it might be better to name this > > function as is_parallel_allowed_for_modify. > > I do tend to think that in this case "possible" is better than "allowed". > Only the "parallel_dml" GUC test is checking for something that is "allowed". > The other two checks are checking for things that determine whether > parallel-mode is even "possible". > I think I don't like this proposed name for the function. See if you can think of something better. > > > > 6. > > @@ -260,6 +265,21 @@ set_plan_references(PlannerInfo *root, Plan *plan) > > */ > > add_rtes_to_flat_rtable(root, false); > > > > + /* > > + * If modifying a partitioned table, add its parallel-safety-checked > > + * partitions too to glob->relationOids, to register them as plan > > + * dependencies.
This is only really needed in the case of a parallel > > + * plan, so that if parallel-unsafe properties are subsequently defined > > + * on the partitions, the cached parallel plan will be invalidated and > > + * a non-parallel plan will be generated. > > + */ > > + if (IsModifySupportedInParallelMode(root->parse->commandType)) > > + { > > + if (glob->partitionOids != NIL && glob->parallelModeNeeded) > > + glob->relationOids = > > + list_concat(glob->relationOids, glob->partitionOids); > > + } > > + > > > > Isn't it possible to add these partitionOids in set_plan_refs with the > > T_Gather(Merge) node h
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Mar 2, 2021 at 11:19 PM Amit Kapila wrote: > > On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > > > Posting an updated set of patches that includes Amit Langote's patch > > to the partition tracking scheme... > > > > Few comments: > == > 1. > "Prior to entering parallel-mode for execution of INSERT with parallel SELECT, > a TransactionId is acquired and assigned to the current transaction state > which > is then serialized in the parallel DSM for the parallel workers to use." > > This point in the commit message seems a bit misleading to me because > IIUC we need to use transaction id only in the master backend for the > purpose of this patch. And, we are doing this at an early stage > because we don't allow to allocate it in parallel-mode. I think here > we should mention in some way that this has a disadvantage that if the > underneath select doesn't return any row then this transaction id > allocation would not be of use. However, that shouldn't happen in > practice in many cases. But, if I am missing something and this is > required by parallel workers then we can ignore what I said. > I'll remove the part that says "which is then serialized ...", as this may imply that the patch is doing this (which of course it is not, Postgres always serializes the transaction state in the parallel DSM for the parallel workers to use). The acquiring of the TransactionId up-front, prior to entering parallel-mode, is absolutely required because heap_insert() lazily tries to get the current transaction-id and attempts to assign one if the current transaction state hasn't got one, and this assignment is not allowed in parallel-mode ("ERROR: cannot assign XIDs during a parallel operation"), and this mode is required for use of parallel SELECT. I'll add the part about the disadvantage of the transaction-id not being used if the underlying SELECT doesn't return any rows. > 2. 
> /* > + * Prepare for entering parallel mode by assigning a > + * FullTransactionId, to be included in the transaction state that is > + * serialized in the parallel DSM. > + */ > + (void) GetCurrentTransactionId(); > + } > > Similar to the previous comment this also seems to indicate that we > require TransactionId for workers. If that is not the case then this > comment also needs to be modified. > I'll update the comment to remove the part about the serialization (as this always happens, and is not a function of the patch) and mention that it is needed to avoid attempting to assign a FullTransactionId in parallel-mode. > 3. > static int common_prefix_cmp(const void *a, const void *b); > > - > > /* > > Spurious line removal. > I will reinstate that empty line. > 4. > * Assess whether it's feasible to use parallel mode for this query. We > * can't do this in a standalone backend, or if the command will try to > - * modify any data, or if this is a cursor operation, or if GUCs are set > - * to values that don't permit parallelism, or if parallel-unsafe > - * functions are present in the query tree. > + * modify any data using a CTE, or if this is a cursor operation, or if > + * GUCs are set to values that don't permit parallelism, or if > + * parallel-unsafe functions are present in the query tree. > > This comment change is not required because this is quite similar to > what we do for CTAS. Your further comment changes in this context are > sufficient. An INSERT modifies data, so according to the original comment it's not feasible to use parallel mode, because the command tries to modify data ("We can't do this in a standalone backend, or if the command will try to modify any data ..."). Except now we need to use parallel-mode for "INSERT with parallel SELECT", and INSERT is a command that modifies data. So isn't the comment change actually needed? > > 5. 
> + (IsModifySupportedInParallelMode(parse->commandType) && > + is_parallel_possible_for_modify(parse))) && > > I think it would be better if we move the check > IsModifySupportedInParallelMode inside > is_parallel_possible_for_modify. The reason it is done outside of is_parallel_possible_for_modify() is to avoid the function overhead of calling is_parallel_possible_for_modify() for SELECT statements, only to return without doing anything. Note also that IsModifySupportedInParallelMode() is an inline function. >Also, it might be better to name this > function as is_parallel_allowed_for_modify. I do tend to think that in this case "possible" is better than "allowed". Only the "parallel_dml" GUC test is checking for something that is "allowed". The other t
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > Posting an updated set of patches that includes Amit Langote's patch > to the partition tracking scheme... > Few comments: == 1. "Prior to entering parallel-mode for execution of INSERT with parallel SELECT, a TransactionId is acquired and assigned to the current transaction state which is then serialized in the parallel DSM for the parallel workers to use." This point in the commit message seems a bit misleading to me because IIUC we need to use transaction id only in the master backend for the purpose of this patch. And, we are doing this at an early stage because we don't allow to allocate it in parallel-mode. I think here we should mention in some way that this has a disadvantage that if the underneath select doesn't return any row then this transaction id allocation would not be of use. However, that shouldn't happen in practice in many cases. But, if I am missing something and this is required by parallel workers then we can ignore what I said. 2. /* + * Prepare for entering parallel mode by assigning a + * FullTransactionId, to be included in the transaction state that is + * serialized in the parallel DSM. + */ + (void) GetCurrentTransactionId(); + } Similar to the previous comment this also seems to indicate that we require TransactionId for workers. If that is not the case then this comment also needs to be modified. 3. static int common_prefix_cmp(const void *a, const void *b); - /* Spurious line removal. 4. * Assess whether it's feasible to use parallel mode for this query. We * can't do this in a standalone backend, or if the command will try to - * modify any data, or if this is a cursor operation, or if GUCs are set - * to values that don't permit parallelism, or if parallel-unsafe - * functions are present in the query tree. 
+ * modify any data using a CTE, or if this is a cursor operation, or if + * GUCs are set to values that don't permit parallelism, or if + * parallel-unsafe functions are present in the query tree. This comment change is not required because this is quite similar to what we do for CTAS. Your further comment changes in this context are sufficient. 5. + (IsModifySupportedInParallelMode(parse->commandType) && + is_parallel_possible_for_modify(parse))) && I think it would be better if we move the check IsModifySupportedInParallelMode inside is_parallel_possible_for_modify. Also, it might be better to name this function as is_parallel_allowed_for_modify. 6. @@ -260,6 +265,21 @@ set_plan_references(PlannerInfo *root, Plan *plan) */ add_rtes_to_flat_rtable(root, false); + /* + * If modifying a partitioned table, add its parallel-safety-checked + * partitions too to glob->relationOids, to register them as plan + * dependencies. This is only really needed in the case of a parallel + * plan, so that if parallel-unsafe properties are subsequently defined + * on the partitions, the cached parallel plan will be invalidated and + * a non-parallel plan will be generated. + */ + if (IsModifySupportedInParallelMode(root->parse->commandType)) + { + if (glob->partitionOids != NIL && glob->parallelModeNeeded) + glob->relationOids = + list_concat(glob->relationOids, glob->partitionOids); + } + Isn't it possible to add these partitionOids in set_plan_refs with the T_Gather(Merge) node handling? That looks like a more natural place to me, if that is feasible then we don't need parallelModeNeeded check and maybe we don't need to even check IsModifySupportedInParallelMode but I am less sure of the second check requirement. 7. 
+#include "access/table.h" +#include "access/xact.h" #include "access/transam.h" +#include "catalog/pg_class.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -24,6 +27,8 @@ #include "optimizer/planmain.h" #include "optimizer/planner.h" #include "optimizer/tlist.h" +#include "parser/parsetree.h" +#include "partitioning/partdesc.h" I think apart from xact.h, we don't need new additional includes. 8. I see that in function target_rel_max_parallel_hazard, we don't release the lock on the target table after checking parallel-safety but then in function target_rel_max_parallel_hazard_recurse, we do release the lock on partition tables after checking their parallel-safety. Can we add some comments to explain these two cases? 9. I noticed that the tests added for the first patch in v18-0002-Parallel-SELECT-for-INSERT-INTO-.-SELECT-tests-and-doc take even more time than select_parallel. I think we should see if we can reduce the timing of this test. I haven't studied it in detail but maybe some Inserts execution can be avoided. In some cases like below just checking th
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 1, 2021 at 12:38 PM Greg Nancarrow wrote: > On Fri, Feb 26, 2021 at 5:50 PM Amit Langote wrote: > > > > On Fri, Feb 26, 2021 at 3:35 PM Greg Nancarrow wrote: > > > On Fri, Feb 26, 2021 at 4:07 PM Amit Langote > > > wrote: > > > > The attached patch fixes this, although I am starting to have second > > > > thoughts about how we're tracking partitions in this patch. Wondering > > > > if we should bite the bullet and add partitions into the main range > > > > table instead of tracking them separately in partitionOids, which > > > > might result in a cleaner patch overall. > > > > > > Thanks Amit, > > > > > > I was able to reproduce the problem using your instructions (though I > > > found I had to run that explain an extra time, in order to hit the > > > breakpoint). > > > Also, I can confirm that the problem doesn't occur after application > > > of your patch. > > > I'll leave it to your better judgement as to what to do next - if you > > > feel the current tracking method is not sufficient > > > > Just to be clear, I think the tracking method added by the patch is > > sufficient AFAICS for the problems we were able to discover. The > > concern I was trying to express is that we seem to be duct-taping > > holes in our earlier chosen design to track partitions separately from > > the range table. If we had decided to add partitions to the range > > table as "extra" target relations from the get-go, both the issues I > > mentioned with cached plans -- partitions not being counted as a > > dependency and partitions not being locked before execution -- would > > have been prevented. I haven't fully grasped how invasive that design > > would be, but it sure sounds like it would be a bit more robust. > > Posting an updated set of patches that includes Amit Langote's patch > to the partition tracking scheme... Thanks Greg. 
> (the alternative of adding partitions to the range table needs further > investigation) I looked at this today with an intention to write and post a PoC patch, but was quickly faced with the task of integrating that design with how ModifyTable works for inserts. Specifically, if, in addition to adding partitions to the range table, we also add their RT indexes to ModifyTable.resultRelations, then maybe we'll need to rethink our executor tuple routing code. That code currently tracks the insert's target partitions separately from the range table, exactly because they are not there to begin with. So if we change the latter as in this hypothetical PoC, maybe we should also revisit the former. Also, it would not be great that the planner's arrays in PlannerInfo would get longer as a result of Query.rtable getting longer by adding partitions, thus making all the loops over those arrays slower for no benefit. So, let's do this in a follow-on patch, if at all, instead of considering this topic a blocker for committing the current set of patches. -- Amit Langote EDB: http://www.enterprisedb.com
RE: Parallel INSERT (INTO ... SELECT ...)
From: Tsunakawa, Takayuki/綱川 貴之
> the current patch showed nice performance improvement in some (many?) patterns.
> So, I think it can be committed in PG 14, when it has addressed the plan cache
> issue that Amit Langote-san posed.

Agreed. I summarized my test results for the current patch (V18) in the attached file (please use VSCode or Notepad++ to open it; the content is a bit long). As you can see, the performance is good in many patterns (please refer to my test script; the test NO in the text corresponds to the number in the sql file). If you have any question on my test, please feel free to ask.

Regards,
Tang

max_parallel_workers_per_gather = 2

Test NO | test case                                                                   | query plan             | patched(ms) | master(ms) | %reg
--------|-----------------------------------------------------------------------------|------------------------|-------------|------------|-----
1       | Test INSERT with underlying query.                                          | parallel INSERT+SELECT | 20894.069   | 27030.623  | -23%
2       | Test INSERT into a table with a foreign key.                                | parallel SELECT        | 80921.960   | 84416.775  | -4%
3       | Test INSERT with ON CONFLICT ... DO UPDATE                                  | non-parallel           | 28111.186   | 29531.922  | -5%
4       | Test INSERT with parallel_leader_participation = off;                       | parallel INSERT+SELECT | 2799.354    | 5131.874   | -45%
5       | Test INSERT with max_parallel_workers=0;                                    | non-parallel           | 27006.385   | 26962.279  | 0%
6       | Test INSERT with parallelized aggregate                                     | parallel SELECT        | 95482.307   | 105090.301 | -9%
7       | Test INSERT with parallel bitmap heap scan                                  | parallel INSERT+SELECT | 606.664     | 844.471    | -28%
8       | Test INSERT with parallel append                                            | parallel INSERT+SELECT | 109.896     | 239.198    | -54%
9       | Test INSERT with parallel index scan                                        | parallel INSERT+SELECT | 552.481     | 1238.079   | -55%
10      | Test INSERT with parallel-safe index expression                             | parallel INSERT+SELECT | 1453.686    | 2908.489   | -50%
11      | Test INSERT with parallel-unsafe index expression                           | non-parallel           | 2828.559    | 2837.570   | 0%
12      | Test INSERT with underlying query - and RETURNING (no projection)           | parallel INSERT+SELECT | 417.493     | 685.430    | -39%
13      | Test INSERT with underlying ordered query - and RETURNING (no projection)   | parallel SELECT        | 971.095     | 1717.502   | -43%
14      | Test INSERT with underlying ordered query - and RETURNING (with projection) | parallel SELECT        |
RE: Parallel INSERT (INTO ... SELECT ...)
From: Hou, Zhijie/侯 志杰 > After doing some more tests on it (performance degradation will not happen > when source table is out of order). > I think we can say the performance degradation is related to the order of the > data in source table. ... > So, the order of data 's influence seems a normal phenomenon, I think it seems > we do not need to do anything about it (currently). > It seems better to mark it as todo which we can improve this in the future. > > (Since the performance degradation in parallel bitmap is because of the lock > in > _bt_search, It will not always happen when the target table already have data, > less race condition will happened when parallel insert into a evenly > distributed > btree). I think so, too. The slowness of the parallel insert operation due to index page contention, and the index bloat, would occur depending on the order of the index key values of the source records. I guess other DBMSs exhibit a similar phenomenon, but I couldn't find such a description in the manual, whitepapers, or several books on Oracle. One relevant excerpt is the following. This is about parallel index build. Oracle tries to minimize page contention and index bloat. This is completely my guess, but they may do similar things in parallel INSERT SELECT, because Oracle holds an exclusive lock on the target table. SQL Server also acquires an exclusive lock. Maybe we can provide an option to do so in the future. https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-tips.html#GUID-08A08783-C243-4872-AFFA-56B603F1F0F5 -- Optimizing Performance by Creating Indexes in Parallel ... Multiple processes can work simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, Oracle Database can create the index more quickly than if a single server process created the index serially. Parallel index creation works in much the same way as a table scan with an ORDER BY clause. 
The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the DOP. A first set of query processes scans the table, extracts key-rowid pairs, and sends each pair to a process in a second set of query processes based on a key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the parallel execution coordinator simply concatenates the pieces (which are ordered) to form the final index. ... When creating an index in parallel, the STORAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL value of 5 MB and a DOP of 12 consumes at least 60 MB of storage during index creation because each process starts with an extent of 5 MB. When the query coordinator process combines the sorted subindexes, some extents might be trimmed, and the resulting index might be smaller than the requested 60 MB. -- IIRC, the current patch showed nice performance improvement in some (many?) patterns. So, I think it can be committed in PG 14, when it has addressed the plan cache issue that Amit Langote-san posed. I remember the following issues/comments are pending, but they are not blockers: 1. The insert operation is run serially when the target table has a foreign key, sequence or identity column. This can be added later based on the current design without requiring rework. That is, the current patch leaves no debt. (Personally, foreign key and sequence support will also be wanted in PG 14. We may try them in the last CF once the current patch is likely to be committable.) 2. There's a plausible reason for the performance variation and index bloat with the bitmap scan case. Ideally, we want to come up with a solution that can be incorporated in PG 15. Or, it may be one conclusion that we can't prevent performance degradation in all cases. 
That may be one hidden reason why Oracle and SQL Server don't enable parallel DML by default. We can advise the user in the manual that parallel DML is not always faster than the serial operation, so they should test performance by enabling and disabling parallel DML. Also, maybe we should honestly state that indexes can get a bit bigger after parallel insert than after serial insert, and advise the user to do REINDEX CONCURRENTLY if necessary. 3. The total time of parallel execution can get longer because of unbalanced work distribution among parallel workers. This seems to be an existing problem, so we can pursue the improvement later, hopefully before the release of PG 14. Does anyone see any problem with committing the current patch (after polishing it)? Regards Takayuki Tsunakawa
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 26, 2021 at 3:35 PM Greg Nancarrow wrote: > On Fri, Feb 26, 2021 at 4:07 PM Amit Langote wrote: > > The attached patch fixes this, although I am starting to have second > > thoughts about how we're tracking partitions in this patch. Wondering > > if we should bite the bullet and add partitions into the main range > > table instead of tracking them separately in partitionOids, which > > might result in a cleaner patch overall. > > Thanks Amit, > > I was able to reproduce the problem using your instructions (though I > found I had to run that explain an extra time, in order to hit the > breakpoint). > Also, I can confirm that the problem doesn't occur after application > of your patch. > I'll leave it to your better judgement as to what to do next - if you > feel the current tracking method is not sufficient Just to be clear, I think the tracking method added by the patch is sufficient AFAICS for the problems we were able to discover. The concern I was trying to express is that we seem to be duct-taping holes in our earlier chosen design to track partitions separately from the range table. If we had decided to add partitions to the range table as "extra" target relations from the get-go, both the issues I mentioned with cached plans -- partitions not being counted as a dependency and partitions not being locked before execution -- would have been prevented. I haven't fully grasped how invasive that design would be, but it sure sounds like it would be a bit more robust. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 26, 2021 at 4:07 PM Amit Langote wrote: > > Hi Greg, > > Replying to an earlier email in the thread because I think I found a > problem relevant to the topic that was brought up. > > On Wed, Feb 17, 2021 at 10:34 PM Amit Langote wrote: > > On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow wrote: > > > Is the use of "table_close(rel, NoLock)'' intentional? That will keep > > > the lock (lockmode) until end-of-transaction. > > > > I think we always keep any locks on relations that are involved in a > > plan until end-of-transaction. What if a partition is changed in an > > unsafe manner between being considered safe for parallel insertion and > > actually performing the parallel insert? > > I realized that there is a race condition that will allow a concurrent > backend to invalidate a parallel plan (for insert into a partitioned > table) at a point in time when it's too late for plancache.c to detect > it. It has to do with how plancache.c locks the relations affected by > a cached query and its cached plan. Specifically, > AcquireExecutorLocks(), which locks the relations that need to be > locked before the plan could be considered safe to execute, does not > notice the partitions that would have been checked for parallel safety > when the plan was made. Given that AcquireExecutorLocks() only loops > over PlannedStmt.rtable and locks exactly the relations found there, > partitions are not locked. This means that a concurrent backend can, > for example, add an unsafe trigger to a partition before a parallel > worker inserts into it only to fail when it does. 
> > Steps to reproduce (tried with v19 set of patches): > > drop table if exists rp, foo; > create table rp (a int) partition by range (a); > create table rp1 partition of rp for values from (minvalue) to (0); > create table rp2 partition of rp for values from (0) to (maxvalue); > create table foo (a) as select generate_series(1, 100); > set plan_cache_mode to force_generic_plan; > set enable_parallel_dml to on; > deallocate all; > prepare q as insert into rp select * from foo where a%2 = 0; > explain analyze execute q; > > At this point, attach a debugger to this backend and set a breakpoint > in AcquireExecutorLocks() and execute q again: > > -- will execute the cached plan > explain analyze execute q; > > Breakpoint will be hit. Continue till return from the function and > stop. Start another backend and execute this: > > -- will go through, because no lock still taken on the partition > create or replace function make_table () returns trigger language > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > unsafe; > create trigger ai_rp2 after insert on rp2 for each row execute > function make_table(); > > Back to the debugger, hit continue to let the plan execute. You > should see this error: > > ERROR: cannot start commands during a parallel operation > CONTEXT: SQL statement "create table bar()" > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > The attached patch fixes this, although I am starting to have second > thoughts about how we're tracking partitions in this patch. Wondering > if we should bite the bullet and add partitions into the main range > table instead of tracking them separately in partitionOids, which > might result in a cleaner patch overall. > Thanks Amit, I was able to reproduce the problem using your instructions (though I found I had to run that explain an extra time, in order to hit the breakpoint). Also, I can confirm that the problem doesn't occur after application of your patch. 
I'll leave it to your better judgement as to what to do next - if you feel the current tracking method is not sufficient Regards, Greg Nancarrow Fujitsu Australia
RE: Parallel INSERT (INTO ... SELECT ...)
> I add some code to track the time spent in index operation. > From the results[1], we can see more workers will bring more cost in > _bt_search_insert() in each worker. > After debugged, the most cost part is the following: > - > /* drop the read lock on the page, then acquire one on its child > */ > *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access); > - > > It seems the order of parallel bitmap scan's result will result in more lock > time in > parallel insert. > [1]---parallel bitmap scan-- worker 0: > psql:test.sql:10: INFO: insert index _bt_search_insert time:834735 > psql:test.sql:10: INFO: insert index total time:1895330 > psql:test.sql:10: INFO: insert tuple time:628064 > > worker 2: > psql:test.sql:10: INFO: insert index _bt_search_insert time:1552242 > psql:test.sql:10: INFO: insert index total time:2374741 > psql:test.sql:10: INFO: insert tuple time:314571 > > worker 4: > psql:test.sql:10: INFO: insert index _bt_search_insert time:2496424 > psql:test.sql:10: INFO: insert index total time:3016150 > psql:test.sql:10: INFO: insert tuple time:211741 > > > Based on above, I tried to change the order of results that bitmapscan return. > In the original test, we prepare data in order (like: > generate_series(1,1,1)), If > we change the order we insert the data in the source table, the performance > degradation will not always happen[2]. > And table size difference will be small. > > ---out of order source table- > insert into x(a,b,c) select i,i+1,i+2 from generate_series(1,6) as > t(i) > order by random(); > > > Test results when source table out of order(using bitmap heap scan): > [2] > Worker 0: > Execution Time: 37028.006 ms > Worker 2: > Execution Time: 11355.153 ms > Worker 4: > Execution Time: 9273.398 ms > > > So, this performance degradation issue seems related on the order of the data > in the source table. > It does not always happen. Do we need to do some specific fix for it ? 
After doing some more tests on it (the performance degradation does not happen when the source table is out of order), I think we can say the performance degradation is related to the order of the data in the source table.

Also, in the master branch, I found that the order of data in the source table does not affect the planner when generating the plan (for the SELECT part). As we can see from [1][2], if the source table's data is in order and max_parallel_workers_per_gather is set to 4, the planner will choose a parallel seqscan here, but it is actually slower than a serial bitmapscan. If the data in the source table is out of order, the performance degradation does not happen [3][4].

So, the influence of the data's order seems to be a normal phenomenon, and I think we do not need to do anything about it (currently). It seems better to mark it as a TODO which we can improve in the future. (Since the performance degradation in parallel bitmap scan is caused by the lock in _bt_search, it will not always happen when the target table already contains data; there will be less lock contention when inserting in parallel into an evenly distributed btree.)
Test result with sql: "postgres=# explain analyze verbose select a from x where a<8 or (a%2=0 and a>19900);"

[1]---Source table data in order and max_parallel_workers_per_gather = 0---
Bitmap Heap Scan on public.x  (cost=22999.40..4991850.30 rows=91002 width=4) (actual time=45.445..201.322 rows=57 loops=1)
  Output: a
  Recheck Cond: ((x.a < 8) OR (x.a > 19900))
  Filter: ((x.a < 8) OR (((x.a % 2) = 0) AND (x.a > 19900)))
  Rows Removed by Filter: 50
  Heap Blocks: exact=5840
  ->  BitmapOr  (cost=22999.40..22999.40 rows=1242768 width=0) (actual time=44.622..44.637 rows=0 loops=1)
        ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1575.70 rows=85217 width=0) (actual time=3.622..3.634 rows=7 loops=1)
              Index Cond: (x.a < 8)
        ->  Bitmap Index Scan on x_a_idx  (cost=0.00..21378.20 rows=1157551 width=0) (actual time=40.998..40.998 rows=100 loops=1)
              Index Cond: (x.a > 19900)
Planning Time: 0.199 ms
Execution Time: 232.327 ms

[2]---Source table data in order and max_parallel_workers_per_gather = 4---
Gather  (cost=1000.00..2091183.08 rows=91002 width=4) (actual time=0.594..4216.197 rows=57 loops=1)
  Output: a
  Workers Planned: 4
  Workers Launched: 4
  ->  Parallel Seq Scan on public.x  (cost=0.00..2081082.88 rows=22750 width=4) (actual time=0.087..4197.570 rows=116000 loops=5)
        Output: a
        Filter: ((x.a < 8) OR (((x.a % 2) = 0) AND (x.a > 19900)))
        Rows Removed by Filter: 39884000
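To make the lock-contention explanation above concrete, here is a toy model (plain Python, not PostgreSQL code; the "leaf" granularity and worker interleaving are made-up simplifications). It deals keys round-robin to workers and counts how often two different workers descend to the same btree "leaf" back-to-back: with an ordered source table every worker chases the same rightmost leaf, while a shuffled source spreads the inserts out.

```python
import random

def leaf_for(key, leaf_span=100):
    # Toy "leaf": which fixed-width key range the insert descends to.
    return key // leaf_span

def contention(keys, n_workers=4):
    """Count back-to-back inserts from *different* workers that land on
    the same leaf, with the workers' insert streams interleaved."""
    # Deal keys to workers round-robin, then interleave their streams.
    streams = [keys[i::n_workers] for i in range(n_workers)]
    conflicts, last = 0, None
    for step in range(len(streams[0])):
        for w in range(n_workers):
            leaf = leaf_for(streams[w][step])
            if last is not None and leaf == last[1] and w != last[0]:
                conflicts += 1
            last = (w, leaf)
    return conflicts

ordered = list(range(40_000))
shuffled = ordered[:]
random.seed(0)
random.shuffle(shuffled)

# Ordered input: nearly every insert collides with the previous worker's
# leaf; shuffled input: collisions become rare.
print(contention(ordered), contention(shuffled))
```

This is only meant to illustrate why "order by random()" on the source table changes the picture; the real cost in PostgreSQL is the buffer lock taken in _bt_relandgetbuf() while descending the tree.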
Re: Parallel INSERT (INTO ... SELECT ...)
Hi Greg, Replying to an earlier email in the thread because I think I found a problem relevant to the topic that was brought up. On Wed, Feb 17, 2021 at 10:34 PM Amit Langote wrote: > On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow wrote: > > Is the use of "table_close(rel, NoLock)'' intentional? That will keep > > the lock (lockmode) until end-of-transaction. > > I think we always keep any locks on relations that are involved in a > plan until end-of-transaction. What if a partition is changed in an > unsafe manner between being considered safe for parallel insertion and > actually performing the parallel insert? I realized that there is a race condition that will allow a concurrent backend to invalidate a parallel plan (for insert into a partitioned table) at a point in time when it's too late for plancache.c to detect it. It has to do with how plancache.c locks the relations affected by a cached query and its cached plan. Specifically, AcquireExecutorLocks(), which locks the relations that need to be locked before the plan could be considered safe to execute, does not notice the partitions that would have been checked for parallel safety when the plan was made. Given that AcquireExecutorLocks() only loops over PlannedStmt.rtable and locks exactly the relations found there, partitions are not locked. This means that a concurrent backend can, for example, add an unsafe trigger to a partition before a parallel worker inserts into it only to fail when it does. 
Steps to reproduce (tried with v19 set of patches):

drop table if exists rp, foo;
create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
create table foo (a) as select generate_series(1, 100);
set plan_cache_mode to force_generic_plan;
set enable_parallel_dml to on;
deallocate all;
prepare q as insert into rp select * from foo where a%2 = 0;
explain analyze execute q;

At this point, attach a debugger to this backend and set a breakpoint in AcquireExecutorLocks() and execute q again:

-- will execute the cached plan
explain analyze execute q;

Breakpoint will be hit. Continue till return from the function and stop. Start another backend and execute this:

-- will go through, because no lock still taken on the partition
create or replace function make_table () returns trigger language plpgsql as $$ begin create table bar(); return null; end; $$ parallel unsafe;
create trigger ai_rp2 after insert on rp2 for each row execute function make_table();

Back to the debugger, hit continue to let the plan execute. You should see this error:

ERROR:  cannot start commands during a parallel operation
CONTEXT:  SQL statement "create table bar()"
PL/pgSQL function make_table() line 1 at SQL statement
parallel worker

The attached patch fixes this, although I am starting to have second thoughts about how we're tracking partitions in this patch. Wondering if we should bite the bullet and add partitions into the main range table instead of tracking them separately in partitionOids, which might result in a cleaner patch overall.

--
Amit Langote
EDB: http://www.enterprisedb.com

AcquireExecutorLocks-partition-insert.patch
Description: Binary data
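The shape of the race described above can be sketched as a toy model (plain Python; CachedPlan, acquire_executor_locks and the field names are illustrative stand-ins, not PostgreSQL's actual structures). The point is that re-validation at execution time only touches the relations in the plan's range table, so a partition whose parallel safety was checked at plan time can change underneath the cached plan:

```python
class Relation:
    def __init__(self, name, parallel_safe=True):
        self.name = name
        self.parallel_safe = parallel_safe

class CachedPlan:
    def __init__(self, rtable, partitions):
        self.rtable = rtable          # what AcquireExecutorLocks() sees
        self.partitions = partitions  # checked at plan time only
        # parallel plan chosen iff every partition looked safe at plan time
        self.is_parallel = all(p.parallel_safe for p in partitions)

def acquire_executor_locks(plan):
    # Locks (and hence revalidates) only plan.rtable -- partitions missed.
    return {r.name for r in plan.rtable}

rp, foo = Relation("rp"), Relation("foo")
rp2 = Relation("rp2")                       # the leaf partition
plan = CachedPlan(rtable=[rp, foo], partitions=[rp2])
assert plan.is_parallel

# A concurrent session adds a parallel-unsafe trigger to the partition...
rp2.parallel_safe = False

locked = acquire_executor_locks(plan)
# ...but the partition was never locked, so nothing invalidated the plan:
assert "rp2" not in locked
assert plan.is_parallel   # stale: would run in parallel mode and fail
```

A fresh plan built after the change would correctly come out non-parallel; the bug is purely in the cached plan surviving the change, which is why the fix has to live in how plancache.c/AcquireExecutorLocks() tracks partitions.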
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 6:03 PM Amit Kapila wrote: > On Wed, Feb 24, 2021 at 2:14 PM Greg Nancarrow wrote: > > On Wed, Feb 24, 2021 at 3:12 PM Amit Kapila wrote: > > > On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow > > > wrote: > > > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila > > > > wrote: > > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > > > > > because of parallel-safety checks on the partitions, which found > > > > > > attributes of the partitions which weren't parallel-safe. > > > > > > So it's not so clear to me that the dependency doesn't exist - the > > > > > > non-parallel plan does in fact depend on the state of the > > > > > > partitions. > > > > > > > > > > Hmm, I think that is not what we can consider as a dependency. > > > > > > > > Then if it's not a dependency, then we shouldn't have to check the > > > > attributes of the partitions for parallel-safety, to determine whether > > > > we must use a non-parallel plan (or can use a parallel plan). > > > > Except, of course, we do have to ... > > > > > > I don't think the plan-dependency and checking for parallel-safety are > > > directly related. > > > > That is certainly not my understanding. Why do you think that they are > > not directly related? > > This whole issue came about because Amit L pointed out that there is a > > need to add partition OIDs as plan-dependencies BECAUSE the checking > > for parallel-safety and plan-dependency are related - since now, for > > Parallel INSERT, we're executing extra parallel-safety checks that > > check partition properties, so the resultant plan is dependent on the > > partitions and their properties. > > He has pointed out an issue when the plan is parallel and you can see > in that example that it fails if we didn't invalidate it. For > non-parallel plans, there won't be any such issue. Yes. 
I checked around a bit (code and -hackers archive [1]) and came away with the impression that there do not appear to be any set rules for deciding which object changes to send an invalidation message for (sending side: ddl, vacuum/analyze) and which items of a Query or a Plan to track changes for (receiving side: planner, plancache). One could say the foremost rule is to avoid broken cached plans and only in some really obvious cases do the thing that produces a better plan [2]. While no compromises can be made for the former, whether or not to go for the latter probably involves some cost-benefit analysis, something we can probably revisit. I don't think we're compromising by not adding the partition OIDs when the insert plan is not parallel, but the benefits of adding them in all cases are not so clear cut that maybe it's not worth it.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] Plan invalidation design:
https://www.postgresql.org/message-id/flat/20244.1171734513%40sss.pgh.pa.us

[2] ...contradicts what I said before, but I found this comment in DefineIndex():

    /*
     * The pg_index update will cause backends (including this one) to update
     * relcache entries for the index itself, but we should also send a
     * relcache inval on the parent table to force replanning of cached plans.
     * Otherwise existing sessions might fail to use the new index where it
     * would be useful.  (Note that our earlier commits did not create reasons
     * to replan; so relcache flush on the index itself was sufficient.)
     */
    CacheInvalidateRelcacheByRelid(heaprelid.relId);

So this invalidates any plans referencing the index's parent relation to trigger replanning so as to take the index into account. The old plans would not really be "unrunnable" without the index though.
RE: Parallel INSERT (INTO ... SELECT ...)
> > It is quite possible what you are saying is correct but I feel that is
> > not this patch's fault. So, won't it better to discuss this in a
> > separate thread?
> >
> > Good use case but again, I think this can be done as a separate patch.
>
> Agreed.
> I think even the current patch offers great benefits and can be committed in PG
> 14, even if all my four feedback comments are left unaddressed. I just touched
> on them for completeness in terms of typically expected use cases. They will
> probably be able to be implemented along the current design.
>
> > I think here you are talking about the third patch (Parallel Inserts).
> > I guess if one has run inserts parallelly from psql then also similar
> > behavior would have been observed. For tables, it might lead to better
> > results once we have the patch discussed at [1]. Actually, this needs
> > more investigation.
> >
> > [1] - https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com
>
> That looks interesting and worth a try.

Hi,

I tested the bitmap scan with both the multi-insert patch and the parallel insert patch applied, but the performance degradation and table size increase still happened on my machine.

To better analyze this issue, I did some more research on it (with only the parallel insert patch applied).

I added some code to track the time spent in index operations. From the results [1], we can see that more workers bring more cost in _bt_search_insert() in each worker. After debugging, the most costly part is the following:

    /* drop the read lock on the page, then acquire one on its child */
    *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access);

It seems the order of the parallel bitmap scan's results leads to more lock time in parallel insert.
[1]---parallel bitmap scan--
worker 0:
psql:test.sql:10: INFO: insert index _bt_search_insert time:834735
psql:test.sql:10: INFO: insert index total time:1895330
psql:test.sql:10: INFO: insert tuple time:628064

worker 2:
psql:test.sql:10: INFO: insert index _bt_search_insert time:1552242
psql:test.sql:10: INFO: insert index total time:2374741
psql:test.sql:10: INFO: insert tuple time:314571

worker 4:
psql:test.sql:10: INFO: insert index _bt_search_insert time:2496424
psql:test.sql:10: INFO: insert index total time:3016150
psql:test.sql:10: INFO: insert tuple time:211741

Based on the above, I tried to change the order of the results that the bitmap scan returns. In the original test, we prepare the data in order (like: generate_series(1,1,1)). If we change the order in which we insert the data into the source table, the performance degradation does not always happen [2], and the table size difference is small.

---out of order source table---
insert into x(a,b,c) select i,i+1,i+2 from generate_series(1,6) as t(i) order by random();

Test results when the source table is out of order (using bitmap heap scan):
[2]
Worker 0:
Execution Time: 37028.006 ms
Worker 2:
Execution Time: 11355.153 ms
Worker 4:
Execution Time: 9273.398 ms

So, this performance degradation issue seems related to the order of the data in the source table, and it does not always happen. Do we need to do some specific fix for it?

For multi-insert, I guess the reason why it does not solve the performance problem is that we do not actually have an API for multi-index insert (like the tableam API rd_tableam->multi_insert()), so we still execute ExecInsertIndexTuples in a loop for the multiple index inserts.

I plan to do some more tests for multi-insert and parallel insert with an out-of-order source table.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 25, 2021 at 12:19 AM Amit Kapila wrote: > > On Wed, Feb 24, 2021 at 6:21 PM Greg Nancarrow wrote: > > > > On Wed, Feb 24, 2021 at 10:38 PM Amit Kapila > > wrote: > > > > > > > > Thanks, I'll try it. > > > > I did, however, notice a few concerns with your suggested alternative > > > > fix: > > > > - It is not restricted to INSERT (as current fix is). > > > > > > > > > > So what? The Select also has a similar problem. > > > > > > > Yes, but you're potentially now adding overhead to every > > SELECT/UPDATE/DELETE with a subquery, by the added recursive checking > > and walking done by the new call to max_parallel_hazard_walker().and > > code block looking for a modifying CTE > > > > Can you please share an example where it has added an overhead? > > > And anyway I'm not sure it's really right putting in a fix for SELECT > > with a modifying CTE, into a patch that adds parallel INSERT > > functionality - in any case you'd need to really spell this out in > > code comments, as this is at best a temporary fix that would need to > > be removed whenever the query rewriter is fixed to set hasModifyingCTE > > correctly. > > > > > > - It does not set parse->hasModifyingCTE (as current fix does), so the > > > > return value (PlannedStmt) from standard_planner() won't have > > > > hasModifyingCTE set correctly in the cases where the rewriter doesn't > > > > set it correctly (and I'm not sure what strange side effects ??? that > > > > might have). > > > > > > Here end goal is not to set hasModifyingCTE but do let me know if you > > > see any problem or impact. > > > > > > > parse->hasModifyingCTE is not just used in the shortcut-test for > > parallel-safety, its value is subsequently copied into the PlannedStmt > > returned by standard_planner. > > It's inconsistent to leave hasModifyingCTE FALSE when by the fix it > > has found a modifying CTE. 
> > Even if no existing tests detect an issue with this, PlannedStmt is > > left with a bad hasModifyingCTE value in this case, so there is the > > potential for something to go wrong. > > > > > > - Although the invocation of max_parallel_hazard_walker() on a RTE > > > > subquery will "work" in finally locating your fix's added > > > > "CommonTableExpr" parallel-safety disabling block for commandType != > > > > CMD_SELECT, it looks like it potentially results in checking and > > > > walking over a lot of other stuff within the subquery not related to > > > > CTEs. The current fix does a more specific and efficient search for a > > > > modifying CTE. > > > > > > > > > > I find the current fix proposed by you quite ad-hoc and don't think we > > > can go that way. > > > > > > > At least my current fix is very specific, efficient and clear in its > > purpose, and suitably documented, so it is very clear when and how it > > is to be removed, when the issue is fixed in the query rewriter. > > Another concern with the alternative fix is that it always searches > > for a modifying CTE, even when parse->hasModifyingCTE is true after > > the query rewriter processing. > > > > There is a check in standard_planner such that if > parse->hasModifyingCTE is true then we won't try checking > parallel-safety. > OK, I retract that last concern, parallel-safety checks are skipped when parse->hasModifyingCTE is true. Examples of overhead will need to wait until tomorrow (and would need to test), but seems fairly clear max_parallel_hazard_walker() first checks parallel-unsafe functions in the node, then does numerous node-type checks before getting to CommonTableExpr - exactly how much extra work would depend on the SQL. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 6:21 PM Greg Nancarrow wrote: > > On Wed, Feb 24, 2021 at 10:38 PM Amit Kapila wrote: > > > > > > Thanks, I'll try it. > > > I did, however, notice a few concerns with your suggested alternative fix: > > > - It is not restricted to INSERT (as current fix is). > > > > > > > So what? The Select also has a similar problem. > > > > Yes, but you're potentially now adding overhead to every > SELECT/UPDATE/DELETE with a subquery, by the added recursive checking > and walking done by the new call to max_parallel_hazard_walker().and > code block looking for a modifying CTE > Can you please share an example where it has added an overhead? > And anyway I'm not sure it's really right putting in a fix for SELECT > with a modifying CTE, into a patch that adds parallel INSERT > functionality - in any case you'd need to really spell this out in > code comments, as this is at best a temporary fix that would need to > be removed whenever the query rewriter is fixed to set hasModifyingCTE > correctly. > > > > - It does not set parse->hasModifyingCTE (as current fix does), so the > > > return value (PlannedStmt) from standard_planner() won't have > > > hasModifyingCTE set correctly in the cases where the rewriter doesn't > > > set it correctly (and I'm not sure what strange side effects ??? that > > > might have). > > > > Here end goal is not to set hasModifyingCTE but do let me know if you > > see any problem or impact. > > > > parse->hasModifyingCTE is not just used in the shortcut-test for > parallel-safety, its value is subsequently copied into the PlannedStmt > returned by standard_planner. > It's inconsistent to leave hasModifyingCTE FALSE when by the fix it > has found a modifying CTE. > Even if no existing tests detect an issue with this, PlannedStmt is > left with a bad hasModifyingCTE value in this case, so there is the > potential for something to go wrong. 
> > > > - Although the invocation of max_parallel_hazard_walker() on a RTE > > > subquery will "work" in finally locating your fix's added > > > "CommonTableExpr" parallel-safety disabling block for commandType != > > > CMD_SELECT, it looks like it potentially results in checking and > > > walking over a lot of other stuff within the subquery not related to > > > CTEs. The current fix does a more specific and efficient search for a > > > modifying CTE. > > > > > > > I find the current fix proposed by you quite ad-hoc and don't think we > > can go that way. > > > > At least my current fix is very specific, efficient and clear in its > purpose, and suitably documented, so it is very clear when and how it > is to be removed, when the issue is fixed in the query rewriter. > Another concern with the alternative fix is that it always searches > for a modifying CTE, even when parse->hasModifyingCTE is true after > the query rewriter processing. > There is a check in standard_planner such that if parse->hasModifyingCTE is true then we won't try checking parallel-safety. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 10:38 PM Amit Kapila wrote: > > > > Thanks, I'll try it. > > I did, however, notice a few concerns with your suggested alternative fix: > > - It is not restricted to INSERT (as current fix is). > > > > So what? The Select also has a similar problem. > Yes, but you're potentially now adding overhead to every SELECT/UPDATE/DELETE with a subquery, by the added recursive checking and walking done by the new call to max_parallel_hazard_walker().and code block looking for a modifying CTE And anyway I'm not sure it's really right putting in a fix for SELECT with a modifying CTE, into a patch that adds parallel INSERT functionality - in any case you'd need to really spell this out in code comments, as this is at best a temporary fix that would need to be removed whenever the query rewriter is fixed to set hasModifyingCTE correctly. > > - It does not set parse->hasModifyingCTE (as current fix does), so the > > return value (PlannedStmt) from standard_planner() won't have > > hasModifyingCTE set correctly in the cases where the rewriter doesn't > > set it correctly (and I'm not sure what strange side effects ??? that > > might have). > > Here end goal is not to set hasModifyingCTE but do let me know if you > see any problem or impact. > parse->hasModifyingCTE is not just used in the shortcut-test for parallel-safety, its value is subsequently copied into the PlannedStmt returned by standard_planner. It's inconsistent to leave hasModifyingCTE FALSE when by the fix it has found a modifying CTE. Even if no existing tests detect an issue with this, PlannedStmt is left with a bad hasModifyingCTE value in this case, so there is the potential for something to go wrong. 
> > - Although the invocation of max_parallel_hazard_walker() on a RTE > > subquery will "work" in finally locating your fix's added > > "CommonTableExpr" parallel-safety disabling block for commandType != > > CMD_SELECT, it looks like it potentially results in checking and > > walking over a lot of other stuff within the subquery not related to > > CTEs. The current fix does a more specific and efficient search for a > > modifying CTE. > > > > I find the current fix proposed by you quite ad-hoc and don't think we > can go that way. > At least my current fix is very specific, efficient and clear in its purpose, and suitably documented, so it is very clear when and how it is to be removed, when the issue is fixed in the query rewriter. Another concern with the alternative fix is that it always searches for a modifying CTE, even when parse->hasModifyingCTE is true after the query rewriter processing. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 4:30 PM Greg Nancarrow wrote: > > On Wed, Feb 24, 2021 at 8:39 PM Amit Kapila wrote: > > > > On Fri, Feb 19, 2021 at 6:56 AM Greg Nancarrow wrote: > > > > > > Posting a new version of the patches, with the following updates: > > > > > > > I am not happy with the below code changes, I think we need a better > > way to deal with this. > > > > @@ -313,19 +314,35 @@ standard_planner(Query *parse, const char > > *query_string, int cursorOptions, > > glob->transientPlan = false; > > glob->dependsOnRole = false; > > > > + if (IsModifySupportedInParallelMode(parse->commandType) && > > + !parse->hasModifyingCTE) > > + { > > + /* > > + * FIXME > > + * There is a known bug in the query rewriter: re-written queries with > > + * a modifying CTE may not have the "hasModifyingCTE" flag set. When > > + * that bug is fixed, this temporary fix must be removed. > > + * > > + * Note that here we've made a fix for this problem only for a > > + * supported-in-parallel-mode table-modification statement (i.e. > > + * INSERT), but this bug exists for SELECT too. > > + */ > > + parse->hasModifyingCTE = query_has_modifying_cte(parse); > > + } > > + > > > > I understand that this is an existing bug but I am not happy with this > > workaround. I feel it is better to check for modifyingCTE in > > max_parallel_hazard_walker. See attached, this is atop > > v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. > > > > Thanks, I'll try it. > I did, however, notice a few concerns with your suggested alternative fix: > - It is not restricted to INSERT (as current fix is). > So what? The Select also has a similar problem. > - It does not set parse->hasModifyingCTE (as current fix does), so the > return value (PlannedStmt) from standard_planner() won't have > hasModifyingCTE set correctly in the cases where the rewriter doesn't > set it correctly (and I'm not sure what strange side effects ??? that > might have). 
Here end goal is not to set hasModifyingCTE but do let me know if you see any problem or impact. > - Although the invocation of max_parallel_hazard_walker() on a RTE > subquery will "work" in finally locating your fix's added > "CommonTableExpr" parallel-safety disabling block for commandType != > CMD_SELECT, it looks like it potentially results in checking and > walking over a lot of other stuff within the subquery not related to > CTEs. The current fix does a more specific and efficient search for a > modifying CTE. > I find the current fix proposed by you quite ad-hoc and don't think we can go that way. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 8:39 PM Amit Kapila wrote: > > On Fri, Feb 19, 2021 at 6:56 AM Greg Nancarrow wrote: > > > > Posting a new version of the patches, with the following updates: > > > > I am not happy with the below code changes, I think we need a better > way to deal with this. > > @@ -313,19 +314,35 @@ standard_planner(Query *parse, const char > *query_string, int cursorOptions, > glob->transientPlan = false; > glob->dependsOnRole = false; > > + if (IsModifySupportedInParallelMode(parse->commandType) && > + !parse->hasModifyingCTE) > + { > + /* > + * FIXME > + * There is a known bug in the query rewriter: re-written queries with > + * a modifying CTE may not have the "hasModifyingCTE" flag set. When > + * that bug is fixed, this temporary fix must be removed. > + * > + * Note that here we've made a fix for this problem only for a > + * supported-in-parallel-mode table-modification statement (i.e. > + * INSERT), but this bug exists for SELECT too. > + */ > + parse->hasModifyingCTE = query_has_modifying_cte(parse); > + } > + > > I understand that this is an existing bug but I am not happy with this > workaround. I feel it is better to check for modifyingCTE in > max_parallel_hazard_walker. See attached, this is atop > v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. > Thanks, I'll try it. I did, however, notice a few concerns with your suggested alternative fix: - It is not restricted to INSERT (as current fix is). - It does not set parse->hasModifyingCTE (as current fix does), so the return value (PlannedStmt) from standard_planner() won't have hasModifyingCTE set correctly in the cases where the rewriter doesn't set it correctly (and I'm not sure what strange side effects ??? that might have). 
- Although the invocation of max_parallel_hazard_walker() on a RTE subquery will "work" in finally locating your fix's added "CommonTableExpr" parallel-safety disabling block for commandType != CMD_SELECT, it looks like it potentially results in checking and walking over a lot of other stuff within the subquery not related to CTEs. The current fix does a more specific and efficient search for a modifying CTE. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 19, 2021 at 6:56 AM Greg Nancarrow wrote:
>
> Posting a new version of the patches, with the following updates:
>

I am not happy with the below code changes, I think we need a better way to deal with this.

@@ -313,19 +314,35 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	glob->transientPlan = false;
 	glob->dependsOnRole = false;

+	if (IsModifySupportedInParallelMode(parse->commandType) &&
+		!parse->hasModifyingCTE)
+	{
+		/*
+		 * FIXME
+		 * There is a known bug in the query rewriter: re-written queries with
+		 * a modifying CTE may not have the "hasModifyingCTE" flag set. When
+		 * that bug is fixed, this temporary fix must be removed.
+		 *
+		 * Note that here we've made a fix for this problem only for a
+		 * supported-in-parallel-mode table-modification statement (i.e.
+		 * INSERT), but this bug exists for SELECT too.
+		 */
+		parse->hasModifyingCTE = query_has_modifying_cte(parse);
+	}
+

I understand that this is an existing bug but I am not happy with this workaround. I feel it is better to check for modifyingCTE in max_parallel_hazard_walker. See attached, this is atop v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.

--
With Regards,
Amit Kapila.

fix_modifyingcte_parallel_safety_v1.patch
Description: Binary data
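For readers following along, the essence of what both competing fixes have to detect can be sketched as follows (plain Python over dicts standing in for the Query tree; the dict keys mirror PostgreSQL field names but this is an illustration, not the patch's actual code). A "modifying CTE" is any CTE, at any nesting level, whose command type is not SELECT:

```python
def query_has_modifying_cte(query):
    """Recursively search a query tree for a CTE that is INSERT/UPDATE/DELETE."""
    for cte in query.get("cteList", []):
        sub = cte["query"]
        if sub["commandType"] != "SELECT":
            return True
        if query_has_modifying_cte(sub):        # CTEs can nest
            return True
    # also look into subqueries in the range table
    for rte in query.get("rtable", []):
        if rte.get("subquery") and query_has_modifying_cte(rte["subquery"]):
            return True
    return False

q = {
    "commandType": "INSERT",
    "cteList": [{"query": {"commandType": "SELECT",
                           "cteList": [{"query": {"commandType": "DELETE"}}]}}],
}
print(query_has_modifying_cte(q))  # True: nested DELETE CTE found
```

The disagreement in the thread is not about this logic but about where it runs: once up-front in standard_planner() (specific and cheap, but a temporary workaround for the rewriter bug), or folded into max_parallel_hazard_walker() (more general, but walking every node of every subquery along the way).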
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 2:14 PM Greg Nancarrow wrote: > > On Wed, Feb 24, 2021 at 3:12 PM Amit Kapila wrote: > > > > On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow wrote: > > > > > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila > > > wrote: > > > > > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > > > > because of parallel-safety checks on the partitions, which found > > > > > attributes of the partitions which weren't parallel-safe. > > > > > So it's not so clear to me that the dependency doesn't exist - the > > > > > non-parallel plan does in fact depend on the state of the partitions. > > > > > > > > > > > > > Hmm, I think that is not what we can consider as a dependency. > > > > > > > > > > Then if it's not a dependency, then we shouldn't have to check the > > > attributes of the partitions for parallel-safety, to determine whether > > > we must use a non-parallel plan (or can use a parallel plan). > > > Except, of course, we do have to ... > > > > > > > I don't think the plan-dependency and checking for parallel-safety are > > directly related. > > > > That is certainly not my understanding. Why do you think that they are > not directly related? > This whole issue came about because Amit L pointed out that there is a > need to add partition OIDs as plan-dependencies BECAUSE the checking > for parallel-safety and plan-dependency are related - since now, for > Parallel INSERT, we're executing extra parallel-safety checks that > check partition properties, so the resultant plan is dependent on the > partitions and their properties. > He has pointed out an issue when the plan is parallel and you can see in that example that it fails if we didn't invalidate it. For non-parallel plans, there won't be any such issue. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 3:12 PM Amit Kapila wrote: > > On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow wrote: > > > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila > > wrote: > > > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > > > because of parallel-safety checks on the partitions, which found > > > > attributes of the partitions which weren't parallel-safe. > > > > So it's not so clear to me that the dependency doesn't exist - the > > > > non-parallel plan does in fact depend on the state of the partitions. > > > > > > > > > > Hmm, I think that is not what we can consider as a dependency. > > > > > > > Then if it's not a dependency, then we shouldn't have to check the > > attributes of the partitions for parallel-safety, to determine whether > > we must use a non-parallel plan (or can use a parallel plan). > > Except, of course, we do have to ... > > > > I don't think the plan-dependency and checking for parallel-safety are > directly related. > That is certainly not my understanding. Why do you think that they are not directly related? This whole issue came about because Amit L pointed out that there is a need to add partition OIDs as plan-dependencies BECAUSE the checking for parallel-safety and plan-dependency are related - since now, for Parallel INSERT, we're executing extra parallel-safety checks that check partition properties, so the resultant plan is dependent on the partitions and their properties. Amit L originally explained this as follows: "I think that the concerns raised by Tsunakawa-san in: https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com regarding how this interacts with plancache.c deserve a look. Specifically, a plan that uses parallel insert may fail to be invalidated when partitions are altered directly (that is without altering their root parent). 
That would be because we are not adding partition OIDs to PlannerGlobal.invalItems despite making a plan that's based on checking their properties." Tsunakawa-san: "Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?" Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow wrote: > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila wrote: > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > > because of parallel-safety checks on the partitions, which found > > > attributes of the partitions which weren't parallel-safe. > > > So it's not so clear to me that the dependency doesn't exist - the > > > non-parallel plan does in fact depend on the state of the partitions. > > > > > > > Hmm, I think that is not what we can consider as a dependency. > > > > Then if it's not a dependency, then we shouldn't have to check the > attributes of the partitions for parallel-safety, to determine whether > we must use a non-parallel plan (or can use a parallel plan). > Except, of course, we do have to ... > I don't think the plan-dependency and checking for parallel-safety are directly related. > > > I know you're suggesting to reduce plan-cache invalidation by only > > > recording a dependency in the parallel-plan case, but I've yet to see > > > that in the existing code, and in fact it seems to be inconsistent > > > with the behaviour I've tested so far (one example given prior, but > > > will look for more). > > > > > > > I don't see your example matches what you are saying as in it the > > regular table exists in the plan whereas for the case we are > > discussing partitions doesn't exist in the plan. Amit L. seems to have > > given a correct explanation [1] of why we don't need to invalidate for > > non-parallel plans which match my understanding. > > > > [1] - > > https://www.postgresql.org/message-id/CA%2BHiwqFKJfzgBbkg0i0Fz_FGsCiXW-Fw0tBjdsaUbNbpyv0JhA%40mail.gmail.com > > > > Amit L's explanation was: > > I may be wrong but it doesn't seem to me that the possibility of > constructing a better plan due to a given change is enough reason for > plancache.c to invalidate plans that depend on that change. 
AIUI,
> plancache.c only considers a change interesting if it would *break* a
> Query or a plan.
>
> So in this case, a non-parallel plan may be slower, but it isn't
> exactly rendered *wrong* by changes that make a parallel plan
> possible.
>
>
> This explanation doesn't seem to match existing planner behavior
> AFAICS, and we should try to be consistent with existing behavior
> (rather than doing something different, for partitions specifically in
> this case).
>

I still think it matches. You have missed the important point in your
example and explanation which Amit L and I are trying to explain to
you. See below.

> Using a concrete example, to avoid any confusion, consider the
> following SQL (using unpatched Postgres, master branch):
>
>
> -- encourage use of parallel plans, where possible
> set parallel_setup_cost=0;
> set parallel_tuple_cost=0;
> set min_parallel_table_scan_size=0;
> set max_parallel_workers_per_gather=4;
>
> create or replace function myfunc() returns boolean as $$
> begin
>   return true;
> end;
> $$ language plpgsql parallel unsafe immutable;
>
> create table mytable(x int);
> insert into mytable(x) values(generate_series(1,10));
>
> prepare q as select * from mytable, myfunc();
> explain execute q;
>
> -- change myfunc to be parallel-safe, to see the effect
> -- on the planner for the same select
> create or replace function myfunc() returns boolean as $$
> begin
>   return true;
> end;
> $$ language plpgsql parallel safe immutable;
>
> explain execute q;
>
>
> Here a function referenced in the SELECT statement is changed from
> parallel-unsafe to parallel-safe, to see the effect on plancache.
> According to your referenced explanation, that shouldn't be considered
> an "interesting" change by plancache.c, as it wouldn't "break" the
> previously planned and cached (non-parallel) query - the existing
> non-parallel plan could and should be used, as it still would execute
> OK, even if slower. Right?
> No, that is not what I said, or maybe you have misunderstood. In your
example, if you use the verbose option, then you will see the output as
below.

postgres=# explain (verbose) execute q;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on public.mytable  (cost=0.00..35.50 rows=2550 width=5)
   Output: mytable.x, true
(2 rows)

Here, you can see that the plan depends on the function (its return
value), so it needs to be invalidated when the function changes. To see
it in a bit different way, if you change your function as below, then
you can clearly see a FunctionScan in the plan:

create or replace function myfunc(c1 int) returns int as $$
begin
  return c1 * 10;
end;
$$ language plpgsql parallel unsafe;

postgres=# prepare q as select * from mytable, myfunc(2);
PREPARE
postgres=# explain select * from mytable, myfunc(2);
              QUERY PLAN
---------------------------------------
 Nested Loop  (cost=0.25..61.26
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila wrote: > > On Tue, Feb 23, 2021 at 4:47 PM Greg Nancarrow wrote: > > > > On Tue, Feb 23, 2021 at 2:30 PM Amit Kapila wrote: > > > > > > On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow > > > wrote: > > > > > > > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow > > > > > wrote: > > > > > > > > > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila > > > > > > wrote: > > > > > > > > > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > > > > > wrote: > > > > > > > > > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > > > > > partitions if the final plan won't use parallelism. For > > > > > > > > > > that, the > > > > > > > > > > patch adds checking glob->parallelModeNeeded, which seems > > > > > > > > > > to do the > > > > > > > > > > trick though I don't know if that's the correct way of > > > > > > > > > > doing that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the > > > > > > > > > case of NOT > > > > > > > > > using parallelism, because we may be relying on the result of > > > > > > > > > parallel-safety checks on partitions in both cases. > > > > > > > > > For example, insert_parallel.sql currently includes a test > > > > > > > > > (that you > > > > > > > > > originally provided in a previous post) that checks a > > > > > > > > > non-parallel > > > > > > > > > plan is generated after a parallel-unsafe trigger is created > > > > > > > > > on a > > > > > > > > > partition involved in the INSERT. 
> > > > > > > > > If I further add to that test by then dropping that trigger > > > > > > > > > and then > > > > > > > > > again using EXPLAIN to see what plan is generated, then I'd > > > > > > > > > expect a > > > > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch > > > > > > > > > it still > > > > > > > > > generates a non-parallel plan. So I think the "&& > > > > > > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > > > > > > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > > > > > > > > > > > > > > > > Greg has point here but I feel something on previous lines > > > > > > > (having a > > > > > > > test of glob->parallelModeNeeded) is better. We only want to > > > > > > > invalidate the plan if the prepared plan is unsafe to execute next > > > > > > > time. It is quite possible that there are unsafe triggers on > > > > > > > different > > > > > > > partitions and only one of them is dropped, so next time planning > > > > > > > will > > > > > > > again yield to the same non-parallel plan. If we agree with that I > > > > > > > think it is better to add this dependency in set_plan_refs (along > > > > > > > with > > > > > > > Gather node handling). > > > > > > > > > > > > > > > > > > > I think we should try to be consistent with current plan-cache > > > > > > functionality, rather than possibly inventing new rules for > > > > > > partitions. > > > > > > I'm finding that with current Postgres functionality (master > > > > > > branch), > > > > > > if there are two parallel-unsafe triggers defined on a normal table > > > > > > and one is dropped, it results in replanning and it yields the same > > > > > > (non-parallel) plan. > > > > > > > > > > > > > > > > Does such a plan have partitions access in it? Can you share the plan? 
> > > > > > > > > > > > > Er, no (it's just a regular table), but that was exactly my point: > > > > aren't you suggesting functionality for partitions that doesn't seem > > > > to work the same way for non-partitions? > > > > > > > > > > I don't think so. The non-parallel plan for Insert doesn't directly > > > depend on partitions so we don't need to invalidate those. > > > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > because of parallel-safety checks on the partitions, which found > > attributes of the partitions which weren't parallel-safe. > > So it's not so clear to me that the dependency doesn't exist - the > > non-parallel plan does in fact depend on the state of the partitions. > > > > Hmm, I think that is not what we can consider as a dependency. > Then if it's not a dependency, then we shouldn't have to check the attributes of the partitions for parallel-safety, to determine whether we must use a non-parallel plan (or can use a parallel plan). Except, of course, we do have to ... > > I know you're suggesting to reduce plan-cache invalidation by only > > recording a dependency in the parallel-plan case, but I've yet to see > > that in the existing code, and in fact it seems to be inconsistent > > with the behaviour I've tested so far (one example given prior, but > > will look for more). > > > > I don't see your example matches what you are saying as in it the > regular table exists in the plan whereas for
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 4:47 PM Greg Nancarrow wrote: > > On Tue, Feb 23, 2021 at 2:30 PM Amit Kapila wrote: > > > > On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow wrote: > > > > > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila > > > wrote: > > > > > > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow > > > > wrote: > > > > > > > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > > > > wrote: > > > > > > > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > > > > partitions if the final plan won't use parallelism. For > > > > > > > > > that, the > > > > > > > > > patch adds checking glob->parallelModeNeeded, which seems to > > > > > > > > > do the > > > > > > > > > trick though I don't know if that's the correct way of doing > > > > > > > > > that. > > > > > > > > > > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case > > > > > > > > of NOT > > > > > > > > using parallelism, because we may be relying on the result of > > > > > > > > parallel-safety checks on partitions in both cases. > > > > > > > > For example, insert_parallel.sql currently includes a test > > > > > > > > (that you > > > > > > > > originally provided in a previous post) that checks a > > > > > > > > non-parallel > > > > > > > > plan is generated after a parallel-unsafe trigger is created on > > > > > > > > a > > > > > > > > partition involved in the INSERT. > > > > > > > > If I further add to that test by then dropping that trigger and > > > > > > > > then > > > > > > > > again using EXPLAIN to see what plan is generated, then I'd > > > > > > > > expect a > > > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it > > > > > > > > still > > > > > > > > generates a non-parallel plan. So I think the "&& > > > > > > > > glob->parallelModeNeeded" part of test needs to be removed. 
> > > > > > > > > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > > > > > > > > > > > > > Greg has point here but I feel something on previous lines (having a > > > > > > test of glob->parallelModeNeeded) is better. We only want to > > > > > > invalidate the plan if the prepared plan is unsafe to execute next > > > > > > time. It is quite possible that there are unsafe triggers on > > > > > > different > > > > > > partitions and only one of them is dropped, so next time planning > > > > > > will > > > > > > again yield to the same non-parallel plan. If we agree with that I > > > > > > think it is better to add this dependency in set_plan_refs (along > > > > > > with > > > > > > Gather node handling). > > > > > > > > > > > > > > > > I think we should try to be consistent with current plan-cache > > > > > functionality, rather than possibly inventing new rules for > > > > > partitions. > > > > > I'm finding that with current Postgres functionality (master branch), > > > > > if there are two parallel-unsafe triggers defined on a normal table > > > > > and one is dropped, it results in replanning and it yields the same > > > > > (non-parallel) plan. > > > > > > > > > > > > > Does such a plan have partitions access in it? Can you share the plan? > > > > > > > > > > Er, no (it's just a regular table), but that was exactly my point: > > > aren't you suggesting functionality for partitions that doesn't seem > > > to work the same way for non-partitions? > > > > > > > I don't think so. The non-parallel plan for Insert doesn't directly > > depend on partitions so we don't need to invalidate those. > > > > But the non-parallel plan was chosen (instead of a parallel plan) > because of parallel-safety checks on the partitions, which found > attributes of the partitions which weren't parallel-safe. > So it's not so clear to me that the dependency doesn't exist - the > non-parallel plan does in fact depend on the state of the partitions. 
> Hmm, I think that is not what we can consider as a dependency. > I know you're suggesting to reduce plan-cache invalidation by only > recording a dependency in the parallel-plan case, but I've yet to see > that in the existing code, and in fact it seems to be inconsistent > with the behaviour I've tested so far (one example given prior, but > will look for more). > I don't see your example matches what you are saying as in it the regular table exists in the plan whereas for the case we are discussing partitions doesn't exist in the plan. Amit L. seems to have given a correct explanation [1] of why we don't need to invalidate for non-parallel plans which match my understanding. [1] - https://www.postgresql.org/message-id/CA%2BHiwqFKJfzgBbkg0i0Fz_FGsCiXW-Fw0tBjdsaUbNbpyv0JhA%40mail.gmail.com -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 2:30 PM Amit Kapila wrote: > > On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow wrote: > > > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila > > wrote: > > > > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow > > > wrote: > > > > > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > > > wrote: > > > > > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > > > partitions if the final plan won't use parallelism. For that, > > > > > > > > the > > > > > > > > patch adds checking glob->parallelModeNeeded, which seems to do > > > > > > > > the > > > > > > > > trick though I don't know if that's the correct way of doing > > > > > > > > that. > > > > > > > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of > > > > > > > NOT > > > > > > > using parallelism, because we may be relying on the result of > > > > > > > parallel-safety checks on partitions in both cases. > > > > > > > For example, insert_parallel.sql currently includes a test (that > > > > > > > you > > > > > > > originally provided in a previous post) that checks a non-parallel > > > > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > > > > partition involved in the INSERT. > > > > > > > If I further add to that test by then dropping that trigger and > > > > > > > then > > > > > > > again using EXPLAIN to see what plan is generated, then I'd > > > > > > > expect a > > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it > > > > > > > still > > > > > > > generates a non-parallel plan. So I think the "&& > > > > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > > > > > > > Ah, okay, I didn't retest my case after making that change. 
> > > > > > > > > > > > > > > > Greg has point here but I feel something on previous lines (having a > > > > > test of glob->parallelModeNeeded) is better. We only want to > > > > > invalidate the plan if the prepared plan is unsafe to execute next > > > > > time. It is quite possible that there are unsafe triggers on different > > > > > partitions and only one of them is dropped, so next time planning will > > > > > again yield to the same non-parallel plan. If we agree with that I > > > > > think it is better to add this dependency in set_plan_refs (along with > > > > > Gather node handling). > > > > > > > > > > > > > I think we should try to be consistent with current plan-cache > > > > functionality, rather than possibly inventing new rules for > > > > partitions. > > > > I'm finding that with current Postgres functionality (master branch), > > > > if there are two parallel-unsafe triggers defined on a normal table > > > > and one is dropped, it results in replanning and it yields the same > > > > (non-parallel) plan. > > > > > > > > > > Does such a plan have partitions access in it? Can you share the plan? > > > > > > > Er, no (it's just a regular table), but that was exactly my point: > > aren't you suggesting functionality for partitions that doesn't seem > > to work the same way for non-partitions? > > > > I don't think so. The non-parallel plan for Insert doesn't directly > depend on partitions so we don't need to invalidate those. > But the non-parallel plan was chosen (instead of a parallel plan) because of parallel-safety checks on the partitions, which found attributes of the partitions which weren't parallel-safe. So it's not so clear to me that the dependency doesn't exist - the non-parallel plan does in fact depend on the state of the partitions. 
I know you're suggesting to reduce plan-cache invalidation by only recording a dependency in the parallel-plan case, but I've yet to see that in the existing code, and in fact it seems to be inconsistent with the behaviour I've tested so far (one example given prior, but will look for more). Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow wrote: > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila wrote: > > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow wrote: > > > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila > > > wrote: > > > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > > wrote: > > > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > > partitions if the final plan won't use parallelism. For that, the > > > > > > > patch adds checking glob->parallelModeNeeded, which seems to do > > > > > > > the > > > > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of > > > > > > NOT > > > > > > using parallelism, because we may be relying on the result of > > > > > > parallel-safety checks on partitions in both cases. > > > > > > For example, insert_parallel.sql currently includes a test (that you > > > > > > originally provided in a previous post) that checks a non-parallel > > > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > > > partition involved in the INSERT. > > > > > > If I further add to that test by then dropping that trigger and then > > > > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it > > > > > > still > > > > > > generates a non-parallel plan. So I think the "&& > > > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > > > > > > > Greg has point here but I feel something on previous lines (having a > > > > test of glob->parallelModeNeeded) is better. We only want to > > > > invalidate the plan if the prepared plan is unsafe to execute next > > > > time. 
It is quite possible that there are unsafe triggers on different > > > > partitions and only one of them is dropped, so next time planning will > > > > again yield to the same non-parallel plan. If we agree with that I > > > > think it is better to add this dependency in set_plan_refs (along with > > > > Gather node handling). > > > > > > > > > > I think we should try to be consistent with current plan-cache > > > functionality, rather than possibly inventing new rules for > > > partitions. > > > I'm finding that with current Postgres functionality (master branch), > > > if there are two parallel-unsafe triggers defined on a normal table > > > and one is dropped, it results in replanning and it yields the same > > > (non-parallel) plan. > > > > > > > Does such a plan have partitions access in it? Can you share the plan? > > > > Er, no (it's just a regular table), but that was exactly my point: > aren't you suggesting functionality for partitions that doesn't seem > to work the same way for non-partitions? > I don't think so. The non-parallel plan for Insert doesn't directly depend on partitions so we don't need to invalidate those. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila wrote: > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow wrote: > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila wrote: > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > wrote: > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > partitions if the final plan won't use parallelism. For that, the > > > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > > > using parallelism, because we may be relying on the result of > > > > > parallel-safety checks on partitions in both cases. > > > > > For example, insert_parallel.sql currently includes a test (that you > > > > > originally provided in a previous post) that checks a non-parallel > > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > > partition involved in the INSERT. > > > > > If I further add to that test by then dropping that trigger and then > > > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > > > generates a non-parallel plan. So I think the "&& > > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > > > > Greg has point here but I feel something on previous lines (having a > > > test of glob->parallelModeNeeded) is better. We only want to > > > invalidate the plan if the prepared plan is unsafe to execute next > > > time. It is quite possible that there are unsafe triggers on different > > > partitions and only one of them is dropped, so next time planning will > > > again yield to the same non-parallel plan. 
If we agree with that I > > > think it is better to add this dependency in set_plan_refs (along with > > > Gather node handling). > > > > > > > I think we should try to be consistent with current plan-cache > > functionality, rather than possibly inventing new rules for > > partitions. > > I'm finding that with current Postgres functionality (master branch), > > if there are two parallel-unsafe triggers defined on a normal table > > and one is dropped, it results in replanning and it yields the same > > (non-parallel) plan. > > > > Does such a plan have partitions access in it? Can you share the plan? > Er, no (it's just a regular table), but that was exactly my point: aren't you suggesting functionality for partitions that doesn't seem to work the same way for non-partitions? Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow wrote: > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila wrote: > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > wrote: > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > partitions if the final plan won't use parallelism. For that, the > > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > > using parallelism, because we may be relying on the result of > > > > parallel-safety checks on partitions in both cases. > > > > For example, insert_parallel.sql currently includes a test (that you > > > > originally provided in a previous post) that checks a non-parallel > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > partition involved in the INSERT. > > > > If I further add to that test by then dropping that trigger and then > > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > > generates a non-parallel plan. So I think the "&& > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > Greg has point here but I feel something on previous lines (having a > > test of glob->parallelModeNeeded) is better. We only want to > > invalidate the plan if the prepared plan is unsafe to execute next > > time. It is quite possible that there are unsafe triggers on different > > partitions and only one of them is dropped, so next time planning will > > again yield to the same non-parallel plan. If we agree with that I > > think it is better to add this dependency in set_plan_refs (along with > > Gather node handling). 
> > > > I think we should try to be consistent with current plan-cache > functionality, rather than possibly inventing new rules for > partitions. > I'm finding that with current Postgres functionality (master branch), > if there are two parallel-unsafe triggers defined on a normal table > and one is dropped, it results in replanning and it yields the same > (non-parallel) plan. > Does such a plan have partitions access in it? Can you share the plan? -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Feb 22, 2021 at 8:46 AM Amit Langote wrote: > > On Fri, Feb 19, 2021 at 7:38 PM Amit Kapila wrote: > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > wrote: > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > partitions if the final plan won't use parallelism. For that, the > > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > > using parallelism, because we may be relying on the result of > > > > parallel-safety checks on partitions in both cases. > > > > For example, insert_parallel.sql currently includes a test (that you > > > > originally provided in a previous post) that checks a non-parallel > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > partition involved in the INSERT. > > > > If I further add to that test by then dropping that trigger and then > > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > > generates a non-parallel plan. So I think the "&& > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > Greg has point here but I feel something on previous lines (having a > > test of glob->parallelModeNeeded) is better. We only want to > > invalidate the plan if the prepared plan is unsafe to execute next > > time. > > > > It is quite possible that there are unsafe triggers on different > > partitions and only one of them is dropped, so next time planning will > > again yield to the same non-parallel plan. If we agree with that I > > think it is better to add this dependency in set_plan_refs (along with > > Gather node handling). 
> > Are you saying that partitions shouldn't be added to the dependency > list if a parallel plan was not chosen for insert into a partitioned > table for whatever reason (parallel unsafe expressions or beaten by > other paths in terms of cost)? > Right. > If so, I am inclined to agree with > that. > > I may be wrong but it doesn't seem to me that the possibility of > constructing a better plan due to a given change is enough reason for > plancache.c to invalidate plans that depend on that change. AIUI, > plancache.c only considers a change interesting if it would *break* a > Query or a plan. > that makes sense to me. > So in this case, a non-parallel plan may be slower, but it isn't > exactly rendered *wrong* by changes that make a parallel plan > possible. > Right. > > Also, if we agree that we don't have any cheap way to determine > > parallel-safety of partitioned relations then shall we consider the > > patch being discussed [1] to be combined here? > > Yes, I think it does make sense to consider the GUC patch with the > patches on this thread. > Cool, Greg/Hou, can we consider this along with the next patch? -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Feb 22, 2021 at 6:25 PM houzj.f...@fujitsu.com wrote:
>
> Hi
>
> (I may be wrong here)
> I noticed that the patch does not have check for partial index(index
> predicate).
> It seems parallel index build will check it like the following:
> --
> /*
>  * Determine if it's safe to proceed.
>  *
>  * Currently, parallel workers can't access the leader's temporary
>  * tables. Furthermore, any index predicate or index expressions must
>  * be parallel safe.
>  */
> if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
>     !is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index)) ||
>     !is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index)))
> --
>
> Should we do parallel safety check for it ?
>

Thanks, it looks like you're right, it is missing (and there's no test
for it). I can add a fix to the index-checking code, something like:

+if (!found_max_hazard)
+{
+    ii_Predicate = RelationGetIndexPredicate(index_rel);
+    if (ii_Predicate != NIL)
+    {
+        if (max_parallel_hazard_walker((Node *) ii_Predicate, context))
+        {
+            found_max_hazard = true;
+        }
+    }
+}

Also will need a bit of renaming of that function. I'll include this in
the next patch update.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Parallel INSERT (INTO ... SELECT ...)
> Posting a new version of the patches, with the following updates:
> - Moved the update of glob->relationOIDs (i.e. addition of partition
>   OIDs that the plan depends on, resulting from parallel-safety
>   checks) from within max_parallel_hazard() to set_plan_references().
> - Added an extra test for partition plan-cache invalidation.
> - Simplified the query_has_modifying_cte() temporary bug-fix.
> - Added a comment explaining why parallel-safety of partition column
>   defaults is not checked.
> - Minor simplification: hasSubQuery return to
>   is_parallel_possible_for_modify().

Hi

(I may be wrong here)
I noticed that the patch does not have a check for partial indexes
(index predicates). It seems a parallel index build checks this, like
the following:
--
/*
 * Determine if it's safe to proceed.
 *
 * Currently, parallel workers can't access the leader's temporary
 * tables. Furthermore, any index predicate or index expressions must
 * be parallel safe.
 */
if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
    !is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index)) ||
    !is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index)))
--

Should we do a parallel-safety check for it?

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 19, 2021 at 7:38 PM Amit Kapila wrote: > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote wrote: > > > > > > It also occurred to me that we can avoid pointless adding of > > > > partitions if the final plan won't use parallelism. For that, the > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > using parallelism, because we may be relying on the result of > > > parallel-safety checks on partitions in both cases. > > > For example, insert_parallel.sql currently includes a test (that you > > > originally provided in a previous post) that checks a non-parallel > > > plan is generated after a parallel-unsafe trigger is created on a > > > partition involved in the INSERT. > > > If I further add to that test by then dropping that trigger and then > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > generates a non-parallel plan. So I think the "&& > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > Ah, okay, I didn't retest my case after making that change. > > Greg has point here but I feel something on previous lines (having a > test of glob->parallelModeNeeded) is better. We only want to > invalidate the plan if the prepared plan is unsafe to execute next > time. > > It is quite possible that there are unsafe triggers on different > partitions and only one of them is dropped, so next time planning will > again yield to the same non-parallel plan. If we agree with that I > think it is better to add this dependency in set_plan_refs (along with > Gather node handling). 
Are you saying that partitions shouldn't be added to the dependency list if a parallel plan was not chosen for insert into a partitioned table for whatever reason (parallel unsafe expressions or beaten by other paths in terms of cost)? If so, I am inclined to agree with that. I may be wrong but it doesn't seem to me that the possibility of constructing a better plan due to a given change is enough reason for plancache.c to invalidate plans that depend on that change. AIUI, plancache.c only considers a change interesting if it would *break* a Query or a plan. So in this case, a non-parallel plan may be slower, but it isn't exactly rendered *wrong* by changes that make a parallel plan possible. > Also, if we agree that we don't have any cheap way to determine > parallel-safety of partitioned relations then shall we consider the > patch being discussed [1] to be combined here? Yes, I think it does make sense to consider the GUC patch with the patches on this thread. > I feel we should focus on getting the first patch of Greg > (v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT, along with > a test case patch) and Hou-San's patch discussed at [1] ready. Then we > can focus on the > v18-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO. Because > even if we get the first patch that is good enough for some users. +1. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila wrote: > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote wrote: > > > > > > It also occurred to me that we can avoid pointless adding of > > > > partitions if the final plan won't use parallelism. For that, the > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > using parallelism, because we may be relying on the result of > > > parallel-safety checks on partitions in both cases. > > > For example, insert_parallel.sql currently includes a test (that you > > > originally provided in a previous post) that checks a non-parallel > > > plan is generated after a parallel-unsafe trigger is created on a > > > partition involved in the INSERT. > > > If I further add to that test by then dropping that trigger and then > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > generates a non-parallel plan. So I think the "&& > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > Ah, okay, I didn't retest my case after making that change. > > > > Greg has point here but I feel something on previous lines (having a > test of glob->parallelModeNeeded) is better. We only want to > invalidate the plan if the prepared plan is unsafe to execute next > time. It is quite possible that there are unsafe triggers on different > partitions and only one of them is dropped, so next time planning will > again yield to the same non-parallel plan. If we agree with that I > think it is better to add this dependency in set_plan_refs (along with > Gather node handling). > I think we should try to be consistent with current plan-cache functionality, rather than possibly inventing new rules for partitions. 
I'm finding that with current Postgres functionality (master branch), if there are two parallel-unsafe triggers defined on a normal table and one is dropped, it results in replanning and it yields the same (non-parallel) plan. This would seem to go against what you are suggesting. Regards, Greg Nancarrow Fujitsu Australia
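The behaviour described above can be reproduced with something like the following (hypothetical names), using a prepared statement so that the plan cache is involved:

```sql
CREATE TABLE t (a int);
CREATE FUNCTION t_trig_func() RETURNS trigger
    AS $$ BEGIN RETURN NEW; END $$
    LANGUAGE plpgsql PARALLEL UNSAFE;
CREATE TRIGGER t_trig1 BEFORE INSERT ON t
    FOR EACH ROW EXECUTE FUNCTION t_trig_func();
CREATE TRIGGER t_trig2 BEFORE INSERT ON t
    FOR EACH ROW EXECUTE FUNCTION t_trig_func();

PREPARE ins AS INSERT INTO t SELECT generate_series(1, 1000);
EXPLAIN EXECUTE ins;       -- non-parallel plan (two unsafe triggers)
DROP TRIGGER t_trig1 ON t; -- invalidates the cached plan
EXPLAIN EXECUTE ins;       -- replanned, but still non-parallel (t_trig2 remains)
```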
RE: Parallel INSERT (INTO ... SELECT ...)
From: Amit Kapila > It is quite possible what you are saying is correct but I feel that is > not this patch's fault. So, won't it better to discuss this in a > separate thread? > > Good use case but again, I think this can be done as a separate patch. Agreed. I think even the current patch offers great benefits and can be committed in PG 14, even if all my four feedback comments are left unaddressed. I just touched on them for completeness in terms of typically expected use cases. They can probably be implemented on top of the current design. > I think here you are talking about the third patch (Parallel Inserts). > I guess if one has run inserts parallelly from psql then also similar > behavior would have been observed. For tables, it might lead to better > results once we have the patch discussed at [1]. Actually, this needs > more investigation. > > [1] - > https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com That looks interesting and worth a try. Regards Takayuki Tsunakawa
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 19, 2021 at 10:13 AM tsunakawa.ta...@fujitsu.com wrote: > > From: Greg Nancarrow > -- > On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.ta...@fujitsu.com > wrote: > > (8) > > + /* > > +* If the trigger type is RI_TRIGGER_FK, this indicates a > > FK exists in > > +* the relation, and this would result in creation of new > > CommandIds > > +* on insert/update/delete and this isn't supported in a > > parallel > > +* worker (but is safe in the parallel leader). > > +*/ > > + trigtype = RI_FKey_trigger_type(trigger->tgfoid); > > + if (trigtype == RI_TRIGGER_FK) > > + { > > + if > > (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) > > + return true; > > + } > > > > Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK > > triggers do not generate command IDs. See RI_FKey_check() which is called > > in RI_TRIGGER_FK case. In there, ri_PerformCheck() is called with the > > detectNewRows argument set to false, which causes CommandCounterIncrement() > > to not be called. > > > > Hmmm, I'm not sure that you have read and interpreted the patch code > correctly. > The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign > key, and an insert into such a table will generate a new commandId (so we > must avoid that, as we don't currently have the technology to support sharing > of new command IDs across the participants in the parallel operation). This > is what the code comment says, It does not say that such a trigger generates > a new command ID. > > See Amit's updated comment here: > https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665 > > In addition, the 2nd patch has an explicit test case for this (testing insert > into a table that has a FK). > -- > > > First of all, I anticipate this parallel INSERT SELECT feature will typically > shine, and expected to work, in the ETL or ELT into a data warehouse or an > ODS for analytics. 
Bearing that in mind, let me list some issues or > questions below. But the current state of the patch would be of course > attractive in some workloads, so I don't think these are not necessarily > blockers. > > > (1) > According to the classic book "The Data Warehouse Toolkit" and the website > [1] by its author, the fact table (large transaction history) in the data > warehouse has foreign keys referencing to the dimension tables (small or > medium-sized master or reference data). So, parallel insert will be > effective if it works when loading data into the fact table with foreign keys. > > To answer the above question, I'm assuming: > > CREATE TABLE some_dimension (key_col int PRIMARY KEY); > CREATE TABLE some_fact (some_key int REFERENCES some_dimension); > INSERT INTO some_fact SELECT ...; > > > My naive question is, "why should new command IDs be generated to check > foreign key constraints in this INSERT case? The check just reads the parent > (some_dimension table here)..." > It is quite possible that what you are saying is correct, but I feel that is not this patch's fault. So, won't it be better to discuss this in a separate thread? > > > (2) > Likewise, dimension tables have surrogate keys that are typically implemented > as a sequence or an identity column. It is suggested that even fact tables > sometimes (or often?) have surrogate keys. But the current patch does not > parallelize the statement when the target table has a sequence or an identity > column. > > I was looking at the sequence code, and my naive (again) idea is that the > parallel leader and workers allocates numbers from the sequence > independently, and sets the largest number of them as the session's currval > at the end of parallel operation. We have to note in the documentation that > gaps in the sequence numbers will arise and not used in parallel DML. > Good use case but again, I think this can be done as a separate patch.
> > (3) > As Hou-san demonstrated, the current patch causes the resulting table and > index to become larger when inserted in parallel than in inserted serially. > This could be a problem for analytics use cases where the table is just > inserted and read only afterwards. We could advise the user to run REINDEX > CONCURRENTLY after loading data, but what about tables? > I
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 18, 2021 at 11:05 AM Amit Langote wrote: > > > > It also occurred to me that we can avoid pointless adding of > > > partitions if the final plan won't use parallelism. For that, the > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > trick though I don't know if that's the correct way of doing that. > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > using parallelism, because we may be relying on the result of > > parallel-safety checks on partitions in both cases. > > For example, insert_parallel.sql currently includes a test (that you > > originally provided in a previous post) that checks a non-parallel > > plan is generated after a parallel-unsafe trigger is created on a > > partition involved in the INSERT. > > If I further add to that test by then dropping that trigger and then > > again using EXPLAIN to see what plan is generated, then I'd expect a > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > generates a non-parallel plan. So I think the "&& > > glob->parallelModeNeeded" part of test needs to be removed. > > Ah, okay, I didn't retest my case after making that change. > Greg has point here but I feel something on previous lines (having a test of glob->parallelModeNeeded) is better. We only want to invalidate the plan if the prepared plan is unsafe to execute next time. It is quite possible that there are unsafe triggers on different partitions and only one of them is dropped, so next time planning will again yield to the same non-parallel plan. If we agree with that I think it is better to add this dependency in set_plan_refs (along with Gather node handling). Also, if we agree that we don't have any cheap way to determine parallel-safety of partitioned relations then shall we consider the patch being discussed [1] to be combined here? Amit L, do you agree with that line of thought, or you have any other ideas? 
I feel we should focus on getting the first patch of Greg (v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT, along with a test case patch) and Hou-San's patch discussed at [1] ready. Then we can focus on the v18-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO. Because even if we only get the first patch in, that is good enough for some users. What do you think? [1] - https://www.postgresql.org/message-id/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com -- With Regards, Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
From: Greg Nancarrow -- On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.ta...@fujitsu.com wrote: > (8) > + /* > +* If the trigger type is RI_TRIGGER_FK, this indicates a FK > exists in > +* the relation, and this would result in creation of new > CommandIds > +* on insert/update/delete and this isn't supported in a > parallel > +* worker (but is safe in the parallel leader). > +*/ > + trigtype = RI_FKey_trigger_type(trigger->tgfoid); > + if (trigtype == RI_TRIGGER_FK) > + { > + if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, > context)) > + return true; > + } > > Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK > triggers do not generate command IDs. See RI_FKey_check() which is called in > RI_TRIGGER_FK case. In there, ri_PerformCheck() is called with the > detectNewRows argument set to false, which causes CommandCounterIncrement() > to not be called. > Hmmm, I'm not sure that you have read and interpreted the patch code correctly. The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign key, and an insert into such a table will generate a new commandId (so we must avoid that, as we don't currently have the technology to support sharing of new command IDs across the participants in the parallel operation). This is what the code comment says; it does not say that such a trigger generates a new command ID. See Amit's updated comment here: https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665 In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK). ---------- First of all, I anticipate this parallel INSERT SELECT feature will typically shine, and is expected to work, in the ETL or ELT into a data warehouse or an ODS for analytics. Bearing that in mind, let me list some issues or questions below. But the current state of the patch would of course be attractive in some workloads, so I don't think these are necessarily blockers.
(1) According to the classic book "The Data Warehouse Toolkit" and the website [1] by its author, the fact table (large transaction history) in the data warehouse has foreign keys referencing the dimension tables (small or medium-sized master or reference data). So, parallel insert will be effective if it works when loading data into the fact table with foreign keys. To answer the above question, I'm assuming: CREATE TABLE some_dimension (key_col int PRIMARY KEY); CREATE TABLE some_fact (some_key int REFERENCES some_dimension); INSERT INTO some_fact SELECT ...; My naive question is, "why should new command IDs be generated to check foreign key constraints in this INSERT case? The check just reads the parent (some_dimension table here)..." Looking a bit deeper into the code, although ri_PerformCheck() itself tries to avoid generating command IDs, it calls _SPI_execute_snapshot() with the read_only argument always set to false. It in turn calls _SPI_execute_plan() -> CommandCounterIncrement() as follows: [_SPI_execute_plan()] /* * If not read-only mode, advance the command counter before each * command and update the snapshot. */ if (!read_only && !plan->no_snapshots) { CommandCounterIncrement(); UpdateActiveSnapshotCommandId(); } Can't we pass true to read_only from ri_PerformCheck() in some cases? (2) Likewise, dimension tables have surrogate keys that are typically implemented as a sequence or an identity column. It is suggested that even fact tables sometimes (or often?) have surrogate keys. But the current patch does not parallelize the statement when the target table has a sequence or an identity column. I was looking at the sequence code, and my naive (again) idea is that the parallel leader and workers allocate numbers from the sequence independently, and set the largest of them as the session's currval at the end of the parallel operation. We have to note in the documentation that, in parallel DML, gaps will arise in the sequence numbers, with some values left unused.
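To illustrate point (2), a dimension table with a surrogate key might look like this (hypothetical names); because the identity column draws values from an implicit sequence, the current patch would not parallelize the INSERT:

```sql
CREATE TABLE dim_customer (
    customer_sk int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id text
);

-- nextval() on the backing sequence makes this parallel-restricted,
-- so under the current patch this INSERT uses a non-parallel plan:
INSERT INTO dim_customer (customer_id)
    SELECT 'C' || i FROM generate_series(1, 100000) i;
```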
(3) As Hou-san demonstrated, the current patch causes the resulting table and index to become larger when rows are inserted in parallel than when they are inserted serially. This could be a problem for analytics use cases where the table is just inserted into and only read afterwards. We could advise the user to run REINDEX CONCURRENTLY after loading data, but what about tables? BTW, I don't know if Oracle and SQL Server have similar i
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 18, 2021 at 4:35 PM Amit Langote wrote: > > Looking at this again, I am a bit concerned about going over the whole > partition tree *twice* when making a parallel plan for insert into > partitioned tables. Maybe we should do what you did in your first > attempt a slightly differently -- add partition OIDs during the > max_parallel_hazard() initiated scan of the partition tree as you did. > Instead of adding them directly to PlannerGlobal.relationOids, add to, > say, PlannerInfo.targetPartitionOids and have set_plan_references() do > list_concat(glob->relationOids, list_copy(root->targetPartitionOids) > in the same place as setrefs-v3 does > add_target_partition_oids_recurse(). Thoughts? > Agreed, that might be a better approach, and that way we're also only recording the partition OIDs that the parallel-safety checks are relying on. I'll give it a go and see if I can detect any issues with this method. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 18, 2021 at 10:03 AM Greg Nancarrow wrote: > On Thu, Feb 18, 2021 at 12:34 AM Amit Langote wrote: > > All that is to say that we should move our code to add partition OIDs > > as plan dependencies to somewhere in set_plan_references(), which > > otherwise populates PlannedStmt.relationOids. I updated the patch to > > do that. > > OK, understood. Thanks for the detailed explanation. > > > It also occurred to me that we can avoid pointless adding of > > partitions if the final plan won't use parallelism. For that, the > > patch adds checking glob->parallelModeNeeded, which seems to do the > > trick though I don't know if that's the correct way of doing that. > > > > I'm not sure if's pointless adding partitions even in the case of NOT > using parallelism, because we may be relying on the result of > parallel-safety checks on partitions in both cases. > For example, insert_parallel.sql currently includes a test (that you > originally provided in a previous post) that checks a non-parallel > plan is generated after a parallel-unsafe trigger is created on a > partition involved in the INSERT. > If I further add to that test by then dropping that trigger and then > again using EXPLAIN to see what plan is generated, then I'd expect a > parallel-plan to be generated, but with the setrefs-v3.patch it still > generates a non-parallel plan. So I think the "&& > glob->parallelModeNeeded" part of test needs to be removed. Ah, okay, I didn't retest my case after making that change. Looking at this again, I am a bit concerned about going over the whole partition tree *twice* when making a parallel plan for insert into partitioned tables. Maybe we should do what you did in your first attempt a slightly differently -- add partition OIDs during the max_parallel_hazard() initiated scan of the partition tree as you did. 
Instead of adding them directly to PlannerGlobal.relationOids, add to, say, PlannerInfo.targetPartitionOids and have set_plan_references() do list_concat(glob->relationOids, list_copy(root->targetPartitionOids)) in the same place as setrefs-v3 does add_target_partition_oids_recurse(). Thoughts? -- Amit Langote EDB: http://www.enterprisedb.com
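A rough sketch of that suggestion (the field name and exact placement are placeholders, not actual patch code):

```c
/*
 * Sketch only -- PostgreSQL planner internals, not compilable on its own.
 *
 * In max_parallel_hazard()'s scan of the partition tree, record each
 * checked partition in a new PlannerInfo field rather than touching
 * PlannerGlobal directly:
 */
root->targetPartitionOids = lappend_oid(root->targetPartitionOids,
                                        pdesc->oids[i]);

/*
 * Then in set_plan_references(), fold those OIDs into the plan's
 * dependency list alongside the OIDs it already collects:
 */
glob->relationOids = list_concat(glob->relationOids,
                                 list_copy(root->targetPartitionOids));
```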
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 18, 2021 at 12:34 AM Amit Langote wrote: > > > Your revised version seems OK, though I do have a concern: > > Is the use of "table_close(rel, NoLock)'' intentional? That will keep > > the lock (lockmode) until end-of-transaction. > > I think we always keep any locks on relations that are involved in a > plan until end-of-transaction. What if a partition is changed in an > unsafe manner between being considered safe for parallel insertion and > actually performing the parallel insert? > > BTW, I just noticed that extract_query_dependencies() runs on a > rewritten, but not-yet-planned query tree, that is, I didn't know that > extract_query_dependencies() only populates the CachedPlanSource's > relationOids and not CachedPlan's. The former is only for tracking > the dependencies of an unplanned Query, so partitions should never be > added to it. Instead, they should be added to > PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan), > which is kind of what your earlier patch did. Needless to say, > PlanCacheRelCallback checks both CachedPlanSource.relationOids and > PlannedStmt.relationOids, so if it receives a message about a > partition, its OID is matched from the latter. > > All that is to say that we should move our code to add partition OIDs > as plan dependencies to somewhere in set_plan_references(), which > otherwise populates PlannedStmt.relationOids. I updated the patch to > do that. OK, understood. Thanks for the detailed explanation. > It also occurred to me that we can avoid pointless adding of > partitions if the final plan won't use parallelism. For that, the > patch adds checking glob->parallelModeNeeded, which seems to do the > trick though I don't know if that's the correct way of doing that. > I'm not sure it's pointless adding partitions even in the case of NOT using parallelism, because we may be relying on the result of parallel-safety checks on partitions in both cases.
For example, insert_parallel.sql currently includes a test (that you originally provided in a previous post) that checks a non-parallel plan is generated after a parallel-unsafe trigger is created on a partition involved in the INSERT. If I further add to that test by then dropping that trigger and then again using EXPLAIN to see what plan is generated, then I'd expect a parallel-plan to be generated, but with the setrefs-v3.patch it still generates a non-parallel plan. So I think the "&& glob->parallelModeNeeded" part of test needs to be removed. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow wrote: > On Wed, Feb 17, 2021 at 12:19 AM Amit Langote wrote: > > On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow wrote: > > > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote > > > wrote: > > > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow > > > > wrote: > > > > > Actually, I tried adding the following in the loop that checks the > > > > > parallel-safety of each partition and it seemed to work: > > > > > > > > > > glob->relationOids = > > > > > lappend_oid(glob->relationOids, pdesc->oids[i]); > > > > > > > > > > Can you confirm, is that what you were referring to? > > > > > > > > Right. I had mistakenly mentioned PlannerGlobal.invalItems, sorry. > > > > > > > > Although it gets the job done, I'm not sure if manipulating > > > > relationOids from max_parallel_hazard() or its subroutines is okay, > > > > but I will let the committer decide that. As I mentioned above, the > > > > person who designed this decided for some reason that it is > > > > extract_query_dependencies()'s job to populate > > > > PlannerGlobal.relationOids/invalItems. > > > > > > Yes, it doesn't really seem right doing it within max_parallel_hazard(). > > > I tried doing it in extract_query_dependencies() instead - see > > > attached patch - and it seems to work, but I'm not sure if there might > > > be any unintended side-effects. > > > > One issue I see with the patch is that it fails to consider > > multi-level partitioning, because it's looking up partitions only in > > the target table's PartitionDesc and no other. 
> > > > @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node, > > PlannerInfo *context) > > RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc); > > > > if (rte->rtekind == RTE_RELATION) > > - context->glob->relationOids = > > - lappend_oid(context->glob->relationOids, rte->relid); > > + { > > + PlannerGlobal *glob; > > + > > + glob = context->glob; > > + glob->relationOids = > > + lappend_oid(glob->relationOids, rte->relid); > > + if (query->commandType == CMD_INSERT && > > + rte->relkind == > > RELKIND_PARTITIONED_TABLE) > > > > The RTE whose relkind is being checked here may not be the INSERT > > target relation's RTE, even though that's perhaps always true today. > > So, I suggest to pull the new block out of the loop over rtable and > > perform its deeds on the result RTE explicitly fetched using > > rt_fetch(), preferably using a separate recursive function. I'm > > thinking something like the attached revised version. > Thanks. Yes, I'd forgotten about the fact a partition may itself be > partitioned, so it needs to be recursive (like in the parallel-safety > checks). > Your revised version seems OK, though I do have a concern: > Is the use of "table_close(rel, NoLock)'' intentional? That will keep > the lock (lockmode) until end-of-transaction. I think we always keep any locks on relations that are involved in a plan until end-of-transaction. What if a partition is changed in an unsafe manner between being considered safe for parallel insertion and actually performing the parallel insert? BTW, I just noticed that extract_query_dependencies() runs on a rewritten, but not-yet-planned query tree, that is, I didn't know that extract_query_dependencies() only populates the CachedPlanSource's relationOids and not CachedPlan's. The former is only for tracking the dependencies of an unplanned Query, so partitions should never be added to it.
Instead, they should be added to PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan), which is kind of what your earlier patch did. Needless to say, PlanCacheRelCallback checks both CachedPlanSource.relationOids and PlannedStmt.relationOids, so if it receives a message about a partition, its OID is matched from the latter. All that is to say that we should move our code to add partition OIDs as plan dependencies to somewhere in set_plan_references(), which otherwise populates PlannedStmt.relationOids. I updated the patch to do that. It also occurred to me that we can avoid pointless adding of partitions if the final plan won't use parallelism. For that, the patch adds checking glob->parallelModeNeeded, which seems to do the trick though I don't know if that's the correct way of doing that. -- Amit Langote EDB: http://www.enterprisedb.com setrefs-v3.patch Description: Binary data
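For reference, the multi-level case that the recursive handling mentioned earlier has to cover looks like this (hypothetical names); the leaf partition appears only in the intermediate table's PartitionDesc, so a single-level walk over the target table's PartitionDesc would miss it:

```sql
CREATE TABLE target_tbl (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE target_part1 PARTITION OF target_tbl
    FOR VALUES FROM (1) TO (100) PARTITION BY RANGE (b);
CREATE TABLE target_part1_1 PARTITION OF target_part1
    FOR VALUES FROM (1) TO (100);

-- target_part1_1 must also become a plan dependency, so that e.g.
-- creating a parallel-unsafe trigger on it invalidates any cached
-- parallel INSERT plan on target_tbl.
```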