Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 22, 2021 at 3:57 PM Greg Nancarrow wrote:
> On Mon, Mar 22, 2021 at 6:28 PM houzj.f...@fujitsu.com wrote:
> > > Let me know if these changes seem OK to you.
> >
> > Yes, these changes look good to me.
>
> Posting an updated set of patches with these changes...

I have marked this as Returned with Feedback. There is a lot of work to do
for this patch as per the feedback given on pgsql-committers [1].

[1] - https://www.postgresql.org/message-id/E1lMiB9-0001c3-SY%40gemulon.postgresql.org

--
With Regards,
Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
> > I noticed that some comments may need to be updated since we introduced
> > parallel insert in this patch.
> >
> > 1) src/backend/executor/execMain.c
> >  * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
> >  * would require (a) storing the combocid hash in shared memory, rather
> >  * than synchronizing it just once at the start of parallelism, and (b) an
> >  * alternative to heap_update()'s reliance on xmax for mutual exclusion.
> >  * INSERT may have no such troubles, but we forbid it to simplify the
> >  * checks.
> >
> > As we will allow INSERT in parallel mode, we'd better change the comment
> > here.
>
> Thanks, it does need to be updated for parallel INSERT.
> I was thinking of the following change:
>
> - * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
> - * would require (a) storing the combocid hash in shared memory, rather
> - * than synchronizing it just once at the start of parallelism, and (b) an
> - * alternative to heap_update()'s reliance on xmax for mutual exclusion.
> - * INSERT may have no such troubles, but we forbid it to simplify the
> - * checks.
> + * Except for INSERT, don't allow writes in parallel mode.  Supporting
> + * UPDATE and DELETE would require (a) storing the combocid hash in shared
> + * memory, rather than synchronizing it just once at the start of
> + * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
> + * for mutual exclusion.
>
> > 2) src/backend/storage/lmgr/README
> > dangers are modest.  The leader and worker share the same transaction,
> > snapshot, and combo CID hash, and neither can perform any DDL or,
> > indeed, write any data at all.  Thus, for either to read a table
> > locked exclusively by
> >
> > The same as 1), parallel insert is the exception.
>
> I agree, it needs to be updated too, to account for parallel INSERT now
> being supported.
>
> -write any data at all. ...
> +write any data at all (with the exception of parallel insert). ...
>
> > 3) src/backend/storage/lmgr/README
> > mutual exclusion method for such cases.  Currently, the parallel mode
> > is strictly read-only, but now we have the infrastructure to allow
> > parallel inserts and parallel copy.
> >
> > Maybe we can say:
> > +mutual exclusion method for such cases.  Currently, we only allowed
> > +parallel inserts, but we already have the infrastructure to allow
> > +parallel copy.
>
> Yes, agree, something like:
>
> -mutual exclusion method for such cases.  Currently, the parallel mode is
> -strictly read-only, but now we have the infrastructure to allow parallel
> -inserts and parallel copy.
> +mutual exclusion method for such cases.  Currently, only parallel insert
> +is allowed, but we have the infrastructure to allow parallel copy.
>
> Let me know if these changes seem OK to you.

Yes, these changes look good to me.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 22, 2021 at 2:30 PM houzj.f...@fujitsu.com wrote:
>
> I noticed that some comments may need to be updated since we introduced
> parallel insert in this patch.
>
> 1) src/backend/executor/execMain.c
>  * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
>  * would require (a) storing the combocid hash in shared memory, rather
>  * than synchronizing it just once at the start of parallelism, and (b) an
>  * alternative to heap_update()'s reliance on xmax for mutual exclusion.
>  * INSERT may have no such troubles, but we forbid it to simplify the
>  * checks.
>
> As we will allow INSERT in parallel mode, we'd better change the comment here.
>

Thanks, it does need to be updated for parallel INSERT.
I was thinking of the following change:

- * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
- * would require (a) storing the combocid hash in shared memory, rather
- * than synchronizing it just once at the start of parallelism, and (b) an
- * alternative to heap_update()'s reliance on xmax for mutual exclusion.
- * INSERT may have no such troubles, but we forbid it to simplify the
- * checks.
+ * Except for INSERT, don't allow writes in parallel mode.  Supporting
+ * UPDATE and DELETE would require (a) storing the combocid hash in shared
+ * memory, rather than synchronizing it just once at the start of
+ * parallelism, and (b) an alternative to heap_update()'s reliance on xmax
+ * for mutual exclusion.

> 2) src/backend/storage/lmgr/README
> dangers are modest.  The leader and worker share the same transaction,
> snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
> write any data at all.  Thus, for either to read a table locked exclusively by
>
> The same as 1), parallel insert is the exception.
>

I agree, it needs to be updated too, to account for parallel INSERT now
being supported.

-write any data at all. ...
+write any data at all (with the exception of parallel insert). ...

> 3) src/backend/storage/lmgr/README
> mutual exclusion method for such cases.  Currently, the parallel mode is
> strictly read-only, but now we have the infrastructure to allow parallel
> inserts and parallel copy.
>
> Maybe we can say:
> +mutual exclusion method for such cases.  Currently, we only allowed parallel
> +inserts, but we already have the infrastructure to allow parallel copy.
>

Yes, agree, something like:

-mutual exclusion method for such cases.  Currently, the parallel mode is
-strictly read-only, but now we have the infrastructure to allow parallel
-inserts and parallel copy.
+mutual exclusion method for such cases.  Currently, only parallel insert is
+allowed, but we have the infrastructure to allow parallel copy.

Let me know if these changes seem OK to you.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Parallel INSERT (INTO ... SELECT ...)
> Since the "Parallel SELECT for INSERT" patch and related GUC/reloption
> patch have been committed, I have now rebased and attached the rest of
> the original patchset, which includes:
> - Additional tests for "Parallel SELECT for INSERT"
> - Parallel INSERT functionality
> - Test and documentation updates for Parallel INSERT

Hi,

I noticed that some comments may need to be updated since we introduced
parallel insert in this patch.

1) src/backend/executor/execMain.c

 * Don't allow writes in parallel mode.  Supporting UPDATE and DELETE
 * would require (a) storing the combocid hash in shared memory, rather
 * than synchronizing it just once at the start of parallelism, and (b) an
 * alternative to heap_update()'s reliance on xmax for mutual exclusion.
 * INSERT may have no such troubles, but we forbid it to simplify the
 * checks.

As we will allow INSERT in parallel mode, we'd better change the comment here.

2) src/backend/storage/lmgr/README

dangers are modest.  The leader and worker share the same transaction,
snapshot, and combo CID hash, and neither can perform any DDL or, indeed,
write any data at all.  Thus, for either to read a table locked exclusively by

The same as 1), parallel insert is the exception.

3) src/backend/storage/lmgr/README

mutual exclusion method for such cases.  Currently, the parallel mode is
strictly read-only, but now we have the infrastructure to allow parallel
inserts and parallel copy.

Maybe we can say:

+mutual exclusion method for such cases.  Currently, we only allowed parallel
+inserts, but we already have the infrastructure to allow parallel copy.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 18, 2021 at 9:04 AM houzj.f...@fujitsu.com wrote: > > > > If a table parameter value is set and the > > > equivalent toast. parameter is not, the TOAST > > > table > > > will use the table's parameter value. > > > -Specifying these parameters for partitioned tables is not supported, > > > -but you may specify them for individual leaf partitions. > > > +These parameters, with the exception of > > > +parallel_insert_enabled, > > > +are not supported on partitioned tables, but may be specified for > > > +individual leaf partitions. > > > > > > > > > > Your patch looks good to me. While checking this, I notice a typo in the > > previous patch: > > - planner parameter parallel_workers. > > + planner parameter parallel_workers and > > + parallel_insert_enabled. > > > > Here, it should be /planner parameter/planner parameters. > > Thanks amit and justin for pointing this out ! > The changes looks good to me. > Pushed! -- With Regards, Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
> > If a table parameter value is set and the
> > equivalent toast. parameter is not, the TOAST table
> > will use the table's parameter value.
> > -Specifying these parameters for partitioned tables is not supported,
> > -but you may specify them for individual leaf partitions.
> > +These parameters, with the exception of
> > +parallel_insert_enabled,
> > +are not supported on partitioned tables, but may be specified for
> > +individual leaf partitions.
>
> Your patch looks good to me. While checking this, I notice a typo in the
> previous patch:
> - planner parameter parallel_workers.
> + planner parameter parallel_workers and
> + parallel_insert_enabled.
>
> Here, it should be /planner parameter/planner parameters.

Thanks, Amit and Justin, for pointing this out!
The changes look good to me.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 18, 2021 at 8:30 AM Justin Pryzby wrote:
>
> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
> index ff1b642722..d5d356f2de 100644
> --- a/doc/src/sgml/ref/create_table.sgml
> +++ b/doc/src/sgml/ref/create_table.sgml
> @@ -1338,8 +1338,10 @@ WITH ( MODULUS numeric_literal, REM
>  If a table parameter value is set and the
>  equivalent toast. parameter is not, the TOAST table
>  will use the table's parameter value.
> -Specifying these parameters for partitioned tables is not supported,
> -but you may specify them for individual leaf partitions.
> +These parameters, with the exception of
> +parallel_insert_enabled,
> +are not supported on partitioned tables, but may be specified for
> +individual leaf partitions.

Your patch looks good to me. While checking this, I notice a typo in the
previous patch:

- planner parameter parallel_workers.
+ planner parameter parallel_workers and
+ parallel_insert_enabled.

Here, it should be /planner parameter/planner parameters.

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index ff1b642722..d5d356f2de 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1338,8 +1338,10 @@ WITH ( MODULUS numeric_literal, REM
 If a table parameter value is set and the
 equivalent toast. parameter is not, the TOAST table
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of
+parallel_insert_enabled,
+are not supported on partitioned tables, but may be specified for
+individual leaf partitions.

From 55c7326e56f9b49710a564e91c46212a17f12b24 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Wed, 17 Mar 2021 21:47:27 -0500
Subject: [PATCH] Fix for parallel_insert_enabled

---
 doc/src/sgml/ref/create_table.sgml | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index ff1b642722..d5d356f2de 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1338,8 +1338,10 @@ WITH ( MODULUS numeric_literal, REM
 If a table parameter value is set and the
 equivalent toast. parameter is not, the TOAST table
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of
+parallel_insert_enabled,
+are not supported on partitioned tables, but may be specified for
+individual leaf partitions.
--
2.17.0
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 04:05:09PM +0900, Amit Langote wrote:
> On Fri, Mar 12, 2021 at 6:10 AM Justin Pryzby wrote:
> > Note also this CF entry
> > https://commitfest.postgresql.org/32/2987/
> > | Allow setting parallel_workers on partitioned tables

I'm rebasing that other patch on top of master (with this patch), and I
noticed that it adds this bit, and this patch should have done something
similar:

--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1340,4 +1340,5 @@ WITH ( MODULUS numeric_literal, REM
 will use the table's parameter value.
-Specifying these parameters for partitioned tables is not supported,
-but you may specify them for individual leaf partitions.
+These parameters, with the exception of parallel_workers,
+are not supported on partitioned tables, but you may specify them for
+individual leaf partitions.

--
Justin
RE: Parallel INSERT (INTO ... SELECT ...)
> > Attaching new version patch with this change.
>
> Thanks, the patch looks good to me. I have made some minor cosmetic
> changes in the attached. I am planning to push this by tomorrow unless
> you or others have any more comments or suggestions for this patch.

Thanks for the review. I have no more comments.
BTW, I have done the final test on the patch and all of the tests passed.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 15, 2021 at 11:34 AM Justin Pryzby wrote:
>
> On Mon, Mar 15, 2021 at 11:25:26AM +0530, Amit Kapila wrote:
> > On Fri, Mar 12, 2021 at 3:01 PM houzj.f...@fujitsu.com wrote:
> > >
> > > Attaching new version patch with this change.
> >
> > Thanks, the patch looks good to me. I have made some minor cosmetic
> > changes in the attached. I am planning to push this by tomorrow unless
> > you or others have any more comments or suggestions for this patch.
>
> I still wonder if it should be possible for the GUC to be off, and then
> selectively enable parallel inserts on specific tables.
>

I think that could be inconvenient for users because in most tables such a
restriction won't need to be applied.

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 15, 2021 at 11:25:26AM +0530, Amit Kapila wrote:
> On Fri, Mar 12, 2021 at 3:01 PM houzj.f...@fujitsu.com wrote:
> >
> > Attaching new version patch with this change.
>
> Thanks, the patch looks good to me. I have made some minor cosmetic
> changes in the attached. I am planning to push this by tomorrow unless
> you or others have any more comments or suggestions for this patch.

I still wonder if it should be possible for the GUC to be off, and then
selectively enable parallel inserts on specific tables.  In that case, the
GUC should be called something other than enable_*, or maybe it should be
an enum with values like "off" and "allowed"/enabled/defer/???

--
Justin
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 3:01 PM houzj.f...@fujitsu.com wrote:
>
> Attaching new version patch with this change.
>

Thanks, the patch looks good to me. I have made some minor cosmetic changes
in the attached. I am planning to push this by tomorrow unless you or
others have any more comments or suggestions for this patch.

--
With Regards,
Amit Kapila.

v28-0001-Add-a-new-GUC-and-a-reloption-to-enable-inserts-.patch
Description: Binary data
Re: Parallel INSERT (INTO ... SELECT ...)
On Sat, Mar 13, 2021 at 10:08 AM Tom Lane wrote:
>
> Greg Nancarrow writes:
> > On Fri, Mar 12, 2021 at 5:00 AM Tom Lane wrote:
> >> BTW, having special logic for FK triggers in
> >> target_rel_trigger_max_parallel_hazard seems quite loony to me.
> >> Why isn't that handled by setting appropriate proparallel values
> >> for those trigger functions?
>
> > ... and also attached a patch to update the code for this issue.
>
> Hm, what I was expecting to see is that RI_FKey_check_ins and
> RI_FKey_check_upd get marked as proparallel = 'r' and the remainder
> get marked as proparallel = 'u'.
>

Oh, I think Greg's patch has just marked the functions for which the
current code has parallel-safety checks, and I also thought that would be
sufficient.

> Remember that the default for
> builtin functions is proparallel = 's', but surely that's wrong
> for triggers that can propagate updates to other tables?
>

Okay, probably the others can be marked as unsafe. I think we have not
considered any others except the FK-related triggers, which we knew are
hazardous for enabling inserts in parallel mode. The others seem to be
related to update/delete, so we have not done anything about them, but
maybe it is better to mark them as 'unsafe' now, and later, when we support
update/delete in parallel mode, we can see if any of those can be executed
in parallel mode. But OTOH, we can keep them as they are if they don't
impact the operations we currently support in parallel mode.

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
Greg Nancarrow writes:
> On Fri, Mar 12, 2021 at 5:00 AM Tom Lane wrote:
>> BTW, having special logic for FK triggers in
>> target_rel_trigger_max_parallel_hazard seems quite loony to me.
>> Why isn't that handled by setting appropriate proparallel values
>> for those trigger functions?

> ... and also attached a patch to update the code for this issue.

Hm, what I was expecting to see is that RI_FKey_check_ins and
RI_FKey_check_upd get marked as proparallel = 'r' and the remainder
get marked as proparallel = 'u'.  Remember that the default for
builtin functions is proparallel = 's', but surely that's wrong
for triggers that can propagate updates to other tables?

			regards, tom lane
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 9:33 AM houzj.f...@fujitsu.com wrote:
>
> > On Thu, Mar 11, 2021 at 01:01:42PM +0000, houzj.f...@fujitsu.com wrote:
> > > > I guess to have the finer granularity we'd have to go with
> > > > enable_parallel_insert, which then would mean possibly having to
> > > > later add enable_parallel_update, should parallel update have
> > > > similar potential overhead in the parallel-safety checks (which to
> > > > me, looks like it could, and parallel delete may not ...)
> > > >
> > > > It's a shame there is no "set" type for GUC options.
> > > > e.g.
> > > > enable_parallel_dml='insert,update'
> > > > Maybe that's going too far.
> >
> > Isn't that just GUC_LIST_INPUT ?
> > I'm not sure why it'd be going too far ?
> >
> > The GUC-setting assign hook can parse the enable_parallel_dml_list
> > value set by the user, and set an internal int/bits enable_parallel_dml
> > variable with some define/enum values like:
> >
> > GUC_PARALLEL_DML_INSERT 0x01
> > GUC_PARALLEL_DML_DELETE 0x02
> > GUC_PARALLEL_DML_UPDATE 0x04
>
> I think this idea works, but we still need to consider the reloption.
> After looking into the reloptions code, I think postgres does not have a
> list-like type for reloptions. And I think it's better that the guc and
> reloption are consistent.
>

I also think it is better to be consistent here.

> Besides, a list-type guc option that only supports one valid value
> 'insert' seems a little weird to me (we only support parallel insert for
> now).
>
> So, I tend to keep the current style of guc option.
>

+1. I feel at this stage it might not be prudent to predict the overhead
for parallel updates or deletes, especially when there doesn't appear to be
an easy way to provide a futuristic guc/reloption and we don't have any
patch on the table which can prove or disprove that theory. The only thing
we can say is that even if parallel updates/deletes have overhead, it might
not be due to similar reasons.

Also, I guess parallel copy might need somewhat similar parallel-safety
checking w.r.t. partitioned tables, and I feel the current proposed
guc/reloption can be used for that as well, as it is quite a similar
operation.

--
With Regards,
Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 1:33 PM houzj.f...@fujitsu.com wrote:
>
> > > The problem is that target_rel_trigger_max_parallel_hazard and its
> > > caller think they can use a relcache TriggerDesc field across other
> > > cache accesses, which they can't because the relcache doesn't
> > > guarantee that that won't move.
> > >
> > > One approach would be to add logic to RelationClearRelation similar to
> > > what it does for tupdescs, rules, etc, to avoid moving them when their
> > > contents haven't changed.  But given that we've not needed that for
> > > the past several decades, I'm disinclined to add the overhead.  I
> > > think this code ought to be adjusted to not make its own copy of the
> > > trigdesc pointer, but instead fetch it out of the relcache struct each
> > > time it is accessed.  There's no real reason why
> > > target_rel_trigger_max_parallel_hazard shouldn't be passed the
> > > (stable) Relation pointer instead of just the trigdesc pointer.
> >
> > I have attached a patch to fix the issue, based on your suggestion
> > (tested with CLOBBER_CACHE_ALWAYS defined).
> >
> > > BTW, having special logic for FK triggers in
> > > target_rel_trigger_max_parallel_hazard seems quite loony to me.
> > > Why isn't that handled by setting appropriate proparallel values for
> > > those trigger functions?
> >
> > ... and also attached a patch to update the code for this issue.
> >
> > (2nd patch relies on application of the 1st patch)
> >
> > Thanks again for pointing out these problems.
>
> I have tested the triggerdesc bugfix patch with the CLOBBER_CACHE_ALWAYS
> flag. It passed the tests that failed in the buildfarm (foreign_key,
> foreign_data).
>

Thanks for the patch and review. It looks good to me as well and passes the
tests (foreign_key, foreign_data) with the CLOBBER_CACHE_ALWAYS flag. I'll
review Greg's second patch.

--
With Regards,
Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
> On Fri, Mar 12, 2021 at 6:10 AM Justin Pryzby wrote:
> > Note also this CF entry
> > https://commitfest.postgresql.org/32/2987/
> > | Allow setting parallel_workers on partitioned tables
>
> +/*
> + * PartitionedOptions
> + *    Contents of rd_options for partitioned tables
> + */
> +typedef struct PartitionedOptions
> +{
> +    int32    vl_len_;    /* varlena header (do not touch directly!) */
> +    bool    parallel_insert_enabled;    /* enables planner's use of parallel insert */
> +} PartitionedOptions;
>
> houzj, could you please consider naming the struct
> PartitionedTableRdOptions as the patch for adding parallel_workers option
> to partitioned tables does?

Thanks for the reminder. I agree that "PartitionedTableRdOptions" is better.
Attaching a new version patch with this change.

Best regards,
houzj

v27-0003-Parallel-SELECT-for-INSERT-INTO-.-SELECT-advanced-tests.patch
Description: v27-0003-Parallel-SELECT-for-INSERT-INTO-.-SELECT-advanced-tests.patch

v27-0002-Add-new-GUC-option-enable_parallel_insert-boolean.patch
Description: v27-0002-Add-new-GUC-option-enable_parallel_insert-boolean.patch
RE: Parallel INSERT (INTO ... SELECT ...)
> > The problem is that target_rel_trigger_max_parallel_hazard and its
> > caller think they can use a relcache TriggerDesc field across other
> > cache accesses, which they can't because the relcache doesn't
> > guarantee that that won't move.
> >
> > One approach would be to add logic to RelationClearRelation similar to
> > what it does for tupdescs, rules, etc, to avoid moving them when their
> > contents haven't changed.  But given that we've not needed that for
> > the past several decades, I'm disinclined to add the overhead.  I
> > think this code ought to be adjusted to not make its own copy of the
> > trigdesc pointer, but instead fetch it out of the relcache struct each
> > time it is accessed.  There's no real reason why
> > target_rel_trigger_max_parallel_hazard shouldn't be passed the
> > (stable) Relation pointer instead of just the trigdesc pointer.
>
> I have attached a patch to fix the issue, based on your suggestion
> (tested with CLOBBER_CACHE_ALWAYS defined).
>
> > BTW, having special logic for FK triggers in
> > target_rel_trigger_max_parallel_hazard seems quite loony to me.
> > Why isn't that handled by setting appropriate proparallel values for
> > those trigger functions?
>
> ... and also attached a patch to update the code for this issue.
>
> (2nd patch relies on application of the 1st patch)
>
> Thanks again for pointing out these problems.

I have tested the triggerdesc bugfix patch with the CLOBBER_CACHE_ALWAYS
flag. It passed the tests that failed in the buildfarm (foreign_key,
foreign_data). And the patch LGTM.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 5:00 AM Tom Lane wrote:
>
> The problem is that target_rel_trigger_max_parallel_hazard and its caller
> think they can use a relcache TriggerDesc field across other cache
> accesses, which they can't because the relcache doesn't guarantee that
> that won't move.
>
> One approach would be to add logic to RelationClearRelation similar to
> what it does for tupdescs, rules, etc, to avoid moving them when their
> contents haven't changed.  But given that we've not needed that for the
> past several decades, I'm disinclined to add the overhead.  I think this
> code ought to be adjusted to not make its own copy of the trigdesc
> pointer, but instead fetch it out of the relcache struct each time it is
> accessed.  There's no real reason why
> target_rel_trigger_max_parallel_hazard shouldn't be passed the (stable)
> Relation pointer instead of just the trigdesc pointer.
>

I have attached a patch to fix the issue, based on your suggestion (tested
with CLOBBER_CACHE_ALWAYS defined).

> BTW, having special logic for FK triggers in
> target_rel_trigger_max_parallel_hazard seems quite loony to me.
> Why isn't that handled by setting appropriate proparallel values
> for those trigger functions?
>

... and also attached a patch to update the code for this issue.

(2nd patch relies on application of the 1st patch)

Thanks again for pointing out these problems.

Regards,
Greg Nancarrow
Fujitsu Australia

v1-0001-Fix-TriggerDesc-relcache-bug-introduced-by-recent-commit.patch
Description: Binary data

v1-0001-Better-implement-FK-trigger-parallel-safety-checking.patch
Description: Binary data
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 6:10 AM Justin Pryzby wrote:
> Note also this CF entry
> https://commitfest.postgresql.org/32/2987/
> | Allow setting parallel_workers on partitioned tables

+/*
+ * PartitionedOptions
+ *    Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedOptions
+{
+    int32    vl_len_;    /* varlena header (do not touch directly!) */
+    bool    parallel_insert_enabled;    /* enables planner's use of parallel insert */
+} PartitionedOptions;

houzj, could you please consider naming the struct PartitionedTableRdOptions,
as the patch for adding the parallel_workers option to partitioned tables
does?

--
Amit Langote
EDB: http://www.enterprisedb.com
RE: Parallel INSERT (INTO ... SELECT ...)
> On Thu, Mar 11, 2021 at 01:01:42PM +0000, houzj.f...@fujitsu.com wrote:
> > > I guess to have the finer granularity we'd have to go with
> > > enable_parallel_insert, which then would mean possibly having to
> > > later add enable_parallel_update, should parallel update have
> > > similar potential overhead in the parallel-safety checks (which to
> > > me, looks like it could, and parallel delete may not ...)
> > >
> > > It's a shame there is no "set" type for GUC options.
> > > e.g.
> > > enable_parallel_dml='insert,update'
> > > Maybe that's going too far.
>
> Isn't that just GUC_LIST_INPUT ?
> I'm not sure why it'd be going too far ?
>
> The GUC-setting assign hook can parse the enable_parallel_dml_list value
> set by the user, and set an internal int/bits enable_parallel_dml
> variable with some define/enum values like:
>
> GUC_PARALLEL_DML_INSERT 0x01
> GUC_PARALLEL_DML_DELETE 0x02
> GUC_PARALLEL_DML_UPDATE 0x04
>
> The namespace.c assign hook is a good prototype for this.  The parsed,
> integer GUC can probably be a static variable in clauses.c.
>
> Then, the planner can check if:
> | commandType == CMD_INSERT &&
> |     (enable_parallel_dml & GUC_PARALLEL_DML_INSERT) != 0
> [...]
>
> + this table. When enabled (and provided that
> + is also
> + true),

I think this idea works, but we still need to consider the reloption.
After looking into the reloptions code, I think postgres does not have a
list-like type for reloptions, and I think it's better that the guc and
reloption are consistent.

Besides, a list-type guc option that only supports one valid value
'insert' seems a little weird to me (we only support parallel insert for
now).

So, I tend to keep the current style of guc option. If it turns out that
we do need the same option to restrict update/delete, we can improve this
in the future.

What do you think?

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 12, 2021 at 5:00 AM Tom Lane wrote:
>
> The buildfarm has revealed that this patch doesn't work under
> CLOBBER_CACHE_ALWAYS:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=husky&dt=2021-03-10%2021%3A09%3A32
>
> I initially thought that that was a problem with c3ffe34863, but after
> reproducing it here I get this stack trace:
>
> #0  target_rel_trigger_max_parallel_hazard (context=0x7fff9cbfc220, trigdesc=0x7fdd7dfa9718) at clauses.c:971
> #1  target_rel_max_parallel_hazard_recurse (command_type=CMD_INSERT, context=0x7fff9cbfc220, rel=0x7fdd7df819e0) at clauses.c:929
> #2  target_rel_max_parallel_hazard_recurse (rel=0x7fdd7df819e0, command_type=, context=0x7fff9cbfc220) at clauses.c:893
> #3  0x0075d26e in target_rel_max_parallel_hazard (context=0x7fff9cbfc220) at clauses.c:884
> #4  max_parallel_hazard_walker (node=node@entry=0x146e590, context=context@entry=0x7fff9cbfc220) at clauses.c:831
> #5  0x00700812 in range_table_entry_walker (rte=0x146e590, walker=0x75cf00, context=0x7fff9cbfc220, flags=16) at nodeFuncs.c:2467
> #6  0x00700927 in range_table_walker (rtable=0x11fdd70, walker=walker@entry=0x75cf00, context=context@entry=0x7fff9cbfc220, flags=16) at nodeFuncs.c:2446
> #7  0x00700ada in query_tree_walker (flags=, context=0x7fff9cbfc220, walker=0x75cf00, query=0x11fdc58) at nodeFuncs.c:2423
> #8  query_tree_walker (query=query@entry=0x700927, walker=walker@entry=0x75cf00, context=context@entry=0x11fdc58, flags=) at nodeFuncs.c:2336
> #9  0x0075d138 in max_parallel_hazard_walker (node=node@entry=0x11fdc58, context=0x11fdc58, context@entry=0x7fff9cbfc220) at clauses.c:853
> #10 0x0075dc98 in max_parallel_hazard (parse=parse@entry=0x11fdc58, glob=glob@entry=0x11fdb40) at clauses.c:585
> #11 0x0074cd22 in standard_planner (parse=0x11fdc58, query_string=, cursorOptions=256, boundParams=) at planner.c:345
> #12 0x00814947 in pg_plan_query (querytree=0x11fdc58, query_string=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n  select 2048, x from generate_series(1,10) x;", cursorOptions=256, boundParams=0x0) at postgres.c:809
> #13 0x00814a43 in pg_plan_queries (querytrees=0x14725d0, query_string=query_string@entry=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n  select 2048, x from generate_series(1,10) x;", cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:900
> #14 0x00814d35 in exec_simple_query (query_string=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n  select 2048, x from generate_series(1,10) x;") at postgres.c:1092
>
> The problem is that target_rel_trigger_max_parallel_hazard and its caller
> think they can use a relcache TriggerDesc field across other cache
> accesses, which they can't because the relcache doesn't guarantee that
> that won't move.
>
> One approach would be to add logic to RelationClearRelation similar to
> what it does for tupdescs, rules, etc, to avoid moving them when their
> contents haven't changed.  But given that we've not needed that for the
> past several decades, I'm disinclined to add the overhead.  I think this
> code ought to be adjusted to not make its own copy of the trigdesc
> pointer, but instead fetch it out of the relcache struct each time it is
> accessed.  There's no real reason why
> target_rel_trigger_max_parallel_hazard shouldn't be passed the (stable)
> Relation pointer instead of just the trigdesc pointer.
>
> BTW, having special logic for FK triggers in
> target_rel_trigger_max_parallel_hazard seems quite loony to me.
> Why isn't that handled by setting appropriate proparallel values
> for those trigger functions?
>

Thanks Tom for your investigation, detailed analysis and suggested code
fixes. Will work on getting these issues corrected ASAP (and, er, removing
the looniness ...).

Regards,
Greg Nancarrow
Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 11, 2021 at 01:01:42PM +, houzj.f...@fujitsu.com wrote: > > I guess to have the finer granularity we'd have to go with > > enable_parallel_insert, > > which then would mean possibly having to later add enable_parallel_update, > > should parallel update have similar potential overhead in the > > parallel-safety > > checks (which to me, looks like it could, and parallel delete may not ...) > > > > It's a shame there is no "set" type for GUC options. > > e.g. > > enable_parallel_dml='insert,update' > > Maybe that's going too far. Isn't that just GUC_LIST_INPUT? I'm not sure why it'd be going too far? The GUC-setting assign hook can parse the enable_parallel_dml_list value set by the user, and set an internal int/bits enable_parallel_dml variable with some define/enum values like:

GUC_PARALLEL_DML_INSERT 0x01
GUC_PARALLEL_DML_DELETE 0x02
GUC_PARALLEL_DML_UPDATE 0x04

The namespace.c assign hook is a good prototype for this. The parsed, integer GUC can probably be a static variable in clauses.c. Then, the planner can check if:

|commandType == CMD_INSERT &&
|    (enable_parallel_dml & GUC_PARALLEL_DML_INSERT) != 0

[...] + this table. When enabled (and provided that + is also true), It seems like this usefully allows the GUC to be enabled, and the reloption to be disabled. But if the GUC is disabled, then it's impossible to enable for a single table. That seems unfortunate. I think part of the issue is the naming. If the GUC is called "enable_*", then setting it to "off" should disable it entirely, for consistency with other GUCs. So maybe it needs another name, like parallel_dml='insert'. I think maybe "all" should be an accepted value. Note also this CF entry https://commitfest.postgresql.org/32/2987/ | Allow setting parallel_workers on partitioned tables -- Justin
Re: Parallel INSERT (INTO ... SELECT ...)
The buildfarm has revealed that this patch doesn't work under CLOBBER_CACHE_ALWAYS: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=husky&dt=2021-03-10%2021%3A09%3A32 I initially thought that that was a problem with c3ffe34863, but after reproducing it here I get this stack trace:

#0  target_rel_trigger_max_parallel_hazard (context=0x7fff9cbfc220, trigdesc=0x7fdd7dfa9718) at clauses.c:971
#1  target_rel_max_parallel_hazard_recurse (command_type=CMD_INSERT, context=0x7fff9cbfc220, rel=0x7fdd7df819e0) at clauses.c:929
#2  target_rel_max_parallel_hazard_recurse (rel=0x7fdd7df819e0, command_type=<optimized out>, context=0x7fff9cbfc220) at clauses.c:893
#3  0x0075d26e in target_rel_max_parallel_hazard (context=0x7fff9cbfc220) at clauses.c:884
#4  max_parallel_hazard_walker (node=node@entry=0x146e590, context=context@entry=0x7fff9cbfc220) at clauses.c:831
#5  0x00700812 in range_table_entry_walker (rte=0x146e590, walker=0x75cf00 <max_parallel_hazard_walker>, context=0x7fff9cbfc220, flags=16) at nodeFuncs.c:2467
#6  0x00700927 in range_table_walker (rtable=0x11fdd70, walker=walker@entry=0x75cf00 <max_parallel_hazard_walker>, context=context@entry=0x7fff9cbfc220, flags=16) at nodeFuncs.c:2446
#7  0x00700ada in query_tree_walker (flags=<optimized out>, context=0x7fff9cbfc220, walker=0x75cf00 <max_parallel_hazard_walker>, query=0x11fdc58) at nodeFuncs.c:2423
#8  query_tree_walker (query=query@entry=0x700927 <range_table_walker>, walker=walker@entry=0x75cf00 <max_parallel_hazard_walker>, context=context@entry=0x11fdc58, flags=<optimized out>) at nodeFuncs.c:2336
#9  0x0075d138 in max_parallel_hazard_walker (node=node@entry=0x11fdc58, context=0x11fdc58, context@entry=0x7fff9cbfc220) at clauses.c:853
#10 0x0075dc98 in max_parallel_hazard (parse=parse@entry=0x11fdc58, glob=glob@entry=0x11fdb40) at clauses.c:585
#11 0x0074cd22 in standard_planner (parse=0x11fdc58, query_string=<optimized out>, cursorOptions=256, boundParams=<optimized out>) at planner.c:345
#12 0x00814947 in pg_plan_query (querytree=0x11fdc58, query_string=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n select 2048, x from generate_series(1,10) x;", cursorOptions=256, boundParams=0x0) at postgres.c:809
#13 0x00814a43 in pg_plan_queries (querytrees=0x14725d0, query_string=query_string@entry=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n select 2048, x from generate_series(1,10) x;", cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:900
#14 0x00814d35 in exec_simple_query (query_string=0x11fc740 "insert into fk_notpartitioned_pk (a, b)\n select 2048, x from generate_series(1,10) x;") at postgres.c:1092

The problem is that target_rel_trigger_max_parallel_hazard and its caller think they can use a relcache TriggerDesc field across other cache accesses, which they can't because the relcache doesn't guarantee that that won't move. One approach would be to add logic to RelationClearRelation similar to what it does for tupdescs, rules, etc, to avoid moving them when their contents haven't changed. But given that we've not needed that for the past several decades, I'm disinclined to add the overhead. I think this code ought to be adjusted to not make its own copy of the trigdesc pointer, but instead fetch it out of the relcache struct each time it is accessed. There's no real reason why target_rel_trigger_max_parallel_hazard shouldn't be passed the (stable) Relation pointer instead of just the trigdesc pointer. BTW, having special logic for FK triggers in target_rel_trigger_max_parallel_hazard seems quite loony to me. Why isn't that handled by setting appropriate proparallel values for those trigger functions? regards, tom lane
RE: Parallel INSERT (INTO ... SELECT ...)
> I guess to have the finer granularity we'd have to go with > enable_parallel_insert, > which then would mean possibly having to later add enable_parallel_update, > should parallel update have similar potential overhead in the parallel-safety > checks (which to me, looks like it could, and parallel delete may not ...) > > It's a shame there is no "set" type for GUC options. > e.g. > enable_parallel_dml='insert,update' > Maybe that's going too far. > > > 2. Should we keep the default value of GUC to on or off? It is > > currently off. I am fine keeping it off for this release and we can > > always turn it on in the later releases if required. Having said that, > > I see the value in keeping it on because in many cases Insert ... > > Select will be used for large data and there we will see a benefit of > > parallelism and users facing trouble (who have a very large number of > > partitions with less data to query) can still disable the parallel > > insert for that particular table. Also, the other benefit of keeping > > it on till at least the beta period is that this functionality will > > get tested and if we found reports of regression then we can turn it > > off for this release as well. > > > > I'd agree to keeping it on by default (and it can be selectively turned off > for a > particular table using the reloption, if needed). Thanks, attaching new version patch keeping the default value of guc option to ON. Best regards, houzj v26-0003-Parallel-SELECT-for-INSERT-INTO-.-SELECT-advanced-tests.patch Description: v26-0003-Parallel-SELECT-for-INSERT-INTO-.-SELECT-advanced-tests.patch v26-0002-Add-new-GUC-option-enable_parallel_insert-boolean.patch Description: v26-0002-Add-new-GUC-option-enable_parallel_insert-boolean.patch
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 10, 2021 at 8:18 PM Amit Kapila wrote: > > Now, coming back to Hou-San's patch to introduce a GUC and reloption > for this feature, I think both of those make sense to me because when > the feature is enabled via GUC, one might want to disable it for > partitioned tables? Do we agree on that part or someone thinks > otherwise? > Having both makes sense to me. > The other points to bikeshed could be: > 1. The name of GUC and reloption. The two proposals at hand are > enable_parallel_dml and enable_parallel_insert. I would prefer the > second (enable_parallel_insert) because updates/deletes might not have > a similar overhead. > I guess to have the finer granularity we'd have to go with enable_parallel_insert, which then would mean possibly having to later add enable_parallel_update, should parallel update have similar potential overhead in the parallel-safety checks (which to me, looks like it could, and parallel delete may not ...) It's a shame there is no "set" type for GUC options. e.g. enable_parallel_dml='insert,update' Maybe that's going too far. > 2. Should we keep the default value of GUC to on or off? It is > currently off. I am fine keeping it off for this release and we can > always turn it on in the later releases if required. Having said that, > I see the value in keeping it on because in many cases Insert ... > Select will be used for large data and there we will see a benefit of > parallelism and users facing trouble (who have a very large number of > partitions with less data to query) can still disable the parallel > insert for that particular table. Also, the other benefit of keeping > it on till at least the beta period is that this functionality will > get tested and if we found reports of regression then we can turn it > off for this release as well. > I'd agree to keeping it on by default (and it can be selectively turned off for a particular table using the reloption, if needed). Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 10, 2021 at 6:18 PM Amit Kapila wrote: > On Mon, Mar 8, 2021 at 7:19 PM Amit Langote wrote: > > I just read through v25 and didn't find anything to complain about. > > Thanks a lot, pushed now! Amit L., your inputs are valuable for this work. Glad I could be of help. Really appreciate that you credited me as author for whatever small part I contributed. > Now, coming back to Hou-San's patch to introduce a GUC and reloption > for this feature, I think both of those make sense to me because when > the feature is enabled via GUC, one might want to disable it for > partitioned tables? Do we agree on that part or someone thinks > otherwise? Agree to have both. > The other points to bikeshed could be: > 1. The name of GUC and reloption. The two proposals at hand are > enable_parallel_dml and enable_parallel_insert. I would prefer the > second (enable_parallel_insert) because updates/deletes might not have > a similar overhead. Sounds reasonable. > 2. Should we keep the default value of GUC to on or off? It is > currently off. I am fine keeping it off for this release and we can > always turn it on in the later releases if required. Having said that, > I see the value in keeping it on because in many cases Insert ... > Select will be used for large data and there we will see a benefit of > parallelism and users facing trouble (who have a very large number of > partitions with less data to query) can still disable the parallel > insert for that particular table. Also, the other benefit of keeping > it on till at least the beta period is that this functionality will > get tested and if we found reports of regression then we can turn it > off for this release as well. This makes sense too. -- Amit Langote EDB: http://www.enterprisedb.com
RE: Parallel INSERT (INTO ... SELECT ...)
From: Amit Kapila > Now, coming back to Hou-San's patch to introduce a GUC and reloption > for this feature, I think both of those make sense to me because when > the feature is enabled via GUC, one might want to disable it for > partitioned tables? Do we agree on that part or someone thinks > otherwise? > > The other points to bikeshed could be: > 1. The name of GUC and reloption. The two proposals at hand are > enable_parallel_dml and enable_parallel_insert. I would prefer the > second (enable_parallel_insert) because updates/deletes might not have > a similar overhead. > > 2. Should we keep the default value of GUC to on or off? It is > currently off. I am fine keeping it off for this release and we can > always turn it on in the later releases if required. Having said that, > I see the value in keeping it on because in many cases Insert ... > Select will be used for large data and there we will see a benefit of > parallelism and users facing trouble (who have a very large number of > partitions with less data to query) can still disable the parallel > insert for that particular table. Also, the other benefit of keeping > it on till at least the beta period is that this functionality will > get tested and if we found reports of regression then we can turn it > off for this release as well. > > Thoughts? TBH, I'm a bit unsure, but the names with _insert would be OK. The reason why Oracle has ENABLE PARALLEL DML clause and the parallel DML is disabled by default is probably due to the following critical restriction (and they have other less severe restrictions in parallel execution.) Our implementation does not have things like this. "Serial or parallel statements that attempt to access a table that has been modified in parallel within the same transaction are rejected with an error message." 
"A transaction can contain multiple parallel DML statements that modify different tables, but after a parallel DML statement modifies a table, no subsequent serial or parallel statement (DML or query) can access the same table again in that transaction." OTOH, I'm a bit concerned about whether we would have a common reason to disable parallel INSERT, UPDATE and DELETE. Oracle states space usage difference as follows. I wonder if something similar would apply to our parallel UPDATE/DELETE, particularly UPDATE. At least, we already know parallel INSERT results in larger tables and indexes compared to serial execution. "Parallel UPDATE uses the existing free space in the object, while direct-path INSERT gets new extents for the data." "Space usage characteristics may be different in parallel than serial execution because multiple concurrent child transactions modify the object." Even if that's the case, we can add _update parameters, although we feel sorry to cause users trouble to have to be aware of multiple parameters. Or, when it has turned out that we need _update and/or _delete parameters, we can opt to rename _insert to _dml, and keep the _insert parameter as an old, hidden synonym. Regards Takayuki Tsunakawa
RE: Parallel INSERT (INTO ... SELECT ...)
> > 2. Should we keep the default value of GUC to on or off? It is > > currently off. I am fine keeping it off for this release and we can > > always turn it on in the later releases if required. Having said that, > > I see the value in keeping it on because in many cases Insert ... > > Select will be used for large data and there we will see a benefit of > > parallelism and users facing trouble (who have a very large number of > > partitions with less data to query) can still disable the parallel > > insert for that particular table. Also, the other benefit of keeping > > it on till at least the beta period is that this functionality will > > get tested and if we found reports of regression then we can turn it > > off for this release as well. > > > > Thoughts? > > IMHO, we should keep it on because in most of the cases it is going to give > benefit to the user, and if for some specific scenario where a table has a > lot of > partitions then it can be turned off using the reloption. And, if for some users > most > of the tables are like that where they always have a lot of partitions then > the user > can turn it off using the GUC. > I think your suggestion makes sense. If no one has other opinions, I will soon post a new version with the default set to enable_parallel_insert=on. Best regards, houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 10, 2021 at 2:48 PM Amit Kapila wrote: > > On Mon, Mar 8, 2021 at 7:19 PM Amit Langote wrote: > > > > Hi Amit > > > > On Mon, Mar 8, 2021 at 10:18 PM Amit Kapila wrote: > > > On Mon, Mar 8, 2021 at 3:54 PM Greg Nancarrow wrote: > > > > I've attached an updated set of patches with the suggested locking > > > > changes. > > > > (Thanks Greg.) > > > > > Amit L, others, do let me know if you have still more comments on > > > 0001* patch or if you want to review it further? > > > > I just read through v25 and didn't find anything to complain about. > > > > Thanks a lot, pushed now! Amit L., your inputs are valuable for this work. > > Now, coming back to Hou-San's patch to introduce a GUC and reloption > for this feature, I think both of those make sense to me because when > the feature is enabled via GUC, one might want to disable it for > partitioned tables? Do we agree on that part or someone thinks > otherwise? I think it makes sense to have both. > > The other points to bikeshed could be: > 1. The name of GUC and reloption. The two proposals at hand are > enable_parallel_dml and enable_parallel_insert. I would prefer the > second (enable_parallel_insert) because updates/deletes might not have > a similar overhead. I agree enable_parallel_insert makes more sense. > 2. Should we keep the default value of GUC to on or off? It is > currently off. I am fine keeping it off for this release and we can > always turn it on in the later releases if required. Having said that, > I see the value in keeping it on because in many cases Insert ... > Select will be used for large data and there we will see a benefit of > parallelism and users facing trouble (who have a very large number of > partitions with less data to query) can still disable the parallel > insert for that particular table. 
Also, the other benefit of keeping > it on till at least the beta period is that this functionality will > get tested and if we found reports of regression then we can turn it > off for this release as well. > > Thoughts? IMHO, we should keep it on because in most of the cases it is going to give benefit to the user, and if for some specific scenario where a table has a lot of partitions then it can be turned off using the reloption. And, if for some users most of the tables are like that where they always have a lot of partitions then the user can turn it off using the GUC. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 8, 2021 at 7:19 PM Amit Langote wrote: > > Hi Amit > > On Mon, Mar 8, 2021 at 10:18 PM Amit Kapila wrote: > > On Mon, Mar 8, 2021 at 3:54 PM Greg Nancarrow wrote: > > > I've attached an updated set of patches with the suggested locking > > > changes. > > (Thanks Greg.) > > > Amit L, others, do let me know if you have still more comments on > > 0001* patch or if you want to review it further? > > I just read through v25 and didn't find anything to complain about. > Thanks a lot, pushed now! Amit L., your inputs are valuable for this work. Now, coming back to Hou-San's patch to introduce a GUC and reloption for this feature, I think both of those make sense to me because when the feature is enabled via GUC, one might want to disable it for partitioned tables? Do we agree on that part or someone thinks otherwise? The other points to bikeshed could be: 1. The name of GUC and reloption. The two proposals at hand are enable_parallel_dml and enable_parallel_insert. I would prefer the second (enable_parallel_insert) because updates/deletes might not have a similar overhead. 2. Should we keep the default value of GUC to on or off? It is currently off. I am fine keeping it off for this release and we can always turn it on in the later releases if required. Having said that, I see the value in keeping it on because in many cases Insert ... Select will be used for large data and there we will see a benefit of parallelism and users facing trouble (who have a very large number of partitions with less data to query) can still disable the parallel insert for that particular table. Also, the other benefit of keeping it on till at least the beta period is that this functionality will get tested and if we found reports of regression then we can turn it off for this release as well. Thoughts? -- With Regards, Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
> > > > I've attached an updated set of patches with the suggested locking changes. > > > > Amit L, others, do let me know if you have still more comments on > 0001* patch or if you want to review it further? I took a look into the latest 0001 patch, and it looks good to me. Best regards, houzj
Re: Parallel INSERT (INTO ... SELECT ...)
Hi Amit On Mon, Mar 8, 2021 at 10:18 PM Amit Kapila wrote: > On Mon, Mar 8, 2021 at 3:54 PM Greg Nancarrow wrote: > > I've attached an updated set of patches with the suggested locking changes. (Thanks Greg.) > Amit L, others, do let me know if you have still more comments on > 0001* patch or if you want to review it further? I just read through v25 and didn't find anything to complain about. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 8, 2021 at 3:54 PM Greg Nancarrow wrote: > > I've attached an updated set of patches with the suggested locking changes. > Amit L, others, do let me know if you have still more comments on 0001* patch or if you want to review it further? -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 8, 2021 at 6:25 PM Amit Langote wrote: > > A couple of things that look odd in v24-0001 (sorry if there were like > that from the beginning): > > +static bool > +target_rel_max_parallel_hazard(max_parallel_hazard_context *context) > +{ > + boolmax_hazard_found; > + > + RelationtargetRel; > + > + /* > +* The target table is already locked by the caller (this is done in the > +* parse/analyze phase), and remains locked until end-of-transaction. > +*/ > + targetRel = table_open(context->target_rte->relid, > + context->target_rte->rellockmode); > > The comment seems to imply that the lock need not be retaken here, but > the code does. Maybe it's fine to pass NoLock here, but use > rellockmode when locking partitions, because they would not have been > locked by the parser/analyzer. Actually, it was originally NoLock, but I think in your suggested changes (v15_delta.diff) to better integrate the extra parallel-safety checks into max_parallel_hazard(), you changed "NoLock" to "context->targetRTE->rellockmode".. Having said that, since the table is already locked, I think that using "context->target_rte->rellockmode" in this case just results in the lock reference count being incremented, so I doubt it really makes any difference, since it's locked until end-of-transaction. I'll revert it back to NoLock. >Which brings me to: > > +static bool > +target_rel_partitions_max_parallel_hazard(Relation rel, > + max_parallel_hazard_context > *context) > +{ > ... > + for (i = 0; i < pdesc->nparts; i++) > + { > + boolmax_hazard_found; > + Relationpart_rel; > + > + /* > +* The partition needs to be locked, and remain locked until > +* end-of-transaction to ensure its parallel-safety state is not > +* hereafter altered. > +*/ > + part_rel = table_open(pdesc->oids[i], AccessShareLock); > > Not that I can prove there to be any actual hazard, but we tend to > avoid taking locks with different strengths in the same query as would > occur with this piece of code. 
We're locking the partition with > AccessShareLock here, but the executor will lock the partition with > the stronger RowExclusiveLock mode before trying to insert into it. > Better to use the stronger mode to begin with or just use the target > partitioned table's RTE's rellockmode which would be RowExclusiveLock. > You can see that that's what AcquireExecutorLocksOnPartitions() will > do in the generic plan case. > OK, I see what you mean, best to use the target partitioned table's RTE's rellockmode in this case then. I'll update the patch accordingly. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 8, 2021 at 12:55 PM Amit Langote wrote: > > Hi Amit, Greg, > > Sorry, I hadn't noticed last week that some questions were addressed to me. > > On Sat, Mar 6, 2021 at 7:19 PM Amit Kapila wrote: > > Thanks, your changes look good to me. I went ahead and changed the > > patch to track the partitionOids once rather than in two separate > > lists and use that list in PlanCacheRelCallback to invalidate the > > plans. > > Not mixing the partition OIDs into relationOids seems fine to me. I > had considered that option too, but somehow forgot to mention it here. > Okay, thanks for the confirmation. > A couple of things that look odd in v24-0001 (sorry if there were like > that from the beginning): > > +static bool > +target_rel_max_parallel_hazard(max_parallel_hazard_context *context) > +{ > + boolmax_hazard_found; > + > + RelationtargetRel; > + > + /* > +* The target table is already locked by the caller (this is done in the > +* parse/analyze phase), and remains locked until end-of-transaction. > +*/ > + targetRel = table_open(context->target_rte->relid, > + context->target_rte->rellockmode); > > The comment seems to imply that the lock need not be retaken here, but > the code does. Maybe it's fine to pass NoLock here, but use > rellockmode when locking partitions, because they would not have been > locked by the parser/analyzer. Which brings me to: > > +static bool > +target_rel_partitions_max_parallel_hazard(Relation rel, > + max_parallel_hazard_context > *context) > +{ > ... > + for (i = 0; i < pdesc->nparts; i++) > + { > + boolmax_hazard_found; > + Relationpart_rel; > + > + /* > +* The partition needs to be locked, and remain locked until > +* end-of-transaction to ensure its parallel-safety state is not > +* hereafter altered. 
> +*/ > + part_rel = table_open(pdesc->oids[i], AccessShareLock); > > Not that I can prove there to be any actual hazard, but we tend to > avoid taking locks with different strengths in the same query as would > occur with this piece of code. We're locking the partition with > AccessShareLock here, but the executor will lock the partition with > the stronger RowExclusiveLock mode before trying to insert into it. > Better to use the stronger mode to begin with or just use the target > partitioned table's RTE's rellockmode which would be RowExclusiveLock. > You can see that that's what AcquireExecutorLocksOnPartitions() will > do in the generic plan case. > Both of your comments make sense to me. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
Hi Amit, Greg, Sorry, I hadn't noticed last week that some questions were addressed to me. On Sat, Mar 6, 2021 at 7:19 PM Amit Kapila wrote: > Thanks, your changes look good to me. I went ahead and changed the > patch to track the partitionOids once rather than in two separate > lists and use that list in PlanCacheRelCallback to invalidate the > plans. Not mixing the partition OIDs into relationOids seems fine to me. I had considered that option too, but somehow forgot to mention it here. A couple of things that look odd in v24-0001 (sorry if there were like that from the beginning): +static bool +target_rel_max_parallel_hazard(max_parallel_hazard_context *context) +{ + boolmax_hazard_found; + + RelationtargetRel; + + /* +* The target table is already locked by the caller (this is done in the +* parse/analyze phase), and remains locked until end-of-transaction. +*/ + targetRel = table_open(context->target_rte->relid, + context->target_rte->rellockmode); The comment seems to imply that the lock need not be retaken here, but the code does. Maybe it's fine to pass NoLock here, but use rellockmode when locking partitions, because they would not have been locked by the parser/analyzer. Which brings me to: +static bool +target_rel_partitions_max_parallel_hazard(Relation rel, + max_parallel_hazard_context *context) +{ ... + for (i = 0; i < pdesc->nparts; i++) + { + boolmax_hazard_found; + Relationpart_rel; + + /* +* The partition needs to be locked, and remain locked until +* end-of-transaction to ensure its parallel-safety state is not +* hereafter altered. +*/ + part_rel = table_open(pdesc->oids[i], AccessShareLock); Not that I can prove there to be any actual hazard, but we tend to avoid taking locks with different strengths in the same query as would occur with this piece of code. We're locking the partition with AccessShareLock here, but the executor will lock the partition with the stronger RowExclusiveLock mode before trying to insert into it. 
Better to use the stronger mode to begin with or just use the target partitioned table's RTE's rellockmode which would be RowExclusiveLock. You can see that that's what AcquireExecutorLocksOnPartitions() will do in the generic plan case. -- Amit Langote EDB: http://www.enterprisedb.com
RE: Parallel INSERT (INTO ... SELECT ...)
From: Amit Kapila > For now, I have left 0005 and 0006 patches, we can come back to those once we > are done with the first set of patches. The first patch looks good to me and I > think we can commit it and then bikeshed about GUC/reloption patch. Agreed, it looks good to me, too. Regards Takayuki Tsunakawa
Re: Parallel INSERT (INTO ... SELECT ...)
On Sat, Mar 6, 2021 at 9:19 PM Amit Kapila wrote: > > On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow wrote: > > > > For the time being at least, I am posting an updated set of patches, > > as I found that the additional parallel-safety checks on DOMAIN check > > constraints to be somewhat inefficient and could also be better > > integrated into max_parallel_hazard(). I also updated the basic tests > > with a test case for this. > > > > Thanks, your changes look good to me. I went ahead and changed the > patch to track the partitionOids once rather than in two separate > lists and use that list in PlanCacheRelCallback to invalidate the > plans. I made few other changes: > a. don't need to retain the lock on indexes as we can't change index > expressions > b. added/updated comments at few places in the code. > c. made a few other cosmetic changes > d. ran pgindent > e. slightly modify the commit message. > f. combine the code and test case patch > > For now, I have left 0005 and 0006 patches, we can come back to those > once we are done with the first set of patches. The first patch looks > good to me and I think we can commit it and then bikeshed about > GUC/reloption patch. > I've checked and tested the changes, and the resultant patches look OK to me, so I'm happy for you to commit the first patch. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
For cfffe83ba82021a1819a656e7ec5c28fb3a99152, if a bool was written (true | false), READ_INT_FIELD calls atoi() where atoi("true") returns 0 and atoi("false") returns 0 as well. I am not sure if the new release containing these commits had a higher cat version compared to the previous release. If the new releases did have a higher cat version, I guess there was no issue, by chance. Cheers On Sat, Mar 6, 2021 at 8:12 PM Amit Kapila wrote: > On Sun, Mar 7, 2021 at 8:24 AM Zhihong Yu wrote: > > > > I was looking at src/backend/nodes/readfuncs.c > > > > READ_NODE_FIELD(relationOids); > > + READ_NODE_FIELD(partitionOids); > > > > READ_NODE_FIELD would call nodeRead() for partitionOids. However, such > field may not exist. > > Since there is no 'if (strncmp(":partitionOids", token, length) == 0) {' > check, I was curious whether CATALOG_VERSION_NO should be bumped. > > > > Won't that be true only if we are reading the stored planned stmt from > disk? But is it possible in any way? The last two commits in this > function (cfffe83b and 45639a05) also didn't bump the catversion. > > -- > With Regards, > Amit Kapila. >
Re: Parallel INSERT (INTO ... SELECT ...)
On Sun, Mar 7, 2021 at 8:24 AM Zhihong Yu wrote: > > I was looking at src/backend/nodes/readfuncs.c > > READ_NODE_FIELD(relationOids); > + READ_NODE_FIELD(partitionOids); > > READ_NODE_FIELD would call nodeRead() for partitionOids. However, such field > may not exist. > Since there is no 'if (strncmp(":partitionOids", token, length) == 0) {' > check, I was curious whether CATALOG_VERSION_NO should be bumped. > Won't that be true only if we are reading the stored planned stmt from disk? But is it possible in any way? The last two commits in this function (cfffe83b and 45639a05) also didn't bump the catversion. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
I was looking at src/backend/nodes/readfuncs.c READ_NODE_FIELD(relationOids); + READ_NODE_FIELD(partitionOids); READ_NODE_FIELD would call nodeRead() for partitionOids. However, such field may not exist. Since there is no 'if (strncmp(":partitionOids", token, length) == 0) {' check, I was curious whether CATALOG_VERSION_NO should be bumped. Cheers On Sat, Mar 6, 2021 at 6:31 PM Amit Kapila wrote: > On Sat, Mar 6, 2021 at 9:13 PM Zhihong Yu wrote: > > > > Hi, > > Does CATALOG_VERSION_NO need to be bumped (introduction of partitionOids > field) ? > > > > Good question. I usually update CATALOG_VERSION_NO when the patch > changes any of the system catalogs. This is what is also mentioned in > catversion.h. See the following text in that file: "The catalog > version number is used to flag incompatible changes in the PostgreSQL > system catalogs. Whenever anyone changes the format of a system > catalog relation, or adds, deletes, or modifies standard > catalog entries in such a way that an updated backend wouldn't work > with an old database (or vice versa), the catalog version number > should be changed.". > > I might be missing something here but why you think that due to > partitionOids field (in plannedstmt or at another place) we need to > update CATALOG_VERSION_NO? > > Anyway, thanks for bringing this up. > > -- > With Regards, > Amit Kapila. >
Re: Parallel INSERT (INTO ... SELECT ...)
On Sat, Mar 6, 2021 at 9:13 PM Zhihong Yu wrote: > > Hi, > Does CATALOG_VERSION_NO need to be bumped (introduction of partitionOids > field) ? > Good question. I usually update CATALOG_VERSION_NO when the patch changes any of the system catalogs. This is what is also mentioned in catversion.h. See the following text in that file: "The catalog version number is used to flag incompatible changes in the PostgreSQL system catalogs. Whenever anyone changes the format of a system catalog relation, or adds, deletes, or modifies standard catalog entries in such a way that an updated backend wouldn't work with an old database (or vice versa), the catalog version number should be changed.". I might be missing something here but why you think that due to partitionOids field (in plannedstmt or at another place) we need to update CATALOG_VERSION_NO? Anyway, thanks for bringing this up. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
Hi, Does CATALOG_VERSION_NO need to be bumped (introduction of partitionOids field) ? Cheers On Sat, Mar 6, 2021 at 2:19 AM Amit Kapila wrote: > On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow wrote: > > > > For the time being at least, I am posting an updated set of patches, > > as I found that the additional parallel-safety checks on DOMAIN check > > constraints to be somewhat inefficient and could also be better > > integrated into max_parallel_hazard(). I also updated the basic tests > > with a test case for this. > > > > Thanks, your changes look good to me. I went ahead and changed the > patch to track the partitionOids once rather than in two separate > lists and use that list in PlanCacheRelCallback to invalidate the > plans. I made few other changes: > a. don't need to retain the lock on indexes as we can't change index > expressions > b. added/updated comments at few places in the code. > c. made a few other cosmetic changes > d. ran pgindent > e. slightly modify the commit message. > f. combine the code and test case patch > > For now, I have left 0005 and 0006 patches, we can come back to those > once we are done with the first set of patches. The first patch looks > good to me and I think we can commit it and then bikeshed about > GUC/reloption patch. > > > > -- > With Regards, > Amit Kapila. >
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow wrote: > > On Fri, Mar 5, 2021 at 9:35 PM Amit Kapila wrote: > > > > On Fri, Mar 5, 2021 at 8:24 AM Greg Nancarrow wrote: > > > > > > > In patch v21-0003-Add-new-parallel-dml-GUC-and-table-options, we are > > introducing GUC (enable_parallel_dml) and table option > > (parallel_dml_enabled) for this feature. I am a bit worried about > > using *_dml in the names because it is quite possible that for > > parallel updates and parallel deletes we might not need any such GUC. > > The reason we mainly need here is due to checking of parallel-safety > > of partitioned tables and updates/deletes handle partitioned tables > > differently than inserts so those might not be that costly. It is > > possible that they are costly due to a different reason but not sure > > mapping those to one GUC or table option is a good idea. Can we > > consider using *_insert instead? I think GUC having _insert can be > > probably used for a parallel copy (from) as well which I think will > > have a similar overhead. > > > > I'll need to think about that one. > Okay, one more minor comment on the same patch: + /* + * Check if parallel_dml_enabled is enabled for the target table, + * if not, skip the safety checks. + * + * (Note: if the target table is partitioned, the parallel_dml_enabled + * option setting of the partitions are ignored). + */ + rte = rt_fetch(parse->resultRelation, parse->rtable); + rel = table_open(rte->relid, NoLock); I think here the patch is using NoLock based on the assumption that the relation is already locked in parse/analyze phase, so it's better to have a comment like the one we have in target_rel_max_parallel_hazard. > I may be wrong, but I would have thought at least updates would have > similar parallel-safety checking requirements to inserts and would > have similar potential cost issues. 
> For updates and deletes, we go via the inheritance planner, where we already do some work related to partitions, so I think we might not need to open all the partitions for parallel-safety checks as we do for inserts. I am sure some part of the code for inserts, like what we have in max_parallel_hazard, will be used for updates as well, but there will probably be some changes for checking parallel-safety of partitioned relations. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 5, 2021 at 8:24 AM Greg Nancarrow wrote: > In patch v21-0003-Add-new-parallel-dml-GUC-and-table-options, we are introducing GUC (enable_parallel_dml) and table option (parallel_dml_enabled) for this feature. I am a bit worried about using *_dml in the names because it is quite possible that for parallel updates and parallel deletes we might not need any such GUC. The reason we mainly need here is due to checking of parallel-safety of partitioned tables and updates/deletes handle partitioned tables differently than inserts so those might not be that costly. It is possible that they are costly due to a different reason but not sure mapping those to one GUC or table option is a good idea. Can we consider using *_insert instead? I think GUC having _insert can be probably used for a parallel copy (from) as well which I think will have a similar overhead. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Mar 5, 2021 at 5:07 AM Greg Nancarrow wrote: > > On Thu, Mar 4, 2021 at 11:05 PM Amit Kapila wrote: > > > > On Thu, Mar 4, 2021 at 5:24 PM Greg Nancarrow wrote: > > > > > > On Thu, Mar 4, 2021 at 10:07 PM Amit Kapila > > > wrote: > > > > > > > > > > >Also, in > > > > standard_planner, we should add these partitionOids only for > > > > parallel-mode. > > > > > > > > > > It is doing that in v20 patch (what makes you think it isn't?). > > > > > > > The below code snippet: > > + /* For AcquireExecutorLocks(). */ > > + if (IsModifySupportedInParallelMode(parse->commandType)) > > + result->partitionOids = glob->partitionOids; > > > > I understand that you have a check for the parallel mode in > > AcquireExecutorLocks but why can't we have it before adding that to > > planned statement > > > > OK, I think I'm on the same wavelength now (sorry, I didn't realise > you're talking about PlannedStmt). > > What I believe you're suggesting is in the planner where > partitionOids are "added" to the returned PlannedStmt, they should > only be added if glob->parallelModeNeeded is true:. > > i.e. > > /* For AcquireExecutorLocks(). */ > if (glob->partitionOids != NIL && glob->parallelModeNeeded) > result->partitionOids = glob->partitionOids; > > (seems reasonable to me, as then it will match the condition for which > glob->partitionOids are added to glob->relationOids) > > then in plancache.c the check on parallelModeNeeded can be removed: > > /* Lock partitions ahead of modifying them in parallel mode. */ > if (rti == resultRelation && > plannedstmt->partitionOids != NIL) > AcquireExecutorLocksOnPartitions(plannedstmt->partitionOids, > rte->rellockmode, acquire); > > Let me know if this matches what you were thinking. > Yes, that is what I was thinking. But I have another question as well regarding tracking these partitions at two places (once in plannedstmt->relationOids and the second time in plannedstmt->partitionOids). 
I think it is better if you can prepare a patch addressing all the comments raised till now, leaving this last question aside. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 11:05 PM Amit Kapila wrote: > > On Thu, Mar 4, 2021 at 5:24 PM Greg Nancarrow wrote: > > > > On Thu, Mar 4, 2021 at 10:07 PM Amit Kapila wrote: > > > > > > > >Also, in > > > standard_planner, we should add these partitionOids only for > > > parallel-mode. > > > > > > > It is doing that in v20 patch (what makes you think it isn't?). > > > > The below code snippet: > + /* For AcquireExecutorLocks(). */ > + if (IsModifySupportedInParallelMode(parse->commandType)) > + result->partitionOids = glob->partitionOids; > > I understand that you have a check for the parallel mode in > AcquireExecutorLocks but why can't we have it before adding that to > planned statement > OK, I think I'm on the same wavelength now (sorry, I didn't realise you're talking about PlannedStmt). What I believe you're suggesting is that in the planner, where partitionOids are "added" to the returned PlannedStmt, they should only be added if glob->parallelModeNeeded is true, i.e. /* For AcquireExecutorLocks(). */ if (glob->partitionOids != NIL && glob->parallelModeNeeded) result->partitionOids = glob->partitionOids; (seems reasonable to me, as then it will match the condition for which glob->partitionOids are added to glob->relationOids) then in plancache.c the check on parallelModeNeeded can be removed: /* Lock partitions ahead of modifying them in parallel mode. */ if (rti == resultRelation && plannedstmt->partitionOids != NIL) AcquireExecutorLocksOnPartitions(plannedstmt->partitionOids, rte->rellockmode, acquire); Let me know if this matches what you were thinking. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 5:24 PM Greg Nancarrow wrote: > > On Thu, Mar 4, 2021 at 10:07 PM Amit Kapila wrote: > > > > >Also, in > > standard_planner, we should add these partitionOids only for > > parallel-mode. > > > > It is doing that in v20 patch (what makes you think it isn't?). > The below code snippet: + /* For AcquireExecutorLocks(). */ + if (IsModifySupportedInParallelMode(parse->commandType)) + result->partitionOids = glob->partitionOids; I understand that you have a check for the parallel mode in AcquireExecutorLocks but why can't we have it before adding that to planned statement -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 10:07 PM Amit Kapila wrote: > > On Fri, Feb 26, 2021 at 10:37 AM Amit Langote wrote: > > > > I realized that there is a race condition that will allow a concurrent > > backend to invalidate a parallel plan (for insert into a partitioned > > table) at a point in time when it's too late for plancache.c to detect > > it. It has to do with how plancache.c locks the relations affected by > > a cached query and its cached plan. Specifically, > > AcquireExecutorLocks(), which locks the relations that need to be > > locked before the plan could be considered safe to execute, does not > > notice the partitions that would have been checked for parallel safety > > when the plan was made. Given that AcquireExecutorLocks() only loops > > over PlannedStmt.rtable and locks exactly the relations found there, > > partitions are not locked. This means that a concurrent backend can, > > for example, add an unsafe trigger to a partition before a parallel > > worker inserts into it only to fail when it does. > > > > Steps to reproduce (tried with v19 set of patches): > > > > drop table if exists rp, foo; > > create table rp (a int) partition by range (a); > > create table rp1 partition of rp for values from (minvalue) to (0); > > create table rp2 partition of rp for values from (0) to (maxvalue); > > create table foo (a) as select generate_series(1, 100); > > set plan_cache_mode to force_generic_plan; > > set enable_parallel_dml to on; > > deallocate all; > > prepare q as insert into rp select * from foo where a%2 = 0; > > explain analyze execute q; > > > > At this point, attach a debugger to this backend and set a breakpoint > > in AcquireExecutorLocks() and execute q again: > > > > -- will execute the cached plan > > explain analyze execute q; > > > > Breakpoint will be hit. Continue till return from the function and > > stop. 
Start another backend and execute this: > > > > -- will go through, because no lock still taken on the partition > > create or replace function make_table () returns trigger language > > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > > unsafe; > > create trigger ai_rp2 after insert on rp2 for each row execute > > function make_table(); > > > > Back to the debugger, hit continue to let the plan execute. You > > should see this error: > > > > ERROR: cannot start commands during a parallel operation > > CONTEXT: SQL statement "create table bar()" > > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > > > The attached patch fixes this, > > > > One thing I am a bit worried about this fix is that after this for > parallel-mode, we will maintain partitionOids at two places, one in > plannedstmt->relationOids and the other in plannedstmt->partitionOids. > I guess you have to do that because, in AcquireExecutorLocks, we can't > find which relationOids are corresponding to partitionOids, am I > right? If so, why not just maintain them in plannedstmt->partitionOids > and then make PlanCacheRelCallback consider it? Maybe Amit Langote can kindly comment on this, as it would involve updates to his prior partition-related fixes. >Also, in > standard_planner, we should add these partitionOids only for > parallel-mode. > It is doing that in v20 patch (what makes you think it isn't?). Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 4:37 PM Amit Kapila wrote: > > On Fri, Feb 26, 2021 at 10:37 AM Amit Langote wrote: > > > > I realized that there is a race condition that will allow a concurrent > > backend to invalidate a parallel plan (for insert into a partitioned > > table) at a point in time when it's too late for plancache.c to detect > > it. It has to do with how plancache.c locks the relations affected by > > a cached query and its cached plan. Specifically, > > AcquireExecutorLocks(), which locks the relations that need to be > > locked before the plan could be considered safe to execute, does not > > notice the partitions that would have been checked for parallel safety > > when the plan was made. Given that AcquireExecutorLocks() only loops > > over PlannedStmt.rtable and locks exactly the relations found there, > > partitions are not locked. This means that a concurrent backend can, > > for example, add an unsafe trigger to a partition before a parallel > > worker inserts into it only to fail when it does. > > > > Steps to reproduce (tried with v19 set of patches): > > > > drop table if exists rp, foo; > > create table rp (a int) partition by range (a); > > create table rp1 partition of rp for values from (minvalue) to (0); > > create table rp2 partition of rp for values from (0) to (maxvalue); > > create table foo (a) as select generate_series(1, 100); > > set plan_cache_mode to force_generic_plan; > > set enable_parallel_dml to on; > > deallocate all; > > prepare q as insert into rp select * from foo where a%2 = 0; > > explain analyze execute q; > > > > At this point, attach a debugger to this backend and set a breakpoint > > in AcquireExecutorLocks() and execute q again: > > > > -- will execute the cached plan > > explain analyze execute q; > > > > Breakpoint will be hit. Continue till return from the function and > > stop. 
Start another backend and execute this: > > > > -- will go through, because no lock still taken on the partition > > create or replace function make_table () returns trigger language > > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > > unsafe; > > create trigger ai_rp2 after insert on rp2 for each row execute > > function make_table(); > > > > Back to the debugger, hit continue to let the plan execute. You > > should see this error: > > > > ERROR: cannot start commands during a parallel operation > > CONTEXT: SQL statement "create table bar()" > > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > > > The attached patch fixes this, > > > > One thing I am a bit worried about this fix is that after this for > parallel-mode, we will maintain partitionOids at two places, one in > plannedstmt->relationOids and the other in plannedstmt->partitionOids. > I guess you have to do that because, in AcquireExecutorLocks, we can't > find which relationOids are corresponding to partitionOids, am I > right? If so, why not just maintain them in plannedstmt->partitionOids > and then make PlanCacheRelCallback consider it? Also, in > standard_planner, we should add these partitionOids only for > parallel-mode. > One more point I was thinking about is whether we need to worry about locking indexes during prepared query execution (similar to what we do for AcquireExecutorLocks). I think we shouldn't be bothered to lock those or even retain lock during parallel-safety checks because one cannot change index expression or predicate. Is my understanding correct or am I missing something and we should be worried about them as well. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 26, 2021 at 10:37 AM Amit Langote wrote: > > I realized that there is a race condition that will allow a concurrent > backend to invalidate a parallel plan (for insert into a partitioned > table) at a point in time when it's too late for plancache.c to detect > it. It has to do with how plancache.c locks the relations affected by > a cached query and its cached plan. Specifically, > AcquireExecutorLocks(), which locks the relations that need to be > locked before the plan could be considered safe to execute, does not > notice the partitions that would have been checked for parallel safety > when the plan was made. Given that AcquireExecutorLocks() only loops > over PlannedStmt.rtable and locks exactly the relations found there, > partitions are not locked. This means that a concurrent backend can, > for example, add an unsafe trigger to a partition before a parallel > worker inserts into it only to fail when it does. > > Steps to reproduce (tried with v19 set of patches): > > drop table if exists rp, foo; > create table rp (a int) partition by range (a); > create table rp1 partition of rp for values from (minvalue) to (0); > create table rp2 partition of rp for values from (0) to (maxvalue); > create table foo (a) as select generate_series(1, 100); > set plan_cache_mode to force_generic_plan; > set enable_parallel_dml to on; > deallocate all; > prepare q as insert into rp select * from foo where a%2 = 0; > explain analyze execute q; > > At this point, attach a debugger to this backend and set a breakpoint > in AcquireExecutorLocks() and execute q again: > > -- will execute the cached plan > explain analyze execute q; > > Breakpoint will be hit. Continue till return from the function and > stop. 
Start another backend and execute this: > > -- will go through, because no lock still taken on the partition > create or replace function make_table () returns trigger language > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > unsafe; > create trigger ai_rp2 after insert on rp2 for each row execute > function make_table(); > > Back to the debugger, hit continue to let the plan execute. You > should see this error: > > ERROR: cannot start commands during a parallel operation > CONTEXT: SQL statement "create table bar()" > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > The attached patch fixes this, > One thing I am a bit worried about this fix is that after this for parallel-mode, we will maintain partitionOids at two places, one in plannedstmt->relationOids and the other in plannedstmt->partitionOids. I guess you have to do that because, in AcquireExecutorLocks, we can't find which relationOids are corresponding to partitionOids, am I right? If so, why not just maintain them in plannedstmt->partitionOids and then make PlanCacheRelCallback consider it? Also, in standard_planner, we should add these partitionOids only for parallel-mode. Thoughts? -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 9:03 AM Amit Kapila wrote: > > I think for Update/Delete, we might not do parallel-safety checks by > calling target_rel_max_parallel_hazard_recurse especially because > partitions are handled differently for Updates and Deletes (see > inheritance_planner()). I think what Dilip is telling doesn't sound > unreasonable to me. So, even, if we want to extend it later by making > some checks specific to Inserts/Updates, we can do it at that time. > The comments you have at that place are sufficient to tell that in the > future we can use those checks for Updates as well. They will need > some adjustment if we remove that check but the intent is clear. +1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 7:16 AM Greg Nancarrow wrote: > > On Thu, Mar 4, 2021 at 1:07 AM Dilip Kumar wrote: > > > > On Wed, Mar 3, 2021 at 5:50 PM Greg Nancarrow wrote: > > > > > I agree that assert is only for debug build, but once we add and > > assert that means we are sure that it should only be called for insert > > and if it is called for anything else then it is a programming error > > from the caller's side. So after the assert, adding if check for the > > same condition doesn't look like a good idea. That means we think > > that the code can hit assert in the debug mode so we need an extra > > protection in the release mode. > > > > The if-check isn't there for "extra protection". > It's to help with future changes; inside that if-block is code only > applicable to INSERT (and to UPDATE - sorry, before I said DELETE), as > the code-comment indicates, whereas the rest of the function is > generic to all command types. I don't see any problem with having this > if-block here, to help in this way, when other command types are > added. If that is the case, then this check should also be added along with those future patches; I mean, when we allow UPDATE, that check will make sense, and at that time we will have to get rid of the assert as well, so the complete function stays in sync. But right now this check looks a bit odd. That is my opinion, but otherwise I don't have any strong objection to that check. > > > > > > > 2. > > But the cost_modifytable is setting the number of rows to 0 in > > ModifyTablePath whereas the cost_gather will multiply the rows from > > the GatherPath. I can not see the rows from GatherPath is ever set to > > 0. > > > > OK, I see the problem now. > It works the way I described, but currently there's a problem with > where it's getting the rows for the GatherPath, so there's a > disconnect.
> When generating the GatherPaths, it's currently always taking the > rel's (possibly estimated) row-count, rather than using the rows from > the cheapest_partial_path (the subpath: ModifyTablePath). See > generate_gather_paths(). > So when generate_useful_gather_paths() is called from the planner, for > the added partial paths for Parallel INSERT, it should be passing > "true" for the "override_rows" parameter, not "false". > > So I think that in the 0004 patch, the if-block: > > + if (parallel_modify_partial_path_added) > + { > + final_rel->rows = current_rel->rows;/* ??? why > hasn't this been > + > * set above somewhere */ > + generate_useful_gather_paths(root, final_rel, false); > + } > + > > can be reduced to: > > + if (parallel_modify_partial_path_added) > + generate_useful_gather_paths(root, final_rel, true); > + Okay. I will check this part after you provide an updated version. Thanks. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 7:16 AM Greg Nancarrow wrote: > > On Thu, Mar 4, 2021 at 1:07 AM Dilip Kumar wrote: > > > > On Wed, Mar 3, 2021 at 5:50 PM Greg Nancarrow wrote: > > > > > > Asserts are normally only enabled in a debug-build, so for a > > > release-build that Assert has no effect. > > > The Assert is being used as a sanity-check that the function is only > > > currently getting called for INSERT, because that's all it currently > > > supports. > > > > I agree that assert is only for debug build, but once we add and > > assert that means we are sure that it should only be called for insert > > and if it is called for anything else then it is a programming error > > from the caller's side. So after the assert, adding if check for the > > same condition doesn't look like a good idea. That means we think > > that the code can hit assert in the debug mode so we need an extra > > protection in the release mode. > > > > The if-check isn't there for "extra protection". > It's to help with future changes; inside that if-block is code only > applicable to INSERT (and to UPDATE - sorry, before I said DELETE), as > the code-comment indicates, whereas the rest of the function is > generic to all command types. I don't see any problem with having this > if-block here, to help in this way, when other command types are > added. > I think for Update/Delete, we might not do parallel-safety checks by calling target_rel_max_parallel_hazard_recurse especially because partitions are handled differently for Updates and Deletes (see inheritance_planner()). I think what Dilip is telling doesn't sound unreasonable to me. So, even, if we want to extend it later by making some checks specific to Inserts/Updates, we can do it at that time. The comments you have at that place are sufficient to tell that in the future we can use those checks for Updates as well. They will need some adjustment if we remove that check but the intent is clear. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Mar 4, 2021 at 1:07 AM Dilip Kumar wrote: > > On Wed, Mar 3, 2021 at 5:50 PM Greg Nancarrow wrote: > > > > Asserts are normally only enabled in a debug-build, so for a > > release-build that Assert has no effect. > > The Assert is being used as a sanity-check that the function is only > > currently getting called for INSERT, because that's all it currently > > supports. > > I agree that assert is only for debug build, but once we add and > assert that means we are sure that it should only be called for insert > and if it is called for anything else then it is a programming error > from the caller's side. So after the assert, adding if check for the > same condition doesn't look like a good idea. That means we think > that the code can hit assert in the debug mode so we need an extra > protection in the release mode. > The if-check isn't there for "extra protection". It's to help with future changes; inside that if-block is code only applicable to INSERT (and to UPDATE - sorry, before I said DELETE), as the code-comment indicates, whereas the rest of the function is generic to all command types. I don't see any problem with having this if-block here, to help in this way, when other command types are added. > > > > > > 2. > > > In patch 0004, We are still charging the parallel_tuple_cost for each > > > tuple, are we planning to do something about this? I mean after this > > > patch tuple will not be transferred through the tuple queue, so we > > > should not add that cost. > > > > > > > I believe that for Parallel INSERT, cost_modifytable() will set > > path->path.rows to 0 (unless there is a RETURNING list), so, for > > example, in cost_gather(), it will not add to the run_cost as > > "run_cost += parallel_tuple_cost * path->path.rows;" > > > > But the cost_modifytable is setting the number of rows to 0 in > ModifyTablePath whereas the cost_gather will multiply the rows from > the GatherPath. I can not see the rows from GatherPath is ever set to > 0. 
> OK, I see the problem now. It works the way I described, but currently there's a problem with where it's getting the rows for the GatherPath, so there's a disconnect. When generating the GatherPaths, it's currently always taking the rel's (possibly estimated) row-count, rather than using the rows from the cheapest_partial_path (the subpath: ModifyTablePath). See generate_gather_paths(). So when generate_useful_gather_paths() is called from the planner, for the added partial paths for Parallel INSERT, it should be passing "true" for the "override_rows" parameter, not "false". So I think that in the 0004 patch, the if-block: + if (parallel_modify_partial_path_added) + { + final_rel->rows = current_rel->rows;/* ??? why hasn't this been + * set above somewhere */ + generate_useful_gather_paths(root, final_rel, false); + } + can be reduced to: + if (parallel_modify_partial_path_added) + generate_useful_gather_paths(root, final_rel, true); + Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 3, 2021 at 5:50 PM Greg Nancarrow wrote: > > Asserts are normally only enabled in a debug-build, so for a > release-build that Assert has no effect. > The Assert is being used as a sanity-check that the function is only > currently getting called for INSERT, because that's all it currently > supports. I agree that the assert is only for debug builds, but once we add an assert, that means we are sure it should only be called for INSERT, and if it is called for anything else, then it is a programming error on the caller's side. So after the assert, adding an if check for the same condition doesn't look like a good idea; that would mean we think the code can hit the assert in debug mode, so we need extra protection in release mode. > > > 2. > > In patch 0004, We are still charging the parallel_tuple_cost for each > > tuple, are we planning to do something about this? I mean after this > > patch tuple will not be transferred through the tuple queue, so we > > should not add that cost. > > > > I believe that for Parallel INSERT, cost_modifytable() will set > path->path.rows to 0 (unless there is a RETURNING list), so, for > example, in cost_gather(), it will not add to the run_cost as > "run_cost += parallel_tuple_cost * path->path.rows;" > But cost_modifytable is setting the number of rows to 0 in ModifyTablePath, whereas cost_gather will multiply in the rows from the GatherPath; I cannot see that the rows from GatherPath are ever set to 0. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 3, 2021 at 10:45 PM Dilip Kumar wrote: > > On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > > > Posting an updated set of patches that includes Amit Langote's patch > > to the partition tracking scheme... > > (the alternative of adding partitions to the range table needs further > > investigation) > > I was reviewing your latest patch and I have a few comments. > > In patch 0001 > 1. > +static bool > +target_rel_max_parallel_hazard_recurse(Relation rel, > + CmdType command_type, > + max_parallel_hazard_context *context) > +{ > + TupleDesc tupdesc; > + int attnum; > + > + /* Currently only CMD_INSERT is supported */ > + Assert(command_type == CMD_INSERT); > ……. > + /* > + * Column default expressions and check constraints are only applicable to > + * INSERT and UPDATE, but since only parallel INSERT is currently supported, > + * only command_type==CMD_INSERT is checked here. > + */ > + if (command_type == CMD_INSERT) > > If we have an assert at the beginning of the function, then why do we > want to put the if check here? > Asserts are normally only enabled in a debug-build, so for a release-build that Assert has no effect. The Assert is being used as a sanity-check that the function is only currently getting called for INSERT, because that's all it currently supports. Further code below specifically checks for INSERT because the block contains code that is specific to INSERT (and actually DELETE too, but that is not currently supported - code is commented accordingly). In future, this function will support DELETE and UPDATE, and the Assert serves to alert the developer ASAP that it needs updating to support those. > 2. > In patch 0004, We are still charging the parallel_tuple_cost for each > tuple, are we planning to do something about this? I mean after this > patch tuple will not be transferred through the tuple queue, so we > should not add that cost. 
> I believe that for Parallel INSERT, cost_modifytable() will set path->path.rows to 0 (unless there is a RETURNING list), so, for example, in cost_gather(), it will not add to the run_cost as "run_cost += parallel_tuple_cost * path->path.rows;" Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > Posting an updated set of patches that includes Amit Langote's patch > to the partition tracking scheme... > (the alternative of adding partitions to the range table needs further > investigation) I was reviewing your latest patch and I have a few comments. In patch 0001 1. +static bool +target_rel_max_parallel_hazard_recurse(Relation rel, + CmdType command_type, + max_parallel_hazard_context *context) +{ + TupleDesc tupdesc; + int attnum; + + /* Currently only CMD_INSERT is supported */ + Assert(command_type == CMD_INSERT); ……. + /* + * Column default expressions and check constraints are only applicable to + * INSERT and UPDATE, but since only parallel INSERT is currently supported, + * only command_type==CMD_INSERT is checked here. + */ + if (command_type == CMD_INSERT) If we have an assert at the beginning of the function, then why do we want to put the if check here? 2. In patch 0004, We are still charging the parallel_tuple_cost for each tuple, are we planning to do something about this? I mean after this patch tuple will not be transferred through the tuple queue, so we should not add that cost. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Mar 3, 2021 at 12:52 PM Greg Nancarrow wrote: > > On Tue, Mar 2, 2021 at 11:19 PM Amit Kapila wrote: > > > > On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > > 2. > > /* > > + * Prepare for entering parallel mode by assigning a > > + * FullTransactionId, to be included in the transaction state that is > > + * serialized in the parallel DSM. > > + */ > > + (void) GetCurrentTransactionId(); > > + } > > > > Similar to the previous comment this also seems to indicate that we > > require TransactionId for workers. If that is not the case then this > > comment also needs to be modified. > > > > I'll update to comment to remove the part about the serialization (as > this always happens, not a function of the patch) and mention it is > needed to avoid attempting to assign a FullTransactionId in > parallel-mode. > Okay, but just use TransactionId in comments if there is a need, we still don't use FullTransactionId for the heap. > > 4. > > * Assess whether it's feasible to use parallel mode for this query. We > > * can't do this in a standalone backend, or if the command will try to > > - * modify any data, or if this is a cursor operation, or if GUCs are set > > - * to values that don't permit parallelism, or if parallel-unsafe > > - * functions are present in the query tree. > > + * modify any data using a CTE, or if this is a cursor operation, or if > > + * GUCs are set to values that don't permit parallelism, or if > > + * parallel-unsafe functions are present in the query tree. > > > > This comment change is not required because this is quite similar to > > what we do for CTAS. Your further comment changes in this context are > > sufficient. > > An INSERT modifies data, so according to the original comment, then > it's not feasible to use parallel mode, because the command tries to > modify data ("We can't do this in a standalone backend, or if the > command will try to modify any data ..."). 
> Except now we need to use parallel-mode for "INSERT with parallel > SELECT", and INSERT is a command that modifies data. > So isn't the comment change actually needed? > If we want to change, we can say "if the command will try to modify any data except for inserts ..." or something like that, but saying only about CTEs is not correct, because then what about updates and deletes? > > > > > 5. > > + (IsModifySupportedInParallelMode(parse->commandType) && > > + is_parallel_possible_for_modify(parse))) && > > > > I think it would be better if we move the check > > IsModifySupportedInParallelMode inside > > is_parallel_possible_for_modify. > > The reason it is done outside of is_parallel_possible_for_modify() is > to avoid the function overhead of calling > is_parallel_possible_for_modify() for SELECT statements, only to > return without doing anything. Note also that > IsModifySupportedInParallelMode() is an inline function. > I don't see any reason to be worried about that here. It is more important for the code and checks to look simpler. > >Also, it might be better to name this > > function as is_parallel_allowed_for_modify. > > I do tend to think that in this case "possible" is better than "allowed". > Only the "parallel_dml" GUC test is checking for something that is "allowed". > The other two checks are checking for things that determine whether > parallel-mode is even "possible". > I think I don't like this proposed name for the function. See if you can think of something better. > > > > 6. > > @@ -260,6 +265,21 @@ set_plan_references(PlannerInfo *root, Plan *plan) > > */ > > add_rtes_to_flat_rtable(root, false); > > > > + /* > > + * If modifying a partitioned table, add its parallel-safety-checked > > + * partitions too to glob->relationOids, to register them as plan > > + * dependencies.
This is only really needed in the case of a parallel > > + * plan, so that if parallel-unsafe properties are subsequently defined > > + * on the partitions, the cached parallel plan will be invalidated and > > + * a non-parallel plan will be generated. > > + */ > > + if (IsModifySupportedInParallelMode(root->parse->commandType)) > > + { > > + if (glob->partitionOids != NIL && glob->parallelModeNeeded) > > + glob->relationOids = > > + list_concat(glob->relationOids, glob->partitionOids); > > + } > > + > > > > Isn't it possible to add these partitionOids in set_plan_refs with the > > T_Gather(Merge) node h
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Mar 2, 2021 at 11:19 PM Amit Kapila wrote: > > On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > > > Posting an updated set of patches that includes Amit Langote's patch > > to the partition tracking scheme... > > > > Few comments: > == > 1. > "Prior to entering parallel-mode for execution of INSERT with parallel SELECT, > a TransactionId is acquired and assigned to the current transaction state > which > is then serialized in the parallel DSM for the parallel workers to use." > > This point in the commit message seems a bit misleading to me because > IIUC we need to use transaction id only in the master backend for the > purpose of this patch. And, we are doing this at an early stage > because we don't allow to allocate it in parallel-mode. I think here > we should mention in some way that this has a disadvantage that if the > underneath select doesn't return any row then this transaction id > allocation would not be of use. However, that shouldn't happen in > practice in many cases. But, if I am missing something and this is > required by parallel workers then we can ignore what I said. > I'll remove the part that says "which is then serialized ...", as this may imply that the patch is doing this (which of course it is not, Postgres always serializes the transaction state in the parallel DSM for the parallel workers to use). The acquiring of the TransactionId up-front, prior to entering parallel-mode, is absolutely required because heap_insert() lazily tries to get the current transaction-id and attempts to assign one if the current transaction state hasn't got one, and this assignment is not allowed in parallel-mode ("ERROR: cannot assign XIDs during a parallel operation"), and this mode is required for use of parallel SELECT. I'll add the part about the disadvantage of the transaction-id not being used if the underlying SELECT doesn't return any rows. > 2. 
> /* > + * Prepare for entering parallel mode by assigning a > + * FullTransactionId, to be included in the transaction state that is > + * serialized in the parallel DSM. > + */ > + (void) GetCurrentTransactionId(); > + } > > Similar to the previous comment this also seems to indicate that we > require TransactionId for workers. If that is not the case then this > comment also needs to be modified. > I'll update the comment to remove the part about the serialization (as this always happens, and is not a function of the patch) and mention that it is needed to avoid attempting to assign a FullTransactionId in parallel-mode. > 3. > static int common_prefix_cmp(const void *a, const void *b); > > - > > /* > > Spurious line removal. > I will reinstate that empty line. > 4. > * Assess whether it's feasible to use parallel mode for this query. We > * can't do this in a standalone backend, or if the command will try to > - * modify any data, or if this is a cursor operation, or if GUCs are set > - * to values that don't permit parallelism, or if parallel-unsafe > - * functions are present in the query tree. > + * modify any data using a CTE, or if this is a cursor operation, or if > + * GUCs are set to values that don't permit parallelism, or if > + * parallel-unsafe functions are present in the query tree. > > This comment change is not required because this is quite similar to > what we do for CTAS. Your further comment changes in this context are > sufficient. An INSERT modifies data, so according to the original comment it's not feasible to use parallel mode, because the command tries to modify data ("We can't do this in a standalone backend, or if the command will try to modify any data ..."). Except now we need to use parallel-mode for "INSERT with parallel SELECT", and INSERT is a command that modifies data. So isn't the comment change actually needed? > > 5. 
> + (IsModifySupportedInParallelMode(parse->commandType) && > + is_parallel_possible_for_modify(parse))) && > > I think it would be better if we move the check > IsModifySupportedInParallelMode inside > is_parallel_possible_for_modify. The reason it is done outside of is_parallel_possible_for_modify() is to avoid the function overhead of calling is_parallel_possible_for_modify() for SELECT statements, only to return without doing anything. Note also that IsModifySupportedInParallelMode() is an inline function. >Also, it might be better to name this > function as is_parallel_allowed_for_modify. I do tend to think that in this case "possible" is better than "allowed". Only the "parallel_dml" GUC test is checking for something that is "allowed". The other t
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 1, 2021 at 9:08 AM Greg Nancarrow wrote: > > Posting an updated set of patches that includes Amit Langote's patch > to the partition tracking scheme... > Few comments: == 1. "Prior to entering parallel-mode for execution of INSERT with parallel SELECT, a TransactionId is acquired and assigned to the current transaction state which is then serialized in the parallel DSM for the parallel workers to use." This point in the commit message seems a bit misleading to me because IIUC we need to use transaction id only in the master backend for the purpose of this patch. And, we are doing this at an early stage because we don't allow to allocate it in parallel-mode. I think here we should mention in some way that this has a disadvantage that if the underneath select doesn't return any row then this transaction id allocation would not be of use. However, that shouldn't happen in practice in many cases. But, if I am missing something and this is required by parallel workers then we can ignore what I said. 2. /* + * Prepare for entering parallel mode by assigning a + * FullTransactionId, to be included in the transaction state that is + * serialized in the parallel DSM. + */ + (void) GetCurrentTransactionId(); + } Similar to the previous comment this also seems to indicate that we require TransactionId for workers. If that is not the case then this comment also needs to be modified. 3. static int common_prefix_cmp(const void *a, const void *b); - /* Spurious line removal. 4. * Assess whether it's feasible to use parallel mode for this query. We * can't do this in a standalone backend, or if the command will try to - * modify any data, or if this is a cursor operation, or if GUCs are set - * to values that don't permit parallelism, or if parallel-unsafe - * functions are present in the query tree. 
+ * modify any data using a CTE, or if this is a cursor operation, or if + * GUCs are set to values that don't permit parallelism, or if + * parallel-unsafe functions are present in the query tree. This comment change is not required because this is quite similar to what we do for CTAS. Your further comment changes in this context are sufficient. 5. + (IsModifySupportedInParallelMode(parse->commandType) && + is_parallel_possible_for_modify(parse))) && I think it would be better if we move the check IsModifySupportedInParallelMode inside is_parallel_possible_for_modify. Also, it might be better to name this function as is_parallel_allowed_for_modify. 6. @@ -260,6 +265,21 @@ set_plan_references(PlannerInfo *root, Plan *plan) */ add_rtes_to_flat_rtable(root, false); + /* + * If modifying a partitioned table, add its parallel-safety-checked + * partitions too to glob->relationOids, to register them as plan + * dependencies. This is only really needed in the case of a parallel + * plan, so that if parallel-unsafe properties are subsequently defined + * on the partitions, the cached parallel plan will be invalidated and + * a non-parallel plan will be generated. + */ + if (IsModifySupportedInParallelMode(root->parse->commandType)) + { + if (glob->partitionOids != NIL && glob->parallelModeNeeded) + glob->relationOids = + list_concat(glob->relationOids, glob->partitionOids); + } + Isn't it possible to add these partitionOids in set_plan_refs with the T_Gather(Merge) node handling? That looks like a more natural place to me, if that is feasible then we don't need parallelModeNeeded check and maybe we don't need to even check IsModifySupportedInParallelMode but I am less sure of the second check requirement. 7. 
+#include "access/table.h" +#include "access/xact.h" #include "access/transam.h" +#include "catalog/pg_class.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -24,6 +27,8 @@ #include "optimizer/planmain.h" #include "optimizer/planner.h" #include "optimizer/tlist.h" +#include "parser/parsetree.h" +#include "partitioning/partdesc.h" I think apart from xact.h, we don't need new additional includes. 8. I see that in function target_rel_max_parallel_hazard, we don't release the lock on the target table after checking parallel-safety but then in function target_rel_max_parallel_hazard_recurse, we do release the lock on partition tables after checking their parallel-safety. Can we add some comments to explain these two cases? 9. I noticed that the tests added for the first patch in v18-0002-Parallel-SELECT-for-INSERT-INTO-.-SELECT-tests-and-doc take even more time than select_parallel. I think we should see if we can reduce the timing of this test. I haven't studied it in detail but maybe some Inserts execution can be avoided. In some cases like below just checking th
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Mar 1, 2021 at 12:38 PM Greg Nancarrow wrote: > On Fri, Feb 26, 2021 at 5:50 PM Amit Langote wrote: > > > > On Fri, Feb 26, 2021 at 3:35 PM Greg Nancarrow wrote: > > > On Fri, Feb 26, 2021 at 4:07 PM Amit Langote > > > wrote: > > > > The attached patch fixes this, although I am starting to have second > > > > thoughts about how we're tracking partitions in this patch. Wondering > > > > if we should bite the bullet and add partitions into the main range > > > > table instead of tracking them separately in partitionOids, which > > > > might result in a cleaner patch overall. > > > > > > Thanks Amit, > > > > > > I was able to reproduce the problem using your instructions (though I > > > found I had to run that explain an extra time, in order to hit the > > > breakpoint). > > > Also, I can confirm that the problem doesn't occur after application > > > of your patch. > > > I'll leave it to your better judgement as to what to do next - if you > > > feel the current tracking method is not sufficient > > > > Just to be clear, I think the tracking method added by the patch is > > sufficient AFAICS for the problems we were able to discover. The > > concern I was trying to express is that we seem to be duct-taping > > holes in our earlier chosen design to track partitions separately from > > the range table. If we had decided to add partitions to the range > > table as "extra" target relations from the get-go, both the issues I > > mentioned with cached plans -- partitions not being counted as a > > dependency and partitions not being locked before execution -- would > > have been prevented. I haven't fully grasped how invasive that design > > would be, but it sure sounds like it would be a bit more robust. > > Posting an updated set of patches that includes Amit Langote's patch > to the partition tracking scheme... Thanks Greg. 
> (the alternative of adding partitions to the range table needs further > investigation) I looked at this today with an intention to write and post a PoC patch, but was quickly faced with the task of integrating that design with how ModifyTable works for inserts. Specifically, if, in addition to adding partitions to the range table, we also add their RT indexes to ModifyTable.resultRelations, then maybe we'll need to rethink our executor tuple routing code. That code currently tracks the insert's target partitions separately from the range table, exactly because they are not there to begin with. So if we change the latter as in this hypothetical PoC, maybe we should also revisit the former. Also, it would not be great that the planner's arrays in PlannerInfo would get longer as a result of Query.rtable getting longer by adding partitions, thus making all the loops over those arrays slower for no benefit. So, let's do this in a follow-on patch, if at all, instead of considering this topic a blocker for committing the current set of patches. -- Amit Langote EDB: http://www.enterprisedb.com
RE: Parallel INSERT (INTO ... SELECT ...)
From: Tsunakawa, Takayuki/綱川 貴之
> the current patch showed nice performance improvement in some (many?) patterns.
> So, I think it can be committed in PG 14, when it has addressed the plan cache
> issue that Amit Langote-san posed.

Agreed. I summarized my test results for the current patch (V18) in the attached file (please use VSCode or Notepad++ to open it; the content is a bit long). As you can see, the performance is good in many patterns (please refer to my test script; the test NO in the text corresponds to the number in the sql file). If you have any question on my test, please feel free to ask.

Regards,
Tang

max_parallel_workers_per_gather = 2

Test NO | test case                                                                   | query plan             | patched(ms) | master(ms) | %reg
--------|-----------------------------------------------------------------------------|------------------------|-------------|------------|-----
1       | Test INSERT with underlying query.                                          | parallel INSERT+SELECT | 20894.069   | 27030.623  | -23%
2       | Test INSERT into a table with a foreign key.                                | parallel SELECT        | 80921.960   | 84416.775  | -4%
3       | Test INSERT with ON CONFLICT ... DO UPDATE                                  | non-parallel           | 28111.186   | 29531.922  | -5%
4       | Test INSERT with parallel_leader_participation = off;                       | parallel INSERT+SELECT | 2799.354    | 5131.874   | -45%
5       | Test INSERT with max_parallel_workers=0;                                    | non-parallel           | 27006.385   | 26962.279  | 0%
6       | Test INSERT with parallelized aggregate                                     | parallel SELECT        | 95482.307   | 105090.301 | -9%
7       | Test INSERT with parallel bitmap heap scan                                  | parallel INSERT+SELECT | 606.664     | 844.471    | -28%
8       | Test INSERT with parallel append                                            | parallel INSERT+SELECT | 109.896     | 239.198    | -54%
9       | Test INSERT with parallel index scan                                        | parallel INSERT+SELECT | 552.481     | 1238.079   | -55%
10      | Test INSERT with parallel-safe index expression                             | parallel INSERT+SELECT | 1453.686    | 2908.489   | -50%
11      | Test INSERT with parallel-unsafe index expression                           | non-parallel           | 2828.559    | 2837.570   | 0%
12      | Test INSERT with underlying query - and RETURNING (no projection)           | parallel INSERT+SELECT | 417.493     | 685.430    | -39%
13      | Test INSERT with underlying ordered query - and RETURNING (no projection)   | parallel SELECT        | 971.095     | 1717.502   | -43%
14      | Test INSERT with underlying ordered query - and RETURNING (with projection) | parallel SELECT        |
RE: Parallel INSERT (INTO ... SELECT ...)
From: Hou, Zhijie/侯 志杰 > After doing some more tests on it (performance degradation will not happen > when source table is out of order). > I think we can say the performance degradation is related to the order of the > data in source table. ... > So, the order of data 's influence seems a normal phenomenon, I think it seems > we do not need to do anything about it (currently). > It seems better to mark it as todo which we can improve this in the future. > > (Since the performance degradation in parallel bitmap is because of the lock > in > _bt_search, It will not always happen when the target table already have data, > less race condition will happened when parallel insert into a evenly > distributed > btree). I think so, too. The slowness of the parallel insert operation due to index page contention, and the index bloat, would occur depending on the order of the index key values of the source records. I guess other DBMSs exhibit a similar phenomenon, but I couldn't find such a description in the manual, whitepapers, or several books on Oracle. One relevant excerpt is the following. This is about parallel index build. Oracle tries to minimize page contention and index bloat. This is completely my guess, but they may do similar things in parallel INSERT SELECT, because Oracle holds an exclusive lock on the target table. SQL Server also acquires an exclusive lock. Maybe we can provide an option to do so in the future. https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-tips.html#GUID-08A08783-C243-4872-AFFA-56B603F1F0F5 -- Optimizing Performance by Creating Indexes in Parallel ... Multiple processes can work simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, Oracle Database can create the index more quickly than if a single server process created the index serially. Parallel index creation works in much the same way as a table scan with an ORDER BY clause. 
The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the DOP. A first set of query processes scans the table, extracts key-rowid pairs, and sends each pair to a process in a second set of query processes based on a key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the parallel execution coordinator simply concatenates the pieces (which are ordered) to form the final index. ... When creating an index in parallel, the STORAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL value of 5 MB and a DOP of 12 consumes at least 60 MB of storage during index creation because each process starts with an extent of 5 MB. When the query coordinator process combines the sorted subindexes, some extents might be trimmed, and the resulting index might be smaller than the requested 60 MB. -- IIRC, the current patch showed nice performance improvement in some (many?) patterns. So, I think it can be committed in PG 14, when it has addressed the plan cache issue that Amit Langote-san posed. I remember the following issues/comments are pending, but they are not blockers: 1. The insert operation is run serially when the target table has a foreign key, sequence or identity column. This can be added later based on the current design without requiring rework. That is, the current patch leaves no debt. (Personally, foreign key and sequence support will also be wanted in PG 14. We may try them in the last CF once the current patch is likely to be committable.) 2. There's a plausible reason for the performance variation and index bloat with the bitmap scan case. Ideally, we want to come up with a solution that can be incorporated in PG 15. Or, it may be one conclusion that we can't prevent performance degradation in all cases. 
That may be one hidden reason why Oracle and SQL Server don't enable parallel DML by default. We can advise the user in the manual that parallel DML is not always faster than the serial operation, so they should test performance by enabling and disabling parallel DML. Also, maybe we should honestly state that indexes can get a bit bigger after parallel insert than after serial insert, and advise the user to do REINDEX CONCURRENTLY if necessary. 3. The total time of parallel execution can get longer because of unbalanced work distribution among parallel workers. This seems to be an existing problem, so we can pursue the improvement later, hopefully before the release of PG 14. Does anyone see any problem with committing the current patch (after polishing it)? Regards Takayuki Tsunakawa
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 26, 2021 at 3:35 PM Greg Nancarrow wrote: > On Fri, Feb 26, 2021 at 4:07 PM Amit Langote wrote: > > The attached patch fixes this, although I am starting to have second > > thoughts about how we're tracking partitions in this patch. Wondering > > if we should bite the bullet and add partitions into the main range > > table instead of tracking them separately in partitionOids, which > > might result in a cleaner patch overall. > > Thanks Amit, > > I was able to reproduce the problem using your instructions (though I > found I had to run that explain an extra time, in order to hit the > breakpoint). > Also, I can confirm that the problem doesn't occur after application > of your patch. > I'll leave it to your better judgement as to what to do next - if you > feel the current tracking method is not sufficient Just to be clear, I think the tracking method added by the patch is sufficient AFAICS for the problems we were able to discover. The concern I was trying to express is that we seem to be duct-taping holes in our earlier chosen design to track partitions separately from the range table. If we had decided to add partitions to the range table as "extra" target relations from the get-go, both the issues I mentioned with cached plans -- partitions not being counted as a dependency and partitions not being locked before execution -- would have been prevented. I haven't fully grasped how invasive that design would be, but it sure sounds like it would be a bit more robust. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 26, 2021 at 4:07 PM Amit Langote wrote: > > Hi Greg, > > Replying to an earlier email in the thread because I think I found a > problem relevant to the topic that was brought up. > > On Wed, Feb 17, 2021 at 10:34 PM Amit Langote wrote: > > On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow wrote: > > > Is the use of "table_close(rel, NoLock)'' intentional? That will keep > > > the lock (lockmode) until end-of-transaction. > > > > I think we always keep any locks on relations that are involved in a > > plan until end-of-transaction. What if a partition is changed in an > > unsafe manner between being considered safe for parallel insertion and > > actually performing the parallel insert? > > I realized that there is a race condition that will allow a concurrent > backend to invalidate a parallel plan (for insert into a partitioned > table) at a point in time when it's too late for plancache.c to detect > it. It has to do with how plancache.c locks the relations affected by > a cached query and its cached plan. Specifically, > AcquireExecutorLocks(), which locks the relations that need to be > locked before the plan could be considered safe to execute, does not > notice the partitions that would have been checked for parallel safety > when the plan was made. Given that AcquireExecutorLocks() only loops > over PlannedStmt.rtable and locks exactly the relations found there, > partitions are not locked. This means that a concurrent backend can, > for example, add an unsafe trigger to a partition before a parallel > worker inserts into it only to fail when it does. 
> > Steps to reproduce (tried with v19 set of patches): > > drop table if exists rp, foo; > create table rp (a int) partition by range (a); > create table rp1 partition of rp for values from (minvalue) to (0); > create table rp2 partition of rp for values from (0) to (maxvalue); > create table foo (a) as select generate_series(1, 100); > set plan_cache_mode to force_generic_plan; > set enable_parallel_dml to on; > deallocate all; > prepare q as insert into rp select * from foo where a%2 = 0; > explain analyze execute q; > > At this point, attach a debugger to this backend and set a breakpoint > in AcquireExecutorLocks() and execute q again: > > -- will execute the cached plan > explain analyze execute q; > > Breakpoint will be hit. Continue till return from the function and > stop. Start another backend and execute this: > > -- will go through, because no lock still taken on the partition > create or replace function make_table () returns trigger language > plpgsql as $$ begin create table bar(); return null; end; $$ parallel > unsafe; > create trigger ai_rp2 after insert on rp2 for each row execute > function make_table(); > > Back to the debugger, hit continue to let the plan execute. You > should see this error: > > ERROR: cannot start commands during a parallel operation > CONTEXT: SQL statement "create table bar()" > PL/pgSQL function make_table() line 1 at SQL statement parallel worker > > The attached patch fixes this, although I am starting to have second > thoughts about how we're tracking partitions in this patch. Wondering > if we should bite the bullet and add partitions into the main range > table instead of tracking them separately in partitionOids, which > might result in a cleaner patch overall. > Thanks Amit, I was able to reproduce the problem using your instructions (though I found I had to run that explain an extra time, in order to hit the breakpoint). Also, I can confirm that the problem doesn't occur after application of your patch. 
I'll leave it to your better judgement as to what to do next - if you feel the current tracking method is not sufficient Regards, Greg Nancarrow Fujitsu Australia
RE: Parallel INSERT (INTO ... SELECT ...)
> I add some code to track the time spent in index operation. > From the results[1], we can see more workers will bring more cost in > _bt_search_insert() in each worker. > After debugged, the most cost part is the following: > - > /* drop the read lock on the page, then acquire one on its child > */ > *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access); > - > > It seems the order of parallel bitmap scan's result will result in more lock > time in > parallel insert. > [1]---parallel bitmap scan-- worker 0: > psql:test.sql:10: INFO: insert index _bt_search_insert time:834735 > psql:test.sql:10: INFO: insert index total time:1895330 > psql:test.sql:10: INFO: insert tuple time:628064 > > worker 2: > psql:test.sql:10: INFO: insert index _bt_search_insert time:1552242 > psql:test.sql:10: INFO: insert index total time:2374741 > psql:test.sql:10: INFO: insert tuple time:314571 > > worker 4: > psql:test.sql:10: INFO: insert index _bt_search_insert time:2496424 > psql:test.sql:10: INFO: insert index total time:3016150 > psql:test.sql:10: INFO: insert tuple time:211741 > > > Based on above, I tried to change the order of results that bitmapscan return. > In the original test, we prepare data in order (like: > generate_series(1,1,1)), If > we change the order we insert the data in the source table, the performance > degradation will not always happen[2]. > And table size difference will be small. > > ---out of order source table- > insert into x(a,b,c) select i,i+1,i+2 from generate_series(1,6) as > t(i) > order by random(); > > > Test results when source table out of order(using bitmap heap scan): > [2] > Worker 0: > Execution Time: 37028.006 ms > Worker 2: > Execution Time: 11355.153 ms > Worker 4: > Execution Time: 9273.398 ms > > > So, this performance degradation issue seems related on the order of the data > in the source table. > It does not always happen. Do we need to do some specific fix for it ? 
After doing some more tests on it (the performance degradation does not happen when the source table is out of order), I think we can say the performance degradation is related to the order of the data in the source table.

Also, in the master branch, I found that the order of data in the source table does not affect the planner when generating the plan (for the SELECT part). As we can see from [1][2], if the source table's data is in order and max_parallel_workers_per_gather is set to 4, the planner will choose a parallel seqscan here, but it is actually slower than a serial bitmapscan. If the data in the source table is out of order, the performance degradation does not happen [3][4].

So, the influence of the data's order seems to be a normal phenomenon, and I think we do not need to do anything about it (currently). It seems better to mark it as a TODO which we can improve in the future. (Since the performance degradation in parallel bitmap scan is caused by the lock in _bt_search, it will not always happen when the target table already contains data; there will be less lock contention when inserting in parallel into an evenly distributed btree.)
Test result with sql: "postgres=# explain analyze verbose select a from x where a<8 or (a%2=0 and a>19900);"

[1]---Source table data in order and max_parallel_workers_per_gather = 0---
Bitmap Heap Scan on public.x  (cost=22999.40..4991850.30 rows=91002 width=4) (actual time=45.445..201.322 rows=57 loops=1)
  Output: a
  Recheck Cond: ((x.a < 8) OR (x.a > 19900))
  Filter: ((x.a < 8) OR (((x.a % 2) = 0) AND (x.a > 19900)))
  Rows Removed by Filter: 50
  Heap Blocks: exact=5840
  ->  BitmapOr  (cost=22999.40..22999.40 rows=1242768 width=0) (actual time=44.622..44.637 rows=0 loops=1)
        ->  Bitmap Index Scan on x_a_idx  (cost=0.00..1575.70 rows=85217 width=0) (actual time=3.622..3.634 rows=7 loops=1)
              Index Cond: (x.a < 8)
        ->  Bitmap Index Scan on x_a_idx  (cost=0.00..21378.20 rows=1157551 width=0) (actual time=40.998..40.998 rows=100 loops=1)
              Index Cond: (x.a > 19900)
Planning Time: 0.199 ms
Execution Time: 232.327 ms

[2]---Source table data in order and max_parallel_workers_per_gather = 4---
Gather  (cost=1000.00..2091183.08 rows=91002 width=4) (actual time=0.594..4216.197 rows=57 loops=1)
  Output: a
  Workers Planned: 4
  Workers Launched: 4
  ->  Parallel Seq Scan on public.x  (cost=0.00..2081082.88 rows=22750 width=4) (actual time=0.087..4197.570 rows=116000 loops=5)
        Output: a
        Filter: ((x.a < 8) OR (((x.a % 2) = 0) AND (x.a > 19900)))
        Rows Removed by Filter: 39884000
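To make the lock-contention explanation above concrete, here is a toy model (plain Python, not PostgreSQL code; the "leaf" granularity and worker interleaving are made-up simplifications). It deals keys round-robin to workers and counts how often two different workers descend to the same btree "leaf" back-to-back: with an ordered source table every worker chases the same rightmost leaf, while a shuffled source spreads the inserts out.

```python
import random

def leaf_for(key, leaf_span=100):
    # Toy "leaf": which fixed-width key range the insert descends to.
    return key // leaf_span

def contention(keys, n_workers=4):
    """Count back-to-back inserts from *different* workers that land on
    the same leaf, with the workers' insert streams interleaved."""
    # Deal keys to workers round-robin, then interleave their streams.
    streams = [keys[i::n_workers] for i in range(n_workers)]
    conflicts, last = 0, None
    for step in range(len(streams[0])):
        for w in range(n_workers):
            leaf = leaf_for(streams[w][step])
            if last is not None and leaf == last[1] and w != last[0]:
                conflicts += 1
            last = (w, leaf)
    return conflicts

ordered = list(range(40_000))
shuffled = ordered[:]
random.seed(0)
random.shuffle(shuffled)

# Ordered input: nearly every insert collides with the previous worker's
# leaf; shuffled input: collisions become rare.
print(contention(ordered), contention(shuffled))
```

This is only meant to illustrate why "order by random()" on the source table changes the picture; the real cost in PostgreSQL is the buffer lock taken in _bt_relandgetbuf() while descending the tree.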
Re: Parallel INSERT (INTO ... SELECT ...)
Hi Greg, Replying to an earlier email in the thread because I think I found a problem relevant to the topic that was brought up. On Wed, Feb 17, 2021 at 10:34 PM Amit Langote wrote: > On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow wrote: > > Is the use of "table_close(rel, NoLock)'' intentional? That will keep > > the lock (lockmode) until end-of-transaction. > > I think we always keep any locks on relations that are involved in a > plan until end-of-transaction. What if a partition is changed in an > unsafe manner between being considered safe for parallel insertion and > actually performing the parallel insert? I realized that there is a race condition that will allow a concurrent backend to invalidate a parallel plan (for insert into a partitioned table) at a point in time when it's too late for plancache.c to detect it. It has to do with how plancache.c locks the relations affected by a cached query and its cached plan. Specifically, AcquireExecutorLocks(), which locks the relations that need to be locked before the plan could be considered safe to execute, does not notice the partitions that would have been checked for parallel safety when the plan was made. Given that AcquireExecutorLocks() only loops over PlannedStmt.rtable and locks exactly the relations found there, partitions are not locked. This means that a concurrent backend can, for example, add an unsafe trigger to a partition before a parallel worker inserts into it only to fail when it does. 
Steps to reproduce (tried with v19 set of patches):

drop table if exists rp, foo;
create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
create table foo (a) as select generate_series(1, 100);
set plan_cache_mode to force_generic_plan;
set enable_parallel_dml to on;
deallocate all;
prepare q as insert into rp select * from foo where a%2 = 0;
explain analyze execute q;

At this point, attach a debugger to this backend and set a breakpoint in AcquireExecutorLocks() and execute q again:

-- will execute the cached plan
explain analyze execute q;

Breakpoint will be hit. Continue till return from the function and stop. Start another backend and execute this:

-- will go through, because no lock still taken on the partition
create or replace function make_table () returns trigger language plpgsql as $$ begin create table bar(); return null; end; $$ parallel unsafe;
create trigger ai_rp2 after insert on rp2 for each row execute function make_table();

Back to the debugger, hit continue to let the plan execute. You should see this error:

ERROR:  cannot start commands during a parallel operation
CONTEXT:  SQL statement "create table bar()"
PL/pgSQL function make_table() line 1 at SQL statement
parallel worker

The attached patch fixes this, although I am starting to have second thoughts about how we're tracking partitions in this patch. Wondering if we should bite the bullet and add partitions into the main range table instead of tracking them separately in partitionOids, which might result in a cleaner patch overall.

--
Amit Langote
EDB: http://www.enterprisedb.com

AcquireExecutorLocks-partition-insert.patch
Description: Binary data
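The shape of the race described above can be sketched as a toy model (plain Python; CachedPlan, acquire_executor_locks and the field names are illustrative stand-ins, not PostgreSQL's actual structures). The point is that re-validation at execution time only touches the relations in the plan's range table, so a partition whose parallel safety was checked at plan time can change underneath the cached plan:

```python
class Relation:
    def __init__(self, name, parallel_safe=True):
        self.name = name
        self.parallel_safe = parallel_safe

class CachedPlan:
    def __init__(self, rtable, partitions):
        self.rtable = rtable          # what AcquireExecutorLocks() sees
        self.partitions = partitions  # checked at plan time only
        # parallel plan chosen iff every partition looked safe at plan time
        self.is_parallel = all(p.parallel_safe for p in partitions)

def acquire_executor_locks(plan):
    # Locks (and hence revalidates) only plan.rtable -- partitions missed.
    return {r.name for r in plan.rtable}

rp, foo = Relation("rp"), Relation("foo")
rp2 = Relation("rp2")                       # the leaf partition
plan = CachedPlan(rtable=[rp, foo], partitions=[rp2])
assert plan.is_parallel

# A concurrent session adds a parallel-unsafe trigger to the partition...
rp2.parallel_safe = False

locked = acquire_executor_locks(plan)
# ...but the partition was never locked, so nothing invalidated the plan:
assert "rp2" not in locked
assert plan.is_parallel   # stale: would run in parallel mode and fail
```

A fresh plan built after the change would correctly come out non-parallel; the bug is purely in the cached plan surviving the change, which is why the fix has to live in how plancache.c/AcquireExecutorLocks() tracks partitions.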
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 6:03 PM Amit Kapila wrote: > On Wed, Feb 24, 2021 at 2:14 PM Greg Nancarrow wrote: > > On Wed, Feb 24, 2021 at 3:12 PM Amit Kapila wrote: > > > On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow > > > wrote: > > > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila > > > > wrote: > > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > > > > > because of parallel-safety checks on the partitions, which found > > > > > > attributes of the partitions which weren't parallel-safe. > > > > > > So it's not so clear to me that the dependency doesn't exist - the > > > > > > non-parallel plan does in fact depend on the state of the > > > > > > partitions. > > > > > > > > > > Hmm, I think that is not what we can consider as a dependency. > > > > > > > > Then if it's not a dependency, then we shouldn't have to check the > > > > attributes of the partitions for parallel-safety, to determine whether > > > > we must use a non-parallel plan (or can use a parallel plan). > > > > Except, of course, we do have to ... > > > > > > I don't think the plan-dependency and checking for parallel-safety are > > > directly related. > > > > That is certainly not my understanding. Why do you think that they are > > not directly related? > > This whole issue came about because Amit L pointed out that there is a > > need to add partition OIDs as plan-dependencies BECAUSE the checking > > for parallel-safety and plan-dependency are related - since now, for > > Parallel INSERT, we're executing extra parallel-safety checks that > > check partition properties, so the resultant plan is dependent on the > > partitions and their properties. > > He has pointed out an issue when the plan is parallel and you can see > in that example that it fails if we didn't invalidate it. For > non-parallel plans, there won't be any such issue. Yes. 
I checked around a bit (code and -hackers archive [1]) and came away with the impression that there do not appear to be any set rules for deciding which object changes to send an invalidation message for (sending side: ddl, vacuum/analyze) and which items of a Query or a Plan to track changes for (receiving side: planner, plancache). One could say the foremost rule is to avoid broken cached plans and only in some really obvious cases do the thing that produces a better plan [2]. While no compromises can be made for the former, whether or not to go for the latter probably involves some cost-benefit analysis, something we can probably revisit. I don't think we're compromising by not adding the partition OIDs when the insert plan is not parallel, but the benefits of adding them in all cases are not so clear cut that maybe it's not worth it.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] Plan invalidation design:
https://www.postgresql.org/message-id/flat/20244.1171734513%40sss.pgh.pa.us

[2] ...contradicts what I said before, but I found this comment in DefineIndex():

    /*
     * The pg_index update will cause backends (including this one) to update
     * relcache entries for the index itself, but we should also send a
     * relcache inval on the parent table to force replanning of cached plans.
     * Otherwise existing sessions might fail to use the new index where it
     * would be useful.  (Note that our earlier commits did not create reasons
     * to replan; so relcache flush on the index itself was sufficient.)
     */
    CacheInvalidateRelcacheByRelid(heaprelid.relId);

So this invalidates any plans referencing the index's parent relation to trigger replanning so as to take the index into account. The old plans would not really be "unrunnable" without the index though.
RE: Parallel INSERT (INTO ... SELECT ...)
> > It is quite possible what you are saying is correct but I feel that is
> > not this patch's fault. So, won't it better to discuss this in a
> > separate thread?
> >
> > Good use case but again, I think this can be done as a separate patch.
>
> Agreed.
> I think even the current patch offers great benefits and can be committed in PG
> 14, even if all my four feedback comments are left unaddressed. I just touched
> on them for completeness in terms of typically expected use cases. They will
> probably be able to be implemented along the current design.
>
> > I think here you are talking about the third patch (Parallel Inserts).
> > I guess if one has run inserts parallelly from psql then also similar
> > behavior would have been observed. For tables, it might lead to better
> > results once we have the patch discussed at [1]. Actually, this needs
> > more investigation.
> >
> > [1] - https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com
>
> That looks interesting and worth a try.

Hi,

I tested the bitmap scan with both the multi-insert patch and the parallel insert patch applied, but the performance degradation and table size increase still happened on my machine.

To better analyze this issue, I did some more research on it (with only the parallel insert patch applied).

I added some code to track the time spent in index operations. From the results [1], we can see that more workers bring more cost in _bt_search_insert() in each worker. After debugging, the most costly part is the following:

    /* drop the read lock on the page, then acquire one on its child */
    *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access);

It seems the order of the parallel bitmap scan's results leads to more lock time in parallel insert.
[1]---parallel bitmap scan--
worker 0:
psql:test.sql:10: INFO: insert index _bt_search_insert time:834735
psql:test.sql:10: INFO: insert index total time:1895330
psql:test.sql:10: INFO: insert tuple time:628064

worker 2:
psql:test.sql:10: INFO: insert index _bt_search_insert time:1552242
psql:test.sql:10: INFO: insert index total time:2374741
psql:test.sql:10: INFO: insert tuple time:314571

worker 4:
psql:test.sql:10: INFO: insert index _bt_search_insert time:2496424
psql:test.sql:10: INFO: insert index total time:3016150
psql:test.sql:10: INFO: insert tuple time:211741

Based on the above, I tried to change the order of the results that the bitmap scan returns. In the original test, we prepare the data in order (like: generate_series(1,1,1)). If we change the order in which we insert the data into the source table, the performance degradation does not always happen [2], and the table size difference is small.

---out of order source table---
insert into x(a,b,c) select i,i+1,i+2 from generate_series(1,6) as t(i) order by random();

Test results when the source table is out of order (using bitmap heap scan):
[2]
Worker 0:
Execution Time: 37028.006 ms
Worker 2:
Execution Time: 11355.153 ms
Worker 4:
Execution Time: 9273.398 ms

So, this performance degradation issue seems related to the order of the data in the source table, and it does not always happen. Do we need to do some specific fix for it?

For multi-insert, I guess the reason why it does not solve the performance problem is that we do not actually have an API for multi-index insert (like the tableam API rd_tableam->multi_insert()), so we still execute ExecInsertIndexTuples in a loop for the multiple index inserts.

I plan to do some more tests for multi-insert and parallel insert with an out-of-order source table.

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 25, 2021 at 12:19 AM Amit Kapila wrote: > > On Wed, Feb 24, 2021 at 6:21 PM Greg Nancarrow wrote: > > > > On Wed, Feb 24, 2021 at 10:38 PM Amit Kapila > > wrote: > > > > > > > > Thanks, I'll try it. > > > > I did, however, notice a few concerns with your suggested alternative > > > > fix: > > > > - It is not restricted to INSERT (as current fix is). > > > > > > > > > > So what? The Select also has a similar problem. > > > > > > > Yes, but you're potentially now adding overhead to every > > SELECT/UPDATE/DELETE with a subquery, by the added recursive checking > > and walking done by the new call to max_parallel_hazard_walker().and > > code block looking for a modifying CTE > > > > Can you please share an example where it has added an overhead? > > > And anyway I'm not sure it's really right putting in a fix for SELECT > > with a modifying CTE, into a patch that adds parallel INSERT > > functionality - in any case you'd need to really spell this out in > > code comments, as this is at best a temporary fix that would need to > > be removed whenever the query rewriter is fixed to set hasModifyingCTE > > correctly. > > > > > > - It does not set parse->hasModifyingCTE (as current fix does), so the > > > > return value (PlannedStmt) from standard_planner() won't have > > > > hasModifyingCTE set correctly in the cases where the rewriter doesn't > > > > set it correctly (and I'm not sure what strange side effects ??? that > > > > might have). > > > > > > Here end goal is not to set hasModifyingCTE but do let me know if you > > > see any problem or impact. > > > > > > > parse->hasModifyingCTE is not just used in the shortcut-test for > > parallel-safety, its value is subsequently copied into the PlannedStmt > > returned by standard_planner. > > It's inconsistent to leave hasModifyingCTE FALSE when by the fix it > > has found a modifying CTE. 
> > Even if no existing tests detect an issue with this, PlannedStmt is > > left with a bad hasModifyingCTE value in this case, so there is the > > potential for something to go wrong. > > > > > > - Although the invocation of max_parallel_hazard_walker() on a RTE > > > > subquery will "work" in finally locating your fix's added > > > > "CommonTableExpr" parallel-safety disabling block for commandType != > > > > CMD_SELECT, it looks like it potentially results in checking and > > > > walking over a lot of other stuff within the subquery not related to > > > > CTEs. The current fix does a more specific and efficient search for a > > > > modifying CTE. > > > > > > > > > > I find the current fix proposed by you quite ad-hoc and don't think we > > > can go that way. > > > > > > > At least my current fix is very specific, efficient and clear in its > > purpose, and suitably documented, so it is very clear when and how it > > is to be removed, when the issue is fixed in the query rewriter. > > Another concern with the alternative fix is that it always searches > > for a modifying CTE, even when parse->hasModifyingCTE is true after > > the query rewriter processing. > > > > There is a check in standard_planner such that if > parse->hasModifyingCTE is true then we won't try checking > parallel-safety. > OK, I retract that last concern, parallel-safety checks are skipped when parse->hasModifyingCTE is true. Examples of overhead will need to wait until tomorrow (and would need to test), but seems fairly clear max_parallel_hazard_walker() first checks parallel-unsafe functions in the node, then does numerous node-type checks before getting to CommonTableExpr - exactly how much extra work would depend on the SQL. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 6:21 PM Greg Nancarrow wrote: > > On Wed, Feb 24, 2021 at 10:38 PM Amit Kapila wrote: > > > > > > Thanks, I'll try it. > > > I did, however, notice a few concerns with your suggested alternative fix: > > > - It is not restricted to INSERT (as current fix is). > > > > > > > So what? The Select also has a similar problem. > > > > Yes, but you're potentially now adding overhead to every > SELECT/UPDATE/DELETE with a subquery, by the added recursive checking > and walking done by the new call to max_parallel_hazard_walker().and > code block looking for a modifying CTE > Can you please share an example where it has added an overhead? > And anyway I'm not sure it's really right putting in a fix for SELECT > with a modifying CTE, into a patch that adds parallel INSERT > functionality - in any case you'd need to really spell this out in > code comments, as this is at best a temporary fix that would need to > be removed whenever the query rewriter is fixed to set hasModifyingCTE > correctly. > > > > - It does not set parse->hasModifyingCTE (as current fix does), so the > > > return value (PlannedStmt) from standard_planner() won't have > > > hasModifyingCTE set correctly in the cases where the rewriter doesn't > > > set it correctly (and I'm not sure what strange side effects ??? that > > > might have). > > > > Here end goal is not to set hasModifyingCTE but do let me know if you > > see any problem or impact. > > > > parse->hasModifyingCTE is not just used in the shortcut-test for > parallel-safety, its value is subsequently copied into the PlannedStmt > returned by standard_planner. > It's inconsistent to leave hasModifyingCTE FALSE when by the fix it > has found a modifying CTE. > Even if no existing tests detect an issue with this, PlannedStmt is > left with a bad hasModifyingCTE value in this case, so there is the > potential for something to go wrong. 
> > > > - Although the invocation of max_parallel_hazard_walker() on a RTE > > > subquery will "work" in finally locating your fix's added > > > "CommonTableExpr" parallel-safety disabling block for commandType != > > > CMD_SELECT, it looks like it potentially results in checking and > > > walking over a lot of other stuff within the subquery not related to > > > CTEs. The current fix does a more specific and efficient search for a > > > modifying CTE. > > > > > > > I find the current fix proposed by you quite ad-hoc and don't think we > > can go that way. > > > > At least my current fix is very specific, efficient and clear in its > purpose, and suitably documented, so it is very clear when and how it > is to be removed, when the issue is fixed in the query rewriter. > Another concern with the alternative fix is that it always searches > for a modifying CTE, even when parse->hasModifyingCTE is true after > the query rewriter processing. > There is a check in standard_planner such that if parse->hasModifyingCTE is true then we won't try checking parallel-safety. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 10:38 PM Amit Kapila wrote: > > > > Thanks, I'll try it. > > I did, however, notice a few concerns with your suggested alternative fix: > > - It is not restricted to INSERT (as current fix is). > > > > So what? The Select also has a similar problem. > Yes, but you're potentially now adding overhead to every SELECT/UPDATE/DELETE with a subquery, by the added recursive checking and walking done by the new call to max_parallel_hazard_walker().and code block looking for a modifying CTE And anyway I'm not sure it's really right putting in a fix for SELECT with a modifying CTE, into a patch that adds parallel INSERT functionality - in any case you'd need to really spell this out in code comments, as this is at best a temporary fix that would need to be removed whenever the query rewriter is fixed to set hasModifyingCTE correctly. > > - It does not set parse->hasModifyingCTE (as current fix does), so the > > return value (PlannedStmt) from standard_planner() won't have > > hasModifyingCTE set correctly in the cases where the rewriter doesn't > > set it correctly (and I'm not sure what strange side effects ??? that > > might have). > > Here end goal is not to set hasModifyingCTE but do let me know if you > see any problem or impact. > parse->hasModifyingCTE is not just used in the shortcut-test for parallel-safety, its value is subsequently copied into the PlannedStmt returned by standard_planner. It's inconsistent to leave hasModifyingCTE FALSE when by the fix it has found a modifying CTE. Even if no existing tests detect an issue with this, PlannedStmt is left with a bad hasModifyingCTE value in this case, so there is the potential for something to go wrong. 
> > - Although the invocation of max_parallel_hazard_walker() on a RTE > > subquery will "work" in finally locating your fix's added > > "CommonTableExpr" parallel-safety disabling block for commandType != > > CMD_SELECT, it looks like it potentially results in checking and > > walking over a lot of other stuff within the subquery not related to > > CTEs. The current fix does a more specific and efficient search for a > > modifying CTE. > > > > I find the current fix proposed by you quite ad-hoc and don't think we > can go that way. > At least my current fix is very specific, efficient and clear in its purpose, and suitably documented, so it is very clear when and how it is to be removed, when the issue is fixed in the query rewriter. Another concern with the alternative fix is that it always searches for a modifying CTE, even when parse->hasModifyingCTE is true after the query rewriter processing. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 4:30 PM Greg Nancarrow wrote: > > On Wed, Feb 24, 2021 at 8:39 PM Amit Kapila wrote: > > > > On Fri, Feb 19, 2021 at 6:56 AM Greg Nancarrow wrote: > > > > > > Posting a new version of the patches, with the following updates: > > > > > > > I am not happy with the below code changes, I think we need a better > > way to deal with this. > > > > @@ -313,19 +314,35 @@ standard_planner(Query *parse, const char > > *query_string, int cursorOptions, > > glob->transientPlan = false; > > glob->dependsOnRole = false; > > > > + if (IsModifySupportedInParallelMode(parse->commandType) && > > + !parse->hasModifyingCTE) > > + { > > + /* > > + * FIXME > > + * There is a known bug in the query rewriter: re-written queries with > > + * a modifying CTE may not have the "hasModifyingCTE" flag set. When > > + * that bug is fixed, this temporary fix must be removed. > > + * > > + * Note that here we've made a fix for this problem only for a > > + * supported-in-parallel-mode table-modification statement (i.e. > > + * INSERT), but this bug exists for SELECT too. > > + */ > > + parse->hasModifyingCTE = query_has_modifying_cte(parse); > > + } > > + > > > > I understand that this is an existing bug but I am not happy with this > > workaround. I feel it is better to check for modifyingCTE in > > max_parallel_hazard_walker. See attached, this is atop > > v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. > > > > Thanks, I'll try it. > I did, however, notice a few concerns with your suggested alternative fix: > - It is not restricted to INSERT (as current fix is). > So what? The Select also has a similar problem. > - It does not set parse->hasModifyingCTE (as current fix does), so the > return value (PlannedStmt) from standard_planner() won't have > hasModifyingCTE set correctly in the cases where the rewriter doesn't > set it correctly (and I'm not sure what strange side effects ??? that > might have). 
Here end goal is not to set hasModifyingCTE but do let me know if you see any problem or impact. > - Although the invocation of max_parallel_hazard_walker() on a RTE > subquery will "work" in finally locating your fix's added > "CommonTableExpr" parallel-safety disabling block for commandType != > CMD_SELECT, it looks like it potentially results in checking and > walking over a lot of other stuff within the subquery not related to > CTEs. The current fix does a more specific and efficient search for a > modifying CTE. > I find the current fix proposed by you quite ad-hoc and don't think we can go that way. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 8:39 PM Amit Kapila wrote: > > On Fri, Feb 19, 2021 at 6:56 AM Greg Nancarrow wrote: > > > > Posting a new version of the patches, with the following updates: > > > > I am not happy with the below code changes, I think we need a better > way to deal with this. > > @@ -313,19 +314,35 @@ standard_planner(Query *parse, const char > *query_string, int cursorOptions, > glob->transientPlan = false; > glob->dependsOnRole = false; > > + if (IsModifySupportedInParallelMode(parse->commandType) && > + !parse->hasModifyingCTE) > + { > + /* > + * FIXME > + * There is a known bug in the query rewriter: re-written queries with > + * a modifying CTE may not have the "hasModifyingCTE" flag set. When > + * that bug is fixed, this temporary fix must be removed. > + * > + * Note that here we've made a fix for this problem only for a > + * supported-in-parallel-mode table-modification statement (i.e. > + * INSERT), but this bug exists for SELECT too. > + */ > + parse->hasModifyingCTE = query_has_modifying_cte(parse); > + } > + > > I understand that this is an existing bug but I am not happy with this > workaround. I feel it is better to check for modifyingCTE in > max_parallel_hazard_walker. See attached, this is atop > v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT. > Thanks, I'll try it. I did, however, notice a few concerns with your suggested alternative fix: - It is not restricted to INSERT (as current fix is). - It does not set parse->hasModifyingCTE (as current fix does), so the return value (PlannedStmt) from standard_planner() won't have hasModifyingCTE set correctly in the cases where the rewriter doesn't set it correctly (and I'm not sure what strange side effects ??? that might have). 
- Although the invocation of max_parallel_hazard_walker() on a RTE subquery will "work" in finally locating your fix's added "CommonTableExpr" parallel-safety disabling block for commandType != CMD_SELECT, it looks like it potentially results in checking and walking over a lot of other stuff within the subquery not related to CTEs. The current fix does a more specific and efficient search for a modifying CTE. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 19, 2021 at 6:56 AM Greg Nancarrow wrote:
>
> Posting a new version of the patches, with the following updates:
>

I am not happy with the below code changes, I think we need a better way to deal with this.

@@ -313,19 +314,35 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	glob->transientPlan = false;
 	glob->dependsOnRole = false;

+	if (IsModifySupportedInParallelMode(parse->commandType) &&
+		!parse->hasModifyingCTE)
+	{
+		/*
+		 * FIXME
+		 * There is a known bug in the query rewriter: re-written queries with
+		 * a modifying CTE may not have the "hasModifyingCTE" flag set. When
+		 * that bug is fixed, this temporary fix must be removed.
+		 *
+		 * Note that here we've made a fix for this problem only for a
+		 * supported-in-parallel-mode table-modification statement (i.e.
+		 * INSERT), but this bug exists for SELECT too.
+		 */
+		parse->hasModifyingCTE = query_has_modifying_cte(parse);
+	}
+

I understand that this is an existing bug but I am not happy with this workaround. I feel it is better to check for modifyingCTE in max_parallel_hazard_walker. See attached, this is atop v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.

--
With Regards,
Amit Kapila.

fix_modifyingcte_parallel_safety_v1.patch
Description: Binary data
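For readers following along, the essence of what both competing fixes have to detect can be sketched as follows (plain Python over dicts standing in for the Query tree; the dict keys mirror PostgreSQL field names but this is an illustration, not the patch's actual code). A "modifying CTE" is any CTE, at any nesting level, whose command type is not SELECT:

```python
def query_has_modifying_cte(query):
    """Recursively search a query tree for a CTE that is INSERT/UPDATE/DELETE."""
    for cte in query.get("cteList", []):
        sub = cte["query"]
        if sub["commandType"] != "SELECT":
            return True
        if query_has_modifying_cte(sub):        # CTEs can nest
            return True
    # also look into subqueries in the range table
    for rte in query.get("rtable", []):
        if rte.get("subquery") and query_has_modifying_cte(rte["subquery"]):
            return True
    return False

q = {
    "commandType": "INSERT",
    "cteList": [{"query": {"commandType": "SELECT",
                           "cteList": [{"query": {"commandType": "DELETE"}}]}}],
}
print(query_has_modifying_cte(q))  # True: nested DELETE CTE found
```

The disagreement in the thread is not about this logic but about where it runs: once up-front in standard_planner() (specific and cheap, but a temporary workaround for the rewriter bug), or folded into max_parallel_hazard_walker() (more general, but walking every node of every subquery along the way).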
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 2:14 PM Greg Nancarrow wrote: > > On Wed, Feb 24, 2021 at 3:12 PM Amit Kapila wrote: > > > > On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow wrote: > > > > > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila > > > wrote: > > > > > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > > > > because of parallel-safety checks on the partitions, which found > > > > > attributes of the partitions which weren't parallel-safe. > > > > > So it's not so clear to me that the dependency doesn't exist - the > > > > > non-parallel plan does in fact depend on the state of the partitions. > > > > > > > > > > > > > Hmm, I think that is not what we can consider as a dependency. > > > > > > > > > > Then if it's not a dependency, then we shouldn't have to check the > > > attributes of the partitions for parallel-safety, to determine whether > > > we must use a non-parallel plan (or can use a parallel plan). > > > Except, of course, we do have to ... > > > > > > > I don't think the plan-dependency and checking for parallel-safety are > > directly related. > > > > That is certainly not my understanding. Why do you think that they are > not directly related? > This whole issue came about because Amit L pointed out that there is a > need to add partition OIDs as plan-dependencies BECAUSE the checking > for parallel-safety and plan-dependency are related - since now, for > Parallel INSERT, we're executing extra parallel-safety checks that > check partition properties, so the resultant plan is dependent on the > partitions and their properties. > He has pointed out an issue when the plan is parallel and you can see in that example that it fails if we didn't invalidate it. For non-parallel plans, there won't be any such issue. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 3:12 PM Amit Kapila wrote: > > On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow wrote: > > > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila > > wrote: > > > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > > > because of parallel-safety checks on the partitions, which found > > > > attributes of the partitions which weren't parallel-safe. > > > > So it's not so clear to me that the dependency doesn't exist - the > > > > non-parallel plan does in fact depend on the state of the partitions. > > > > > > > > > > Hmm, I think that is not what we can consider as a dependency. > > > > > > > Then if it's not a dependency, then we shouldn't have to check the > > attributes of the partitions for parallel-safety, to determine whether > > we must use a non-parallel plan (or can use a parallel plan). > > Except, of course, we do have to ... > > > > I don't think the plan-dependency and checking for parallel-safety are > directly related. > That is certainly not my understanding. Why do you think that they are not directly related? This whole issue came about because Amit L pointed out that there is a need to add partition OIDs as plan-dependencies BECAUSE the checking for parallel-safety and plan-dependency are related - since now, for Parallel INSERT, we're executing extra parallel-safety checks that check partition properties, so the resultant plan is dependent on the partitions and their properties. Amit L originally explained this as follows: "I think that the concerns raised by Tsunakawa-san in: https://www.postgresql.org/message-id/TYAPR01MB2990CCB6E24B10D35D28B949FEA30%40TYAPR01MB2990.jpnprd01.prod.outlook.com regarding how this interacts with plancache.c deserve a look. Specifically, a plan that uses parallel insert may fail to be invalidated when partitions are altered directly (that is without altering their root parent). 
That would be because we are not adding partition OIDs to PlannerGlobal.invalItems despite making a plan that's based on checking their properties." Tsunakawa-san: "Is the cached query plan invalidated when some alteration is done to change the parallel safety, such as adding a trigger with a parallel-unsafe trigger action?" Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 24, 2021 at 8:41 AM Greg Nancarrow wrote: > > On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila wrote: > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > > because of parallel-safety checks on the partitions, which found > > > attributes of the partitions which weren't parallel-safe. > > > So it's not so clear to me that the dependency doesn't exist - the > > > non-parallel plan does in fact depend on the state of the partitions. > > > > > > > Hmm, I think that is not what we can consider as a dependency. > > > > Then if it's not a dependency, then we shouldn't have to check the > attributes of the partitions for parallel-safety, to determine whether > we must use a non-parallel plan (or can use a parallel plan). > Except, of course, we do have to ... > I don't think the plan-dependency and checking for parallel-safety are directly related. > > > I know you're suggesting to reduce plan-cache invalidation by only > > > recording a dependency in the parallel-plan case, but I've yet to see > > > that in the existing code, and in fact it seems to be inconsistent > > > with the behaviour I've tested so far (one example given prior, but > > > will look for more). > > > > > > > I don't see your example matches what you are saying as in it the > > regular table exists in the plan whereas for the case we are > > discussing partitions doesn't exist in the plan. Amit L. seems to have > > given a correct explanation [1] of why we don't need to invalidate for > > non-parallel plans which match my understanding. > > > > [1] - > > https://www.postgresql.org/message-id/CA%2BHiwqFKJfzgBbkg0i0Fz_FGsCiXW-Fw0tBjdsaUbNbpyv0JhA%40mail.gmail.com > > > > Amit L's explanation was: > > I may be wrong but it doesn't seem to me that the possibility of > constructing a better plan due to a given change is enough reason for > plancache.c to invalidate plans that depend on that change. 
AIUI,
> plancache.c only considers a change interesting if it would *break* a
> Query or a plan.
>
> So in this case, a non-parallel plan may be slower, but it isn't
> exactly rendered *wrong* by changes that make a parallel plan
> possible.
>
>
> This explanation doesn't seem to match existing planner behavior
> AFAICS, and we should try to be consistent with existing behavior
> (rather than doing something different, for partitions specifically in
> this case).
>

I still think it matches. You have missed the important point in your
example and explanation which Amit L and I are trying to explain to
you. See below.

> Using a concrete example, to avoid any confusion, consider the
> following SQL (using unpatched Postgres, master branch):
>
>
> -- encourage use of parallel plans, where possible
> set parallel_setup_cost=0;
> set parallel_tuple_cost=0;
> set min_parallel_table_scan_size=0;
> set max_parallel_workers_per_gather=4;
>
> create or replace function myfunc() returns boolean as $$
> begin
>   return true;
> end;
> $$ language plpgsql parallel unsafe immutable;
>
> create table mytable(x int);
> insert into mytable(x) values(generate_series(1,10));
>
> prepare q as select * from mytable, myfunc();
> explain execute q;
>
> -- change myfunc to be parallel-safe, to see the effect
> -- on the planner for the same select
> create or replace function myfunc() returns boolean as $$
> begin
>   return true;
> end;
> $$ language plpgsql parallel safe immutable;
>
> explain execute q;
>
>
> Here a function referenced in the SELECT statement is changed from
> parallel-unsafe to parallel-safe, to see the effect on plancache.
> According to your referenced explanation, that shouldn't be considered
> an "interesting" change by plancache.c, as it wouldn't "break" the
> previously planned and cached (non-parallel) query - the existing
> non-parallel plan could and should be used, as it still would execute
> OK, even if slower. Right?
> No, that is not what I said, or maybe you have misunderstood. In your
example, if you use the verbose option, then you will see the output as
below.

postgres=# explain (verbose) execute q;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on public.mytable  (cost=0.00..35.50 rows=2550 width=5)
   Output: mytable.x, true
(2 rows)

Here, you can see that the plan depends on the function (its return
value), so it needs to be invalidated when the function changes. To see
it in a bit different way, if you change your function as below, then
you can clearly see a FunctionScan in the plan:

create or replace function myfunc(c1 int) returns int as $$
begin
  return c1 * 10;
end;
$$ language plpgsql parallel unsafe;

postgres=# prepare q as select * from mytable, myfunc(2);
PREPARE
postgres=# explain select * from mytable, myfunc(2);
              QUERY PLAN
---------------------------------------
 Nested Loop  (cost=0.25..61.26
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 10:53 PM Amit Kapila wrote: > > On Tue, Feb 23, 2021 at 4:47 PM Greg Nancarrow wrote: > > > > On Tue, Feb 23, 2021 at 2:30 PM Amit Kapila wrote: > > > > > > On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow > > > wrote: > > > > > > > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow > > > > > wrote: > > > > > > > > > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila > > > > > > wrote: > > > > > > > > > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > > > > > wrote: > > > > > > > > > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > > > > > partitions if the final plan won't use parallelism. For > > > > > > > > > > that, the > > > > > > > > > > patch adds checking glob->parallelModeNeeded, which seems > > > > > > > > > > to do the > > > > > > > > > > trick though I don't know if that's the correct way of > > > > > > > > > > doing that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the > > > > > > > > > case of NOT > > > > > > > > > using parallelism, because we may be relying on the result of > > > > > > > > > parallel-safety checks on partitions in both cases. > > > > > > > > > For example, insert_parallel.sql currently includes a test > > > > > > > > > (that you > > > > > > > > > originally provided in a previous post) that checks a > > > > > > > > > non-parallel > > > > > > > > > plan is generated after a parallel-unsafe trigger is created > > > > > > > > > on a > > > > > > > > > partition involved in the INSERT. 
> > > > > > > > > If I further add to that test by then dropping that trigger > > > > > > > > > and then > > > > > > > > > again using EXPLAIN to see what plan is generated, then I'd > > > > > > > > > expect a > > > > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch > > > > > > > > > it still > > > > > > > > > generates a non-parallel plan. So I think the "&& > > > > > > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > > > > > > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > > > > > > > > > > > > > > > > Greg has point here but I feel something on previous lines > > > > > > > (having a > > > > > > > test of glob->parallelModeNeeded) is better. We only want to > > > > > > > invalidate the plan if the prepared plan is unsafe to execute next > > > > > > > time. It is quite possible that there are unsafe triggers on > > > > > > > different > > > > > > > partitions and only one of them is dropped, so next time planning > > > > > > > will > > > > > > > again yield to the same non-parallel plan. If we agree with that I > > > > > > > think it is better to add this dependency in set_plan_refs (along > > > > > > > with > > > > > > > Gather node handling). > > > > > > > > > > > > > > > > > > > I think we should try to be consistent with current plan-cache > > > > > > functionality, rather than possibly inventing new rules for > > > > > > partitions. > > > > > > I'm finding that with current Postgres functionality (master > > > > > > branch), > > > > > > if there are two parallel-unsafe triggers defined on a normal table > > > > > > and one is dropped, it results in replanning and it yields the same > > > > > > (non-parallel) plan. > > > > > > > > > > > > > > > > Does such a plan have partitions access in it? Can you share the plan? 
> > > > > > > > > > > > > Er, no (it's just a regular table), but that was exactly my point: > > > > aren't you suggesting functionality for partitions that doesn't seem > > > > to work the same way for non-partitions? > > > > > > > > > > I don't think so. The non-parallel plan for Insert doesn't directly > > > depend on partitions so we don't need to invalidate those. > > > > > > > But the non-parallel plan was chosen (instead of a parallel plan) > > because of parallel-safety checks on the partitions, which found > > attributes of the partitions which weren't parallel-safe. > > So it's not so clear to me that the dependency doesn't exist - the > > non-parallel plan does in fact depend on the state of the partitions. > > > > Hmm, I think that is not what we can consider as a dependency. > Then if it's not a dependency, then we shouldn't have to check the attributes of the partitions for parallel-safety, to determine whether we must use a non-parallel plan (or can use a parallel plan). Except, of course, we do have to ... > > I know you're suggesting to reduce plan-cache invalidation by only > > recording a dependency in the parallel-plan case, but I've yet to see > > that in the existing code, and in fact it seems to be inconsistent > > with the behaviour I've tested so far (one example given prior, but > > will look for more). > > > > I don't see your example matches what you are saying as in it the > regular table exists in the plan whereas for
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 4:47 PM Greg Nancarrow wrote: > > On Tue, Feb 23, 2021 at 2:30 PM Amit Kapila wrote: > > > > On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow wrote: > > > > > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila > > > wrote: > > > > > > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow > > > > wrote: > > > > > > > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > > > > wrote: > > > > > > > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > > > > partitions if the final plan won't use parallelism. For > > > > > > > > > that, the > > > > > > > > > patch adds checking glob->parallelModeNeeded, which seems to > > > > > > > > > do the > > > > > > > > > trick though I don't know if that's the correct way of doing > > > > > > > > > that. > > > > > > > > > > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case > > > > > > > > of NOT > > > > > > > > using parallelism, because we may be relying on the result of > > > > > > > > parallel-safety checks on partitions in both cases. > > > > > > > > For example, insert_parallel.sql currently includes a test > > > > > > > > (that you > > > > > > > > originally provided in a previous post) that checks a > > > > > > > > non-parallel > > > > > > > > plan is generated after a parallel-unsafe trigger is created on > > > > > > > > a > > > > > > > > partition involved in the INSERT. > > > > > > > > If I further add to that test by then dropping that trigger and > > > > > > > > then > > > > > > > > again using EXPLAIN to see what plan is generated, then I'd > > > > > > > > expect a > > > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it > > > > > > > > still > > > > > > > > generates a non-parallel plan. So I think the "&& > > > > > > > > glob->parallelModeNeeded" part of test needs to be removed. 
> > > > > > > > > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > > > > > > > > > > > > > Greg has point here but I feel something on previous lines (having a > > > > > > test of glob->parallelModeNeeded) is better. We only want to > > > > > > invalidate the plan if the prepared plan is unsafe to execute next > > > > > > time. It is quite possible that there are unsafe triggers on > > > > > > different > > > > > > partitions and only one of them is dropped, so next time planning > > > > > > will > > > > > > again yield to the same non-parallel plan. If we agree with that I > > > > > > think it is better to add this dependency in set_plan_refs (along > > > > > > with > > > > > > Gather node handling). > > > > > > > > > > > > > > > > I think we should try to be consistent with current plan-cache > > > > > functionality, rather than possibly inventing new rules for > > > > > partitions. > > > > > I'm finding that with current Postgres functionality (master branch), > > > > > if there are two parallel-unsafe triggers defined on a normal table > > > > > and one is dropped, it results in replanning and it yields the same > > > > > (non-parallel) plan. > > > > > > > > > > > > > Does such a plan have partitions access in it? Can you share the plan? > > > > > > > > > > Er, no (it's just a regular table), but that was exactly my point: > > > aren't you suggesting functionality for partitions that doesn't seem > > > to work the same way for non-partitions? > > > > > > > I don't think so. The non-parallel plan for Insert doesn't directly > > depend on partitions so we don't need to invalidate those. > > > > But the non-parallel plan was chosen (instead of a parallel plan) > because of parallel-safety checks on the partitions, which found > attributes of the partitions which weren't parallel-safe. > So it's not so clear to me that the dependency doesn't exist - the > non-parallel plan does in fact depend on the state of the partitions. 
> Hmm, I think that is not what we can consider as a dependency. > I know you're suggesting to reduce plan-cache invalidation by only > recording a dependency in the parallel-plan case, but I've yet to see > that in the existing code, and in fact it seems to be inconsistent > with the behaviour I've tested so far (one example given prior, but > will look for more). > I don't see your example matches what you are saying as in it the regular table exists in the plan whereas for the case we are discussing partitions doesn't exist in the plan. Amit L. seems to have given a correct explanation [1] of why we don't need to invalidate for non-parallel plans which match my understanding. [1] - https://www.postgresql.org/message-id/CA%2BHiwqFKJfzgBbkg0i0Fz_FGsCiXW-Fw0tBjdsaUbNbpyv0JhA%40mail.gmail.com -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 2:30 PM Amit Kapila wrote: > > On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow wrote: > > > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila > > wrote: > > > > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow > > > wrote: > > > > > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > > > wrote: > > > > > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > > > partitions if the final plan won't use parallelism. For that, > > > > > > > > the > > > > > > > > patch adds checking glob->parallelModeNeeded, which seems to do > > > > > > > > the > > > > > > > > trick though I don't know if that's the correct way of doing > > > > > > > > that. > > > > > > > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of > > > > > > > NOT > > > > > > > using parallelism, because we may be relying on the result of > > > > > > > parallel-safety checks on partitions in both cases. > > > > > > > For example, insert_parallel.sql currently includes a test (that > > > > > > > you > > > > > > > originally provided in a previous post) that checks a non-parallel > > > > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > > > > partition involved in the INSERT. > > > > > > > If I further add to that test by then dropping that trigger and > > > > > > > then > > > > > > > again using EXPLAIN to see what plan is generated, then I'd > > > > > > > expect a > > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it > > > > > > > still > > > > > > > generates a non-parallel plan. So I think the "&& > > > > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > > > > > > > Ah, okay, I didn't retest my case after making that change. 
> > > > > > > > > > > > > > > > Greg has point here but I feel something on previous lines (having a > > > > > test of glob->parallelModeNeeded) is better. We only want to > > > > > invalidate the plan if the prepared plan is unsafe to execute next > > > > > time. It is quite possible that there are unsafe triggers on different > > > > > partitions and only one of them is dropped, so next time planning will > > > > > again yield to the same non-parallel plan. If we agree with that I > > > > > think it is better to add this dependency in set_plan_refs (along with > > > > > Gather node handling). > > > > > > > > > > > > > I think we should try to be consistent with current plan-cache > > > > functionality, rather than possibly inventing new rules for > > > > partitions. > > > > I'm finding that with current Postgres functionality (master branch), > > > > if there are two parallel-unsafe triggers defined on a normal table > > > > and one is dropped, it results in replanning and it yields the same > > > > (non-parallel) plan. > > > > > > > > > > Does such a plan have partitions access in it? Can you share the plan? > > > > > > > Er, no (it's just a regular table), but that was exactly my point: > > aren't you suggesting functionality for partitions that doesn't seem > > to work the same way for non-partitions? > > > > I don't think so. The non-parallel plan for Insert doesn't directly > depend on partitions so we don't need to invalidate those. > But the non-parallel plan was chosen (instead of a parallel plan) because of parallel-safety checks on the partitions, which found attributes of the partitions which weren't parallel-safe. So it's not so clear to me that the dependency doesn't exist - the non-parallel plan does in fact depend on the state of the partitions. 
I know you're suggesting to reduce plan-cache invalidation by only recording a dependency in the parallel-plan case, but I've yet to see that in the existing code, and in fact it seems to be inconsistent with the behaviour I've tested so far (one example given prior, but will look for more). Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 6:37 AM Greg Nancarrow wrote: > > On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila wrote: > > > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow wrote: > > > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila > > > wrote: > > > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > > wrote: > > > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > > partitions if the final plan won't use parallelism. For that, the > > > > > > > patch adds checking glob->parallelModeNeeded, which seems to do > > > > > > > the > > > > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of > > > > > > NOT > > > > > > using parallelism, because we may be relying on the result of > > > > > > parallel-safety checks on partitions in both cases. > > > > > > For example, insert_parallel.sql currently includes a test (that you > > > > > > originally provided in a previous post) that checks a non-parallel > > > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > > > partition involved in the INSERT. > > > > > > If I further add to that test by then dropping that trigger and then > > > > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it > > > > > > still > > > > > > generates a non-parallel plan. So I think the "&& > > > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > > > > > > > Greg has point here but I feel something on previous lines (having a > > > > test of glob->parallelModeNeeded) is better. We only want to > > > > invalidate the plan if the prepared plan is unsafe to execute next > > > > time. 
It is quite possible that there are unsafe triggers on different > > > > partitions and only one of them is dropped, so next time planning will > > > > again yield to the same non-parallel plan. If we agree with that I > > > > think it is better to add this dependency in set_plan_refs (along with > > > > Gather node handling). > > > > > > > > > > I think we should try to be consistent with current plan-cache > > > functionality, rather than possibly inventing new rules for > > > partitions. > > > I'm finding that with current Postgres functionality (master branch), > > > if there are two parallel-unsafe triggers defined on a normal table > > > and one is dropped, it results in replanning and it yields the same > > > (non-parallel) plan. > > > > > > > Does such a plan have partitions access in it? Can you share the plan? > > > > Er, no (it's just a regular table), but that was exactly my point: > aren't you suggesting functionality for partitions that doesn't seem > to work the same way for non-partitions? > I don't think so. The non-parallel plan for Insert doesn't directly depend on partitions so we don't need to invalidate those. -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Tue, Feb 23, 2021 at 12:33 AM Amit Kapila wrote: > > On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow wrote: > > > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila wrote: > > > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > > wrote: > > > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > > partitions if the final plan won't use parallelism. For that, the > > > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > > > using parallelism, because we may be relying on the result of > > > > > parallel-safety checks on partitions in both cases. > > > > > For example, insert_parallel.sql currently includes a test (that you > > > > > originally provided in a previous post) that checks a non-parallel > > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > > partition involved in the INSERT. > > > > > If I further add to that test by then dropping that trigger and then > > > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > > > generates a non-parallel plan. So I think the "&& > > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > > > > Greg has point here but I feel something on previous lines (having a > > > test of glob->parallelModeNeeded) is better. We only want to > > > invalidate the plan if the prepared plan is unsafe to execute next > > > time. It is quite possible that there are unsafe triggers on different > > > partitions and only one of them is dropped, so next time planning will > > > again yield to the same non-parallel plan. 
If we agree with that I > > > think it is better to add this dependency in set_plan_refs (along with > > > Gather node handling). > > > > > > > I think we should try to be consistent with current plan-cache > > functionality, rather than possibly inventing new rules for > > partitions. > > I'm finding that with current Postgres functionality (master branch), > > if there are two parallel-unsafe triggers defined on a normal table > > and one is dropped, it results in replanning and it yields the same > > (non-parallel) plan. > > > > Does such a plan have partitions access in it? Can you share the plan? > Er, no (it's just a regular table), but that was exactly my point: aren't you suggesting functionality for partitions that doesn't seem to work the same way for non-partitions? Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Feb 22, 2021 at 8:41 AM Greg Nancarrow wrote: > > On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila wrote: > > > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > wrote: > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > partitions if the final plan won't use parallelism. For that, the > > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > > using parallelism, because we may be relying on the result of > > > > parallel-safety checks on partitions in both cases. > > > > For example, insert_parallel.sql currently includes a test (that you > > > > originally provided in a previous post) that checks a non-parallel > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > partition involved in the INSERT. > > > > If I further add to that test by then dropping that trigger and then > > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > > generates a non-parallel plan. So I think the "&& > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > > > > Greg has point here but I feel something on previous lines (having a > > test of glob->parallelModeNeeded) is better. We only want to > > invalidate the plan if the prepared plan is unsafe to execute next > > time. It is quite possible that there are unsafe triggers on different > > partitions and only one of them is dropped, so next time planning will > > again yield to the same non-parallel plan. If we agree with that I > > think it is better to add this dependency in set_plan_refs (along with > > Gather node handling). 
> > > > I think we should try to be consistent with current plan-cache > functionality, rather than possibly inventing new rules for > partitions. > I'm finding that with current Postgres functionality (master branch), > if there are two parallel-unsafe triggers defined on a normal table > and one is dropped, it results in replanning and it yields the same > (non-parallel) plan. > Does such a plan have partitions access in it? Can you share the plan? -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Feb 22, 2021 at 8:46 AM Amit Langote wrote: > > On Fri, Feb 19, 2021 at 7:38 PM Amit Kapila wrote: > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote > > wrote: > > > > > > > > It also occurred to me that we can avoid pointless adding of > > > > > partitions if the final plan won't use parallelism. For that, the > > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > > using parallelism, because we may be relying on the result of > > > > parallel-safety checks on partitions in both cases. > > > > For example, insert_parallel.sql currently includes a test (that you > > > > originally provided in a previous post) that checks a non-parallel > > > > plan is generated after a parallel-unsafe trigger is created on a > > > > partition involved in the INSERT. > > > > If I further add to that test by then dropping that trigger and then > > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > > generates a non-parallel plan. So I think the "&& > > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > > > Ah, okay, I didn't retest my case after making that change. > > > > Greg has point here but I feel something on previous lines (having a > > test of glob->parallelModeNeeded) is better. We only want to > > invalidate the plan if the prepared plan is unsafe to execute next > > time. > > > > It is quite possible that there are unsafe triggers on different > > partitions and only one of them is dropped, so next time planning will > > again yield to the same non-parallel plan. If we agree with that I > > think it is better to add this dependency in set_plan_refs (along with > > Gather node handling). 
> > Are you saying that partitions shouldn't be added to the dependency > list if a parallel plan was not chosen for insert into a partitioned > table for whatever reason (parallel unsafe expressions or beaten by > other paths in terms of cost)? > Right. > If so, I am inclined to agree with > that. > > I may be wrong but it doesn't seem to me that the possibility of > constructing a better plan due to a given change is enough reason for > plancache.c to invalidate plans that depend on that change. AIUI, > plancache.c only considers a change interesting if it would *break* a > Query or a plan. > that makes sense to me. > So in this case, a non-parallel plan may be slower, but it isn't > exactly rendered *wrong* by changes that make a parallel plan > possible. > Right. > > Also, if we agree that we don't have any cheap way to determine > > parallel-safety of partitioned relations then shall we consider the > > patch being discussed [1] to be combined here? > > Yes, I think it does make sense to consider the GUC patch with the > patches on this thread. > Cool, Greg/Hou, can we consider this along with the next patch? -- With Regards, Amit Kapila.
Re: Parallel INSERT (INTO ... SELECT ...)
On Mon, Feb 22, 2021 at 6:25 PM houzj.f...@fujitsu.com wrote:
>
> Hi
>
> (I may be wrong here)
> I noticed that the patch does not have check for partial index(index
> predicate).
> It seems parallel index build will check it like the following:
> --
> /*
>  * Determine if it's safe to proceed.
>  *
>  * Currently, parallel workers can't access the leader's temporary
>  * tables. Furthermore, any index predicate or index expressions must
>  * be parallel safe.
>  */
> if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
>     !is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index)) ||
>     !is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index)))
> --
>
> Should we do parallel safety check for it ?
>

Thanks, it looks like you're right, it is missing (and there's no test
for it). I can add a fix to the index-checking code, something like:

+if (!found_max_hazard)
+{
+    ii_Predicate = RelationGetIndexPredicate(index_rel);
+    if (ii_Predicate != NIL)
+    {
+        if (max_parallel_hazard_walker((Node *) ii_Predicate, context))
+        {
+            found_max_hazard = true;
+        }
+    }
+}

Also will need a bit of renaming of that function. I'll include this in
the next patch update.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Parallel INSERT (INTO ... SELECT ...)
> Posting a new version of the patches, with the following updates:
> - Moved the update of glob->relationOIDs (i.e. addition of partition
>   OIDs that the plan depends on, resulting from parallel-safety
>   checks) from within max_parallel_hazard() to set_plan_references().
> - Added an extra test for partition plan-cache invalidation.
> - Simplified the query_has_modifying_cte() temporary bug-fix.
> - Added a comment explaining why parallel-safety of partition column
>   defaults is not checked.
> - Minor simplification: hasSubQuery return to
>   is_parallel_possible_for_modify().

Hi

(I may be wrong here)
I noticed that the patch does not have a check for partial indexes
(index predicates). It seems a parallel index build checks this, like
the following:
--
/*
 * Determine if it's safe to proceed.
 *
 * Currently, parallel workers can't access the leader's temporary
 * tables. Furthermore, any index predicate or index expressions must
 * be parallel safe.
 */
if (heap->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
    !is_parallel_safe(root, (Node *) RelationGetIndexExpressions(index)) ||
    !is_parallel_safe(root, (Node *) RelationGetIndexPredicate(index)))
--

Should we do a parallel-safety check for it?

Best regards,
houzj
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 19, 2021 at 7:38 PM Amit Kapila wrote: > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote wrote: > > > > > > It also occurred to me that we can avoid pointless adding of > > > > partitions if the final plan won't use parallelism. For that, the > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > using parallelism, because we may be relying on the result of > > > parallel-safety checks on partitions in both cases. > > > For example, insert_parallel.sql currently includes a test (that you > > > originally provided in a previous post) that checks a non-parallel > > > plan is generated after a parallel-unsafe trigger is created on a > > > partition involved in the INSERT. > > > If I further add to that test by then dropping that trigger and then > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > generates a non-parallel plan. So I think the "&& > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > Ah, okay, I didn't retest my case after making that change. > > Greg has point here but I feel something on previous lines (having a > test of glob->parallelModeNeeded) is better. We only want to > invalidate the plan if the prepared plan is unsafe to execute next > time. > > It is quite possible that there are unsafe triggers on different > partitions and only one of them is dropped, so next time planning will > again yield to the same non-parallel plan. If we agree with that I > think it is better to add this dependency in set_plan_refs (along with > Gather node handling). 
Are you saying that partitions shouldn't be added to the dependency list if a parallel plan was not chosen for insert into a partitioned table for whatever reason (parallel unsafe expressions or beaten by other paths in terms of cost)? If so, I am inclined to agree with that. I may be wrong but it doesn't seem to me that the possibility of constructing a better plan due to a given change is enough reason for plancache.c to invalidate plans that depend on that change. AIUI, plancache.c only considers a change interesting if it would *break* a Query or a plan. So in this case, a non-parallel plan may be slower, but it isn't exactly rendered *wrong* by changes that make a parallel plan possible. > Also, if we agree that we don't have any cheap way to determine > parallel-safety of partitioned relations then shall we consider the > patch being discussed [1] to be combined here? Yes, I think it does make sense to consider the GUC patch with the patches on this thread. > I feel we should focus on getting the first patch of Greg > (v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT, along with > a test case patch) and Hou-San's patch discussed at [1] ready. Then we > can focus on the > v18-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO. Because > even if we get the first patch that is good enough for some users. +1. -- Amit Langote EDB: http://www.enterprisedb.com
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 19, 2021 at 9:38 PM Amit Kapila wrote: > > On Thu, Feb 18, 2021 at 11:05 AM Amit Langote wrote: > > > > > > It also occurred to me that we can avoid pointless adding of > > > > partitions if the final plan won't use parallelism. For that, the > > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > > trick though I don't know if that's the correct way of doing that. > > > > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > > using parallelism, because we may be relying on the result of > > > parallel-safety checks on partitions in both cases. > > > For example, insert_parallel.sql currently includes a test (that you > > > originally provided in a previous post) that checks a non-parallel > > > plan is generated after a parallel-unsafe trigger is created on a > > > partition involved in the INSERT. > > > If I further add to that test by then dropping that trigger and then > > > again using EXPLAIN to see what plan is generated, then I'd expect a > > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > > generates a non-parallel plan. So I think the "&& > > > glob->parallelModeNeeded" part of test needs to be removed. > > > > Ah, okay, I didn't retest my case after making that change. > > > > Greg has point here but I feel something on previous lines (having a > test of glob->parallelModeNeeded) is better. We only want to > invalidate the plan if the prepared plan is unsafe to execute next > time. It is quite possible that there are unsafe triggers on different > partitions and only one of them is dropped, so next time planning will > again yield to the same non-parallel plan. If we agree with that I > think it is better to add this dependency in set_plan_refs (along with > Gather node handling). > I think we should try to be consistent with current plan-cache functionality, rather than possibly inventing new rules for partitions. 
I'm finding that with current Postgres functionality (master branch), if there are two parallel-unsafe triggers defined on a normal table and one is dropped, it results in replanning and it yields the same (non-parallel) plan. This would seem to go against what you are suggesting. Regards, Greg Nancarrow Fujitsu Australia
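The behaviour described above can be reproduced with something like the following (hypothetical names), using a prepared statement so that the plan cache is involved:

```sql
CREATE TABLE t (a int);
CREATE FUNCTION t_trig_func() RETURNS trigger
    AS $$ BEGIN RETURN NEW; END $$
    LANGUAGE plpgsql PARALLEL UNSAFE;
CREATE TRIGGER t_trig1 BEFORE INSERT ON t
    FOR EACH ROW EXECUTE FUNCTION t_trig_func();
CREATE TRIGGER t_trig2 BEFORE INSERT ON t
    FOR EACH ROW EXECUTE FUNCTION t_trig_func();

PREPARE ins AS INSERT INTO t SELECT generate_series(1, 1000);
EXPLAIN EXECUTE ins;       -- non-parallel plan (two unsafe triggers)
DROP TRIGGER t_trig1 ON t; -- invalidates the cached plan
EXPLAIN EXECUTE ins;       -- replanned, but still non-parallel (t_trig2 remains)
```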
RE: Parallel INSERT (INTO ... SELECT ...)
From: Amit Kapila > It is quite possible what you are saying is correct but I feel that is > not this patch's fault. So, won't it better to discuss this in a > separate thread? > > Good use case but again, I think this can be done as a separate patch. Agreed. I think even the current patch offers great benefits and can be committed in PG 14, even if all my four feedback comments are left unaddressed. I just touched on them for completeness in terms of typically expected use cases. They can probably be implemented on top of the current design. > I think here you are talking about the third patch (Parallel Inserts). > I guess if one has run inserts parallelly from psql then also similar > behavior would have been observed. For tables, it might lead to better > results once we have the patch discussed at [1]. Actually, this needs > more investigation. > > [1] - > https://www.postgresql.org/message-id/20200508072545.GA9701%40telsasoft.com That looks interesting and worth a try. Regards Takayuki Tsunakawa
Re: Parallel INSERT (INTO ... SELECT ...)
On Fri, Feb 19, 2021 at 10:13 AM tsunakawa.ta...@fujitsu.com wrote: > > From: Greg Nancarrow > -- > On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.ta...@fujitsu.com > wrote: > > (8) > > + /* > > +* If the trigger type is RI_TRIGGER_FK, this indicates a > > FK exists in > > +* the relation, and this would result in creation of new > > CommandIds > > +* on insert/update/delete and this isn't supported in a > > parallel > > +* worker (but is safe in the parallel leader). > > +*/ > > + trigtype = RI_FKey_trigger_type(trigger->tgfoid); > > + if (trigtype == RI_TRIGGER_FK) > > + { > > + if > > (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context)) > > + return true; > > + } > > > > Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK > > triggers do not generate command IDs. See RI_FKey_check() which is called > > in RI_TRIGGER_FK case. In there, ri_PerformCheck() is called with the > > detectNewRows argument set to false, which causes CommandCounterIncrement() > > to not be called. > > > > Hmmm, I'm not sure that you have read and interpreted the patch code > correctly. > The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign > key, and an insert into such a table will generate a new commandId (so we > must avoid that, as we don't currently have the technology to support sharing > of new command IDs across the participants in the parallel operation). This > is what the code comment says, It does not say that such a trigger generates > a new command ID. > > See Amit's updated comment here: > https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665 > > In addition, the 2nd patch has an explicit test case for this (testing insert > into a table that has a FK). > -- > > > First of all, I anticipate this parallel INSERT SELECT feature will typically > shine, and expected to work, in the ETL or ELT into a data warehouse or an > ODS for analytics. 
Bearing that in mind, let me list some issues or > questions below. But the current state of the patch would be of course > attractive in some workloads, so I don't think these are not necessarily > blockers. > > > (1) > According to the classic book "The Data Warehouse Toolkit" and the website > [1] by its author, the fact table (large transaction history) in the data > warehouse has foreign keys referencing to the dimension tables (small or > medium-sized master or reference data). So, parallel insert will be > effective if it works when loading data into the fact table with foreign keys. > > To answer the above question, I'm assuming: > > CREATE TABLE some_dimension (key_col int PRIMARY KEY); > CREATE TABLE some_fact (some_key int REFERENCES some_dimension); > INSERT INTO some_fact SELECT ...; > > > My naive question is, "why should new command IDs be generated to check > foreign key constraints in this INSERT case? The check just reads the parent > (some_dimension table here)..." > It is quite possible that what you are saying is correct, but I feel that is not this patch's fault. So, won't it be better to discuss this in a separate thread? > > > (2) > Likewise, dimension tables have surrogate keys that are typically implemented > as a sequence or an identity column. It is suggested that even fact tables > sometimes (or often?) have surrogate keys. But the current patch does not > parallelize the statement when the target table has a sequence or an identity > column. > > I was looking at the sequence code, and my naive (again) idea is that the > parallel leader and workers allocates numbers from the sequence > independently, and sets the largest number of them as the session's currval > at the end of parallel operation. We have to note in the documentation that > gaps in the sequence numbers will arise and not used in parallel DML. > Good use case but again, I think this can be done as a separate patch.
> > (3) > As Hou-san demonstrated, the current patch causes the resulting table and > index to become larger when inserted in parallel than in inserted serially. > This could be a problem for analytics use cases where the table is just > inserted and read only afterwards. We could advise the user to run REINDEX > CONCURRENTLY after loading data, but what about tables? > I
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 18, 2021 at 11:05 AM Amit Langote wrote: > > > > It also occurred to me that we can avoid pointless adding of > > > partitions if the final plan won't use parallelism. For that, the > > > patch adds checking glob->parallelModeNeeded, which seems to do the > > > trick though I don't know if that's the correct way of doing that. > > > > > > > I'm not sure if's pointless adding partitions even in the case of NOT > > using parallelism, because we may be relying on the result of > > parallel-safety checks on partitions in both cases. > > For example, insert_parallel.sql currently includes a test (that you > > originally provided in a previous post) that checks a non-parallel > > plan is generated after a parallel-unsafe trigger is created on a > > partition involved in the INSERT. > > If I further add to that test by then dropping that trigger and then > > again using EXPLAIN to see what plan is generated, then I'd expect a > > parallel-plan to be generated, but with the setrefs-v3.patch it still > > generates a non-parallel plan. So I think the "&& > > glob->parallelModeNeeded" part of test needs to be removed. > > Ah, okay, I didn't retest my case after making that change. > Greg has point here but I feel something on previous lines (having a test of glob->parallelModeNeeded) is better. We only want to invalidate the plan if the prepared plan is unsafe to execute next time. It is quite possible that there are unsafe triggers on different partitions and only one of them is dropped, so next time planning will again yield to the same non-parallel plan. If we agree with that I think it is better to add this dependency in set_plan_refs (along with Gather node handling). Also, if we agree that we don't have any cheap way to determine parallel-safety of partitioned relations then shall we consider the patch being discussed [1] to be combined here? Amit L, do you agree with that line of thought, or you have any other ideas? 
I feel we should focus on getting the first patch of Greg (v18-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT, along with a test case patch) and Hou-San's patch discussed at [1] ready. Then we can focus on the v18-0003-Enable-parallel-INSERT-and-or-SELECT-for-INSERT-INTO. Because even if we only get the first patch in, that is good enough for some users. What do you think? [1] - https://www.postgresql.org/message-id/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com -- With Regards, Amit Kapila.
RE: Parallel INSERT (INTO ... SELECT ...)
From: Greg Nancarrow -- On Mon, Jan 25, 2021 at 10:23 AM tsunakawa.ta...@fujitsu.com wrote: > (8) > + /* > +* If the trigger type is RI_TRIGGER_FK, this indicates a FK > exists in > +* the relation, and this would result in creation of new > CommandIds > +* on insert/update/delete and this isn't supported in a > parallel > +* worker (but is safe in the parallel leader). > +*/ > + trigtype = RI_FKey_trigger_type(trigger->tgfoid); > + if (trigtype == RI_TRIGGER_FK) > + { > + if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, > context)) > + return true; > + } > > Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK > triggers do not generate command IDs. See RI_FKey_check() which is called in > RI_TRIGGER_FK case. In there, ri_PerformCheck() is called with the > detectNewRows argument set to false, which causes CommandCounterIncrement() > to not be called. > Hmmm, I'm not sure that you have read and interpreted the patch code correctly. The existence of a RI_TRIGGER_FK trigger indicates the table has a foreign key, and an insert into such a table will generate a new commandId (so we must avoid that, as we don't currently have the technology to support sharing of new command IDs across the participants in the parallel operation). This is what the code comment says; it does not say that such a trigger generates a new command ID. See Amit's updated comment here: https://github.com/postgres/postgres/commit/0d32511eca5aec205cb6b609638ea67129ef6665 In addition, the 2nd patch has an explicit test case for this (testing insert into a table that has a FK). ---------- First of all, I anticipate this parallel INSERT SELECT feature will typically shine, and is expected to work, in the ETL or ELT into a data warehouse or an ODS for analytics. Bearing that in mind, let me list some issues or questions below. But the current state of the patch would of course be attractive in some workloads, so I don't think these are necessarily blockers.
(1) According to the classic book "The Data Warehouse Toolkit" and the website [1] by its author, the fact table (large transaction history) in the data warehouse has foreign keys referencing the dimension tables (small or medium-sized master or reference data). So, parallel insert will be effective if it works when loading data into the fact table with foreign keys. To answer the above question, I'm assuming: CREATE TABLE some_dimension (key_col int PRIMARY KEY); CREATE TABLE some_fact (some_key int REFERENCES some_dimension); INSERT INTO some_fact SELECT ...; My naive question is, "why should new command IDs be generated to check foreign key constraints in this INSERT case? The check just reads the parent (some_dimension table here)..." Looking a bit deeper into the code, although ri_PerformCheck() itself tries to avoid generating command IDs, it calls _SPI_execute_snapshot() with the read_only argument always set to false. It in turn calls _SPI_execute_plan() -> CommandCounterIncrement() as follows: [_SPI_execute_plan()] /* * If not read-only mode, advance the command counter before each * command and update the snapshot. */ if (!read_only && !plan->no_snapshots) { CommandCounterIncrement(); UpdateActiveSnapshotCommandId(); } Can't we pass true to read_only from ri_PerformCheck() in some cases? (2) Likewise, dimension tables have surrogate keys that are typically implemented as a sequence or an identity column. It is suggested that even fact tables sometimes (or often?) have surrogate keys. But the current patch does not parallelize the statement when the target table has a sequence or an identity column. I was looking at the sequence code, and my naive (again) idea is that the parallel leader and workers allocate numbers from the sequence independently, and set the largest of them as the session's currval at the end of the parallel operation. We have to note in the documentation that, in parallel DML, gaps will arise in the sequence numbers, with some values left unused.
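To illustrate point (2), a dimension table with a surrogate key might look like this (hypothetical names); because the identity column draws values from an implicit sequence, the current patch would not parallelize the INSERT:

```sql
CREATE TABLE dim_customer (
    customer_sk int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id text
);

-- nextval() on the backing sequence makes this parallel-restricted,
-- so under the current patch this INSERT uses a non-parallel plan:
INSERT INTO dim_customer (customer_id)
    SELECT 'C' || i FROM generate_series(1, 100000) i;
```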
(3) As Hou-san demonstrated, the current patch causes the resulting table and index to become larger when rows are inserted in parallel than when they are inserted serially. This could be a problem for analytics use cases where the table is just inserted into and only read afterwards. We could advise the user to run REINDEX CONCURRENTLY after loading data, but what about tables? BTW, I don't know if Oracle and SQL Server have similar i
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 18, 2021 at 4:35 PM Amit Langote wrote: > > Looking at this again, I am a bit concerned about going over the whole > partition tree *twice* when making a parallel plan for insert into > partitioned tables. Maybe we should do what you did in your first > attempt a slightly differently -- add partition OIDs during the > max_parallel_hazard() initiated scan of the partition tree as you did. > Instead of adding them directly to PlannerGlobal.relationOids, add to, > say, PlannerInfo.targetPartitionOids and have set_plan_references() do > list_concat(glob->relationOids, list_copy(root->targetPartitionOids) > in the same place as setrefs-v3 does > add_target_partition_oids_recurse(). Thoughts? > Agreed, that might be a better approach, and that way we're also only recording the partition OIDs that the parallel-safety checks are relying on. I'll give it a go and see if I can detect any issues with this method. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 18, 2021 at 10:03 AM Greg Nancarrow wrote: > On Thu, Feb 18, 2021 at 12:34 AM Amit Langote wrote: > > All that is to say that we should move our code to add partition OIDs > > as plan dependencies to somewhere in set_plan_references(), which > > otherwise populates PlannedStmt.relationOids. I updated the patch to > > do that. > > OK, understood. Thanks for the detailed explanation. > > > It also occurred to me that we can avoid pointless adding of > > partitions if the final plan won't use parallelism. For that, the > > patch adds checking glob->parallelModeNeeded, which seems to do the > > trick though I don't know if that's the correct way of doing that. > > > > I'm not sure if's pointless adding partitions even in the case of NOT > using parallelism, because we may be relying on the result of > parallel-safety checks on partitions in both cases. > For example, insert_parallel.sql currently includes a test (that you > originally provided in a previous post) that checks a non-parallel > plan is generated after a parallel-unsafe trigger is created on a > partition involved in the INSERT. > If I further add to that test by then dropping that trigger and then > again using EXPLAIN to see what plan is generated, then I'd expect a > parallel-plan to be generated, but with the setrefs-v3.patch it still > generates a non-parallel plan. So I think the "&& > glob->parallelModeNeeded" part of test needs to be removed. Ah, okay, I didn't retest my case after making that change. Looking at this again, I am a bit concerned about going over the whole partition tree *twice* when making a parallel plan for insert into partitioned tables. Maybe we should do what you did in your first attempt a slightly differently -- add partition OIDs during the max_parallel_hazard() initiated scan of the partition tree as you did. 
Instead of adding them directly to PlannerGlobal.relationOids, add to, say, PlannerInfo.targetPartitionOids and have set_plan_references() do list_concat(glob->relationOids, list_copy(root->targetPartitionOids)) in the same place as setrefs-v3 does add_target_partition_oids_recurse(). Thoughts? -- Amit Langote EDB: http://www.enterprisedb.com
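A rough sketch of that suggestion (the field name and exact placement are placeholders, not actual patch code):

```c
/*
 * Sketch only -- PostgreSQL planner internals, not compilable on its own.
 *
 * In max_parallel_hazard()'s scan of the partition tree, record each
 * checked partition in a new PlannerInfo field rather than touching
 * PlannerGlobal directly:
 */
root->targetPartitionOids = lappend_oid(root->targetPartitionOids,
                                        pdesc->oids[i]);

/*
 * Then in set_plan_references(), fold those OIDs into the plan's
 * dependency list alongside the OIDs it already collects:
 */
glob->relationOids = list_concat(glob->relationOids,
                                 list_copy(root->targetPartitionOids));
```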
Re: Parallel INSERT (INTO ... SELECT ...)
On Thu, Feb 18, 2021 at 12:34 AM Amit Langote wrote: > > > Your revised version seems OK, though I do have a concern: > > Is the use of "table_close(rel, NoLock)'' intentional? That will keep > > the lock (lockmode) until end-of-transaction. > > I think we always keep any locks on relations that are involved in a > plan until end-of-transaction. What if a partition is changed in an > unsafe manner between being considered safe for parallel insertion and > actually performing the parallel insert? > > BTW, I just noticed that extract_query_dependencies() runs on a > rewritten, but not-yet-planned query tree, that is, I didn't know that > extract_query_dependencies() only populates the CachedPlanSource's > relationOids and not CachedPlan's. The former is only for tracking > the dependencies of an unplanned Query, so partitions should never be > added to it. Instead, they should be added to > PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan), > which is kind of what your earlier patch did. Needless to say, > PlanCacheRelCallback checks both CachedPlanSource.relationOids and > PlannedStmt.relationOids, so if it receives a message about a > partition, its OID is matched from the latter. > > All that is to say that we should move our code to add partition OIDs > as plan dependencies to somewhere in set_plan_references(), which > otherwise populates PlannedStmt.relationOids. I updated the patch to > do that. OK, understood. Thanks for the detailed explanation. > It also occurred to me that we can avoid pointless adding of > partitions if the final plan won't use parallelism. For that, the > patch adds checking glob->parallelModeNeeded, which seems to do the > trick though I don't know if that's the correct way of doing that. > I'm not sure it's pointless adding partitions even in the case of NOT using parallelism, because we may be relying on the result of parallel-safety checks on partitions in both cases.
For example, insert_parallel.sql currently includes a test (that you originally provided in a previous post) that checks a non-parallel plan is generated after a parallel-unsafe trigger is created on a partition involved in the INSERT. If I further add to that test by then dropping that trigger and then again using EXPLAIN to see what plan is generated, then I'd expect a parallel-plan to be generated, but with the setrefs-v3.patch it still generates a non-parallel plan. So I think the "&& glob->parallelModeNeeded" part of test needs to be removed. Regards, Greg Nancarrow Fujitsu Australia
Re: Parallel INSERT (INTO ... SELECT ...)
On Wed, Feb 17, 2021 at 10:44 AM Greg Nancarrow wrote: > On Wed, Feb 17, 2021 at 12:19 AM Amit Langote wrote: > > On Mon, Feb 15, 2021 at 4:39 PM Greg Nancarrow wrote: > > > On Sat, Feb 13, 2021 at 12:17 AM Amit Langote > > > wrote: > > > > On Thu, Feb 11, 2021 at 4:43 PM Greg Nancarrow > > > > wrote: > > > > > Actually, I tried adding the following in the loop that checks the > > > > > parallel-safety of each partition and it seemed to work: > > > > > > > > > > glob->relationOids = > > > > > lappend_oid(glob->relationOids, pdesc->oids[i]); > > > > > > > > > > Can you confirm, is that what you were referring to? > > > > > > > > Right. I had mistakenly mentioned PlannerGlobal.invalItems, sorry. > > > > > > > > Although it gets the job done, I'm not sure if manipulating > > > > relationOids from max_parallel_hazard() or its subroutines is okay, > > > > but I will let the committer decide that. As I mentioned above, the > > > > person who designed this decided for some reason that it is > > > > extract_query_dependencies()'s job to populate > > > > PlannerGlobal.relationOids/invalItems. > > > > > > Yes, it doesn't really seem right doing it within max_parallel_hazard(). > > > I tried doing it in extract_query_dependencies() instead - see > > > attached patch - and it seems to work, but I'm not sure if there might > > > be any unintended side-effects. > > > > One issue I see with the patch is that it fails to consider > > multi-level partitioning, because it's looking up partitions only in > > the target table's PartitionDesc and no other. 
> > > > @@ -3060,8 +3066,36 @@ extract_query_dependencies_walker(Node *node, > > PlannerInfo *context) > > RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc); > > > > if (rte->rtekind == RTE_RELATION) > > - context->glob->relationOids = > > - lappend_oid(context->glob->relationOids, rte->relid); > > + { > > + PlannerGlobal *glob; > > + > > + glob = context->glob; > > + glob->relationOids = > > + lappend_oid(glob->relationOids, rte->relid); > > + if (query->commandType == CMD_INSERT && > > + rte->relkind == > > RELKIND_PARTITIONED_TABLE) > > > > The RTE whose relkind is being checked here may not be the INSERT > > target relation's RTE, even though that's perhaps always true today. > > So, I suggest to pull the new block out of the loop over rtable and > > perform its deeds on the result RTE explicitly fetched using > > rt_fetch(), preferably using a separate recursive function. I'm > > thinking something like the attached revised version. > Thanks. Yes, I'd forgotten about the fact a partition may itself be > partitioned, so it needs to be recursive (like in the parallel-safety > checks). > Your revised version seems OK, though I do have a concern: > Is the use of "table_close(rel, NoLock)'' intentional? That will keep > the lock (lockmode) until end-of-transaction. I think we always keep any locks on relations that are involved in a plan until end-of-transaction. What if a partition is changed in an unsafe manner between being considered safe for parallel insertion and actually performing the parallel insert? BTW, I just noticed that extract_query_dependencies() runs on a rewritten, but not-yet-planned query tree, that is, I didn't know that extract_query_dependencies() only populates the CachedPlanSource's relationOids and not CachedPlan's. The former is only for tracking the dependencies of an unplanned Query, so partitions should never be added to it.
Instead, they should be added to PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan), which is kind of what your earlier patch did. Needless to say, PlanCacheRelCallback checks both CachedPlanSource.relationOids and PlannedStmt.relationOids, so if it receives a message about a partition, its OID is matched from the latter. All that is to say that we should move our code to add partition OIDs as plan dependencies to somewhere in set_plan_references(), which otherwise populates PlannedStmt.relationOids. I updated the patch to do that. It also occurred to me that we can avoid pointless adding of partitions if the final plan won't use parallelism. For that, the patch adds checking glob->parallelModeNeeded, which seems to do the trick though I don't know if that's the correct way of doing that. -- Amit Langote EDB: http://www.enterprisedb.com setrefs-v3.patch Description: Binary data
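For reference, the multi-level case that the recursive handling mentioned earlier has to cover looks like this (hypothetical names); the leaf partition appears only in the intermediate table's PartitionDesc, so a single-level walk over the target table's PartitionDesc would miss it:

```sql
CREATE TABLE target_tbl (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE target_part1 PARTITION OF target_tbl
    FOR VALUES FROM (1) TO (100) PARTITION BY RANGE (b);
CREATE TABLE target_part1_1 PARTITION OF target_part1
    FOR VALUES FROM (1) TO (100);

-- target_part1_1 must also become a plan dependency, so that e.g.
-- creating a parallel-unsafe trigger on it invalidates any cached
-- parallel INSERT plan on target_tbl.
```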