Re: A reloption for partitioned tables - parallel_workers

2021-02-19 Thread Seamus Abshere
hi Amit,

Thanks so much for doing this. I had created

https://commitfest.postgresql.org/32/2987/

and it looks like it now shows your patch as the one to use. Let me know if I 
should do anything else.

Best,
Seamus

On Fri, Feb 19, 2021, at 2:30 AM, Amit Langote wrote:
> On Tue, Feb 16, 2021 at 3:05 PM Amit Langote  wrote:
> > On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere  wrote:
> > > Here we go, my first patch... solves 
> > > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com
> >
> > Thanks for sending the patch here.
> >
> > It seems you haven't made enough changes for reloptions code to
> > recognize parallel_workers as valid for partitioned tables, because
> > even with the patch applied, I get this:
> >
> > create table rp (a int) partition by range (a);
> > create table rp1 partition of rp for values from (minvalue) to (0);
> > create table rp2 partition of rp for values from (0) to (maxvalue);
> > alter table rp set (parallel_workers = 1);
> > ERROR:  unrecognized parameter "parallel_workers"
> >
> > You need this:
> >
> > diff --git a/src/backend/access/common/reloptions.c
> > b/src/backend/access/common/reloptions.c
> > index 029a73325e..9eb8a0c10d 100644
> > --- a/src/backend/access/common/reloptions.c
> > +++ b/src/backend/access/common/reloptions.c
> > @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
> > {
> > "parallel_workers",
> > "Number of parallel processes that can be used per
> > executor node for this relation.",
> > -   RELOPT_KIND_HEAP,
> > +   RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> > ShareUpdateExclusiveLock
> > },
> > -1, 0, 1024
> >
> > which tells reloptions parsing code that parallel_workers is
> > acceptable for both heap and partitioned relations.
> >
> > Other comments on the patch:
> >
> > * This function calling style, where the first argument is not placed
> > on the same line as the function itself, is not very common in
> > Postgres:
> >
> > +   /* First see if there is a root-level setting for parallel_workers 
> > */
> > +   parallel_workers = compute_parallel_worker(
> > +   rel,
> > +   -1,
> > +   -1,
> > +   max_parallel_workers_per_gather
> > +
> >
> > This makes the new code look very different from the rest of the
> > codebase.  Better to stick to existing styles.
> >
> > 2. It might be a good idea to use this opportunity to add a function,
> > say compute_append_parallel_workers(), for the code that does what the
> > function name says.  Then the patch will simply add the new
> > compute_parallel_worker() call at the top of that function.
> >
> > 3. I think we should consider the Append parent relation's
> > parallel_workers ONLY if it is a partitioned relation, because it
> > doesn't make a lot of sense for other types of parent relations.  So
> > the new code should look like this:
> >
> > if (IS_PARTITIONED_REL(rel))
> > parallel_workers = compute_parallel_worker(rel, -1, -1,
> > max_parallel_workers_per_gather);
> >
> > 4. Maybe it also doesn't make sense to consider the parent relation's
> > parallel_workers if Parallel Append is disabled
> > (enable_parallel_append = off).  That's because with a simple
> > (non-parallel) Append running under Gather, all launched parallel
> > workers process the same partition before moving to the next one.
> > OTOH, one's intention of setting parallel_workers on the parent
> > partitioned table would most likely be to distribute workers across
> > partitions, which is only possible with parallel Append
> > (enable_parallel_append = on).  So, the new code should look like
> > this:
> >
> > if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
> > parallel_workers = compute_parallel_worker(rel, -1, -1,
> > max_parallel_workers_per_gather);
> 
> Here is an updated version of the Seamus' patch that takes into
> account these and other comments received on this thread so far.
> Maybe warrants adding some tests too but I haven't.
> 
> Seamus, please register this patch in the next commit-fest:
> https://commitfest.postgresql.org/32/
> 
> If you haven't already, you will need to create a community account to
> use that site.
> 
> -- 
> Amit Langote
> EDB: http://www.enterprisedb.com
> 
> Attachments:
> * v2-0001-Allow-setting-parallel_workers-on-partitioned-tab.patch




Re: A reloption for partitioned tables - parallel_workers

2021-02-15 Thread Seamus Abshere
hi,

Here we go, my first patch... solves 
https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520e...@www.fastmail.com

Best,
Seamus

On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:
> Hi Seamus,
> 
> On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere  wrote:
> > It turns out parallel_workers may be a useful reloption for certain uses of 
> > partitioned tables, at least if they're made up of fancy column store 
> > partitions (see 
> > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> >
> > Would somebody tell me what I'm doing wrong? I would love to submit a patch 
> > but I'm stuck:
> >
> > diff --git a/src/backend/optimizer/path/allpaths.c 
> > b/src/backend/optimizer/path/allpaths.c
> > index cd3fdd259c..f1ade035ac 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double 
> > heap_pages, double index_pages,
> >  * If the user has set the parallel_workers reloption, use that; 
> > otherwise
> >  * select a default number of workers.
> >  */
> > +   // I want to affect this
> > if (rel->rel_parallel_workers != -1)
> > parallel_workers = rel->rel_parallel_workers;
> > else
> >
> > so I do this
> >
> > diff --git a/src/backend/access/common/reloptions.c 
> > b/src/backend/access/common/reloptions.c
> > index c687d3ee9e..597b209bfb 100644
> > --- a/src/backend/access/common/reloptions.c
> > +++ b/src/backend/access/common/reloptions.c
> > @@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, 
> > Datum options, bool validate)
> >  bytea *
> >  partitioned_table_reloptions(Datum reloptions, bool validate)
> >  {
> > -   /*
> > -* There are no options for partitioned tables yet, but this is 
> > able to do
> > -* some validation.
> > -*/
> > +   static const relopt_parse_elt tab[] = {
> > +   {"parallel_workers", RELOPT_TYPE_INT,
> > +   offsetof(StdRdOptions, parallel_workers)},
> > +   };
> > +
> > return (bytea *) build_reloptions(reloptions, validate,
> >   
> > RELOPT_KIND_PARTITIONED,
> > - 
> > 0, NULL, 0);
> > + 
> > sizeof(StdRdOptions),
> > + 
> > tab, lengthof(tab));
> >  }
> >
> > That "works":
> >
> > postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 
> > 33);
> > ALTER TABLE
> > postgres=# select relname, relkind, reloptions from pg_class where relname 
> > = 'test_3pd_cstore_partitioned';
> >relname   | relkind |  reloptions
> > -+-+---
> >  test_3pd_cstore_partitioned | p   | {parallel_workers=33}
> > (1 row)
> >
> > But it seems to be ignored:
> >
> > diff --git a/src/backend/optimizer/path/allpaths.c 
> > b/src/backend/optimizer/path/allpaths.c
> > index cd3fdd259c..c68835ce38 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double 
> > heap_pages, double index_pages,
> >  * If the user has set the parallel_workers reloption, use that; 
> > otherwise
> >  * select a default number of workers.
> >  */
> > +   // I want to affect this, but this assertion always passes
> > +   Assert(rel->rel_parallel_workers == -1)
> > if (rel->rel_parallel_workers != -1)
> > parallel_workers = rel->rel_parallel_workers;
> > else
> 
> You may see by inspecting the callers of compute_parallel_worker()
> that it never gets called on a partitioned table, only its leaf
> partitions.  Maybe you could try calling compute_parallel_worker()
> somewhere in add_paths_to_append_rel(), which has this code to figure
> out parallel_workers to use for a parallel Append path for a given
> partitioned table:
> 
> /* Find the highest number of workers requested for any subpath. */
> foreach(lc, partial_subpaths)
>

A reloption for partitioned tables - parallel_workers

2021-02-15 Thread Seamus Abshere
hi,

It turns out parallel_workers may be a useful reloption for certain uses of 
partitioned tables, at least if they're made up of fancy column store 
partitions (see 
https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).

Would somebody tell me what I'm doing wrong? I would love to submit a patch but 
I'm stuck:

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..f1ade035ac 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double 
heap_pages, double index_pages,
 * If the user has set the parallel_workers reloption, use that; 
otherwise
 * select a default number of workers.
 */
+   // I want to affect this
if (rel->rel_parallel_workers != -1)
parallel_workers = rel->rel_parallel_workers;
else

so I do this

diff --git a/src/backend/access/common/reloptions.c 
b/src/backend/access/common/reloptions.c
index c687d3ee9e..597b209bfb 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, Datum 
options, bool validate)
 bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
-   /*
-* There are no options for partitioned tables yet, but this is able to 
do
-* some validation.
-*/
+   static const relopt_parse_elt tab[] = {
+   {"parallel_workers", RELOPT_TYPE_INT,
+   offsetof(StdRdOptions, parallel_workers)},
+   };
+
return (bytea *) build_reloptions(reloptions, validate,
  
RELOPT_KIND_PARTITIONED,
- 0, 
NULL, 0);
+ 
sizeof(StdRdOptions),
+ tab, 
lengthof(tab));
 }

That "works":

postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 33);
ALTER TABLE
postgres=# select relname, relkind, reloptions from pg_class where relname = 
'test_3pd_cstore_partitioned';
   relname   | relkind |  reloptions   
-+-+---
 test_3pd_cstore_partitioned | p   | {parallel_workers=33}
(1 row)

But it seems to be ignored:

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..c68835ce38 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double 
heap_pages, double index_pages,
 * If the user has set the parallel_workers reloption, use that; 
otherwise
 * select a default number of workers.
 */
+   // I want to affect this, but this assertion always passes
+   Assert(rel->rel_parallel_workers == -1)
if (rel->rel_parallel_workers != -1)
parallel_workers = rel->rel_parallel_workers;
else

Thanks and please forgive my code pasting etiquette as this is my first post to 
pgsql-hackers and I'm not quite sure what the right format is.

Thank you,
Seamus