Re: [HACKERS] Partition-wise aggregation/grouping

2019-07-05 Thread Andrey Lepikhov

> Regarding apply_scanjoin_target_to_paths in 0001 and 0007, it seems
> like what happens is: we first build an Append path for the topmost
> scan/join rel.  That uses paths from the individual relations that
> don't necessarily produce the final scan/join target.  Then we mutate
> those relations in place during partition-wise aggregate so that they
> now do produce the final scan/join target and generate some more paths
> using the results.  So there's an ordering dependency, and the same
> pathlist represents different things at different times.  That is, I
> suppose, not technically any worse than what we're doing for the
> scan/join rel's pathlist in general, but here there's the additional
> complexity that the paths get used both before and after being
> mutated.  The UPPERREL_TLIST proposal would clean this up, although I
> realize that has unresolved issues.

I am discouraged by this logic.
Currently I use set_rel_pathlist_hook to apply some optimizations to
partition scan paths. But apply_scanjoin_target_to_paths() deletes the
pathlist and throws away all of those optimizations.
Maybe it is possible to introduce some flag that the hook can set to
prevent the pathlist from being cleared?


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Partition-wise aggregation/grouping

2018-04-10 Thread Jeevan Chalke
On Tue, Apr 10, 2018 at 7:30 PM, David Steele  wrote:

> Hi Jeevan,
>
> On 4/2/18 10:57 AM, Robert Haas wrote:
> > On Thu, Mar 29, 2018 at 9:02 AM, Jeevan Chalke
> >  wrote:
> >> Yep, I see the risk.
> >
> > Committed 0001 last week and 0002 just now.  I don't really see 0003 as
> > a critical need.  If somebody demonstrates that this saves a
> > meaningful amount of planning time, we can consider that part for a
> > future release.
>
> The bulk of this patch was committed so I have marked it that way.
>

Thanks, David.


>
> If you would like to pursue patch 03 I think it would be best to start a
> new thread and demonstrate how the patch will improve performance.
>

Sure.


>
> Regards,
> --
> -David
> da...@pgmasters.net
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-04-10 Thread David Steele
Hi Jeevan,

On 4/2/18 10:57 AM, Robert Haas wrote:
> On Thu, Mar 29, 2018 at 9:02 AM, Jeevan Chalke
>  wrote:
>> Yep, I see the risk.
> 
> Committed 0001 last week and 0002 just now.  I don't really see 0003 as
> a critical need.  If somebody demonstrates that this saves a
> meaningful amount of planning time, we can consider that part for a
> future release.

The bulk of this patch was committed so I have marked it that way.

If you would like to pursue patch 03 I think it would be best to start a
new thread and demonstrate how the patch will improve performance.

Regards,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] Partition-wise aggregation/grouping

2018-04-02 Thread Robert Haas
On Thu, Mar 29, 2018 at 9:02 AM, Jeevan Chalke
 wrote:
> Yep, I see the risk.

Committed 0001 last week and 0002 just now.  I don't really see 0003 as
a critical need.  If somebody demonstrates that this saves a
meaningful amount of planning time, we can consider that part for a
future release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-29 Thread Jeevan Chalke
On Thu, Mar 29, 2018 at 4:13 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Wed, Mar 28, 2018 at 7:21 PM, Ashutosh Bapat
>  wrote:
> >
> > Ah sorry, I was wrong about remote_conds. remote_conds and local_conds
> > are basically the conditions on the relation being pushed down.
> > havingQuals are conditions on a grouped relation so treating them like
> > baserestrictinfo or join conditions looks more straight forward,
> > rather than having a separate member in PgFdwRelationInfo. So, remote
> > havingQuals go into remote_conds and local havingQuals go to
> > local_conds.
>
> Looks like we already do that. Then we have remote_conds, local_conds
> which together should be equivalent to havingQual. Storing all those
> three doesn't make sense. In the future someone may use havingQual instead
> of remote_conds/local_conds just because it's available, and then there
> is a risk of these three lists going out of sync.
>

Yep, I see the risk.


>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-29 Thread Jeevan Chalke
On Wed, Mar 28, 2018 at 7:21 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Tue, Mar 27, 2018 at 2:43 PM, Jeevan Chalke
>  wrote:
>
> > I am not sure what we should Assert here. Note that we end up here only
> > when doing grouping, and thus I don't think we need any Assert here.
> > Let me know if I missed anything.
>
> Since we are just checking whether it's an upper relation and directly
> using root->upper_targets[UPPERREL_GROUP_AGG], I thought we could add
> an assert to verify that it's really the grouping rel we are dealing
> with. But I guess we can't really check that from the given relation. But
> then for a grouped rel we can get its target from RelOptInfo. So, we
> shouldn't need root->upper_targets[UPPERREL_GROUP_AGG]. Am I
> missing something? For other upper relations we do not set the target
> yet but then we could assert that there exists one in the grouped
> relation.
>

Yes. We fetch the target from the grouped_rel itself.
Added an Assert() per our off-list discussion.


>
> >
> >>
> >>
> >> -get_agg_clause_costs(root, (Node *)
> >> root->parse->havingQual,
> >> +get_agg_clause_costs(root, fpinfo->havingQual,
> >>   AGGSPLIT_SIMPLE, );
> >>  }
> >> Should we pass agg costs as well through GroupPathExtraData to avoid
> >> calculating it again in this function?
> >
> >
> > Adding an extra member in GroupPathExtraData just for FDW does not look
> good
> > to me.
> > But yes, if we do that, then we can save this calculation.
> > Let me know if it's OK to have an extra member just for FDW use; I will
> prepare
> > a separate patch for that.
>
> I think that should be fine. A separate patch would be good, so that a
> committer can decide whether or not to include it.
>

Attached patch 0003 for this.


>
>
> >>
> >> +Node   *havingQual;
> >> I am wondering whether we could use remote_conds member for storing
> this.
> >
> >
> > This havingQual is later checked for shippability and classified into
> > pushable and non-pushable quals and stored in remote_conds and
> local_conds
> > respectively.
> > Storing it directly in remote_conds and then splitting it does not look
> good
> > to me.
> > Also, remote_conds is a list of RestrictInfo nodes whereas havingQual is
> not.
> > So using that for storing havingQual does not make sense. So better to
> have
> > a separate member in PgFdwRelationInfo.
>
> Ah sorry, I was wrong about remote_conds. remote_conds and local_conds
> are basically the conditions on the relation being pushed down.
> havingQuals are conditions on a grouped relation so treating them like
> baserestrictinfo or join conditions looks more straight forward,
> rather than having a separate member in PgFdwRelationInfo. So, remote
> havingQuals go into remote_conds and local havingQuals go to
> local_conds.
>

OK, agreed.
In this version, I have not added anything to PgFdwRelationInfo.
The HAVING qual is needed in two places: (1) in foreign_grouping_ok(), to check
shippability, so the translated HAVING qual is passed to it as a parameter;
and (2) when estimating aggregate costs in estimate_path_cost_size(), where we
can use havingQual from the root itself, since the costs won't change between
parent and child.
Thus there is no need to store havingQual in PgFdwRelationInfo.

Thanks


> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
From 761d46b1b255a7521529bbe7d1b128c9d79d6b4e Mon Sep 17 00:00:00 2001
From: Jeevan Chalke 
Date: Tue, 27 Mar 2018 14:23:00 +0530
Subject: [PATCH 1/3] Remove target from GroupPathExtraData, instead fetch it
 from grouped_rel.

---
 src/backend/optimizer/plan/planner.c | 4 +---
 src/include/nodes/relation.h | 2 --
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a19f5d0..d4acde6 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3734,7 +3734,6 @@ create_grouping_paths(PlannerInfo *root,
 			flags |= GROUPING_CAN_PARTIAL_AGG;
 
 		extra.flags = flags;
-		extra.target = target;
 		extra.target_parallel_safe = target_parallel_safe;
 		extra.havingQual = parse->havingQual;
 		extra.targetList = parse->targetList;
@@ -6909,7 +6908,7 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	int			cnt_parts;
 	List	   *grouped_live_children = NIL;
 	List	   *partially_grouped_live_children = NIL;
-	PathTarget *target = extra->target;
+	PathTarget *target = grouped_rel->reltarget;
 
 	Assert(patype != PARTITIONWISE_AGGREGATE_NONE);
 	Assert(patype != PARTITIONWISE_AGGREGATE_PARTIAL ||
@@ -6943,7 +6942,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			adjust_appendrel_attrs(root,
    (Node *) 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-29 Thread Ashutosh Bapat
On Wed, Mar 28, 2018 at 7:21 PM, Ashutosh Bapat
 wrote:
>
> Ah sorry, I was wrong about remote_conds. remote_conds and local_conds
> are basically the conditions on the relation being pushed down.
> havingQuals are conditions on a grouped relation so treating them like
> baserestrictinfo or join conditions looks more straight forward,
> rather than having a separate member in PgFdwRelationInfo. So, remote
> havingQuals go into remote_conds and local havingQuals go to
> local_conds.

Looks like we already do that. Then we have remote_conds, local_conds
which together should be equivalent to havingQual. Storing all those
three doesn't make sense. In the future someone may use havingQual instead
of remote_conds/local_conds just because it's available, and then there
is a risk of these three lists going out of sync.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-28 Thread Ashutosh Bapat
On Tue, Mar 27, 2018 at 2:43 PM, Jeevan Chalke
 wrote:

>>  else if (IS_UPPER_REL(foreignrel))
>>  {
>>  PgFdwRelationInfo *ofpinfo;
>> -PathTarget *ptarget =
>> root->upper_targets[UPPERREL_GROUP_AGG];
>> +PathTarget *ptarget = fpinfo->grouped_target;
>>
>> I think we need an assert there to make sure that the upper relation is a
>> grouping relation. That way any future push down will notice it.
>
>
> I am not sure what we should Assert here. Note that we end up here only
> when doing grouping, and thus I don't think we need any Assert here.
> Let me know if I missed anything.

Since we are just checking whether it's an upper relation and directly
using root->upper_targets[UPPERREL_GROUP_AGG], I thought we could add
an assert to verify that it's really the grouping rel we are dealing
with. But I guess we can't really check that from the given relation. But
then for a grouped rel we can get its target from RelOptInfo. So, we
shouldn't need root->upper_targets[UPPERREL_GROUP_AGG]. Am I
missing something? For other upper relations we do not set the target
yet but then we could assert that there exists one in the grouped
relation.

>
>>
>>
>> -get_agg_clause_costs(root, (Node *)
>> root->parse->havingQual,
>> +get_agg_clause_costs(root, fpinfo->havingQual,
>>   AGGSPLIT_SIMPLE, );
>>  }
>> Should we pass agg costs as well through GroupPathExtraData to avoid
>> calculating it again in this function?
>
>
> Adding an extra member in GroupPathExtraData just for FDW does not look good
> to me.
> But yes, if we do that, then we can save this calculation.
> Let me know if it's OK to have an extra member just for FDW use; I will prepare
> a separate patch for that.

I think that should be fine. A separate patch would be good, so that a
committer can decide whether or not to include it.



>>
>> +Node   *havingQual;
>> I am wondering whether we could use remote_conds member for storing this.
>
>
> This havingQual is later checked for shippability and classified into
> pushable and non-pushable quals and stored in remote_conds and local_conds
> respectively.
> Storing it directly in remote_conds and then splitting it does not look good
> to me.
> Also, remote_conds is a list of RestrictInfo nodes whereas havingQual is not.
> So using that for storing havingQual does not make sense. So better to have
> a separate member in PgFdwRelationInfo.

Ah sorry, I was wrong about remote_conds. remote_conds and local_conds
are basically the conditions on the relation being pushed down.
havingQuals are conditions on a grouped relation so treating them like
baserestrictinfo or join conditions looks more straight forward,
rather than having a separate member in PgFdwRelationInfo. So, remote
havingQuals go into remote_conds and local havingQuals go to
local_conds.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-27 Thread Jeevan Chalke
On Mon, Mar 26, 2018 at 5:24 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Fri, Mar 23, 2018 at 4:35 PM, Jeevan Chalke
>  wrote:
> >
> > The changes related to postgres_fdw that allow pushing aggregates to the
> > foreign server are not yet committed. Due to this, we will end up getting an
> > error when we have foreign partitions + aggregation.
> >
> > Attached 0001 patch here (0006 from my earlier patch-set) which adds
> support
> > for this and thus will not have any error.
>

I have observed that the target member in GroupPathExtraData is not needed,
as we store the target in grouped_rel itself.
Attached 0001 patch removes it.


>
>
>  else if (IS_UPPER_REL(foreignrel))
>  {
>  PgFdwRelationInfo *ofpinfo;
> -PathTarget *ptarget = root->upper_targets[UPPERREL_
> GROUP_AGG];
> +PathTarget *ptarget = fpinfo->grouped_target;
>
> I think we need an assert there to make sure that the upper relation is a
> grouping relation. That way any future push down will notice it.
>

I am not sure what we should Assert here. Note that we end up here only
when doing grouping, and thus I don't think we need any Assert here.
Let me know if I missed anything.


>
> -get_agg_clause_costs(root, (Node *)
> root->parse->havingQual,
> +get_agg_clause_costs(root, fpinfo->havingQual,
>   AGGSPLIT_SIMPLE, );
>  }
> Should we pass agg costs as well through GroupPathExtraData to avoid
> calculating it again in this function?
>

Adding an extra member in GroupPathExtraData just for FDW does not look
good to me.
But yes, if we do that, then we can save this calculation.
Let me know if it's OK to have an extra member just for FDW use; I will
prepare a separate patch for that.


>
>  /*
> +/*
> + * Store passed-in target and havingQual in fpinfo. If its a foreign
> + * partition, then path target and HAVING quals fetched from the root
> are
> + * not correct as Vars within it won't match with this child relation.
> + * However, server passed them through extra and thus fetch from it.
> + */
> +if (extra)
> +{
> +/* Partial aggregates are not supported. */
> +Assert(extra->patype != PARTITIONWISE_AGGREGATE_PARTIAL);
> +
> +fpinfo->grouped_target = extra->target;
> +fpinfo->havingQual = extra->havingQual;
> +}
> +else
> +{
> +fpinfo->grouped_target = root->upper_targets[UPPERREL_GROUP_AGG];
> +fpinfo->havingQual = parse->havingQual;
> +}
> I think in both these cases, extra should always be present, whether for a child
> relation or a parent relation. Just pick from extra always.
>

Yes.
Done.


>
>  /* Grouping information */
>  List   *grouped_tlist;
> +PathTarget *grouped_target;
>
> We should use the target stored in the grouped rel directly.
>

Yep.


>
> +Node   *havingQual;
> I am wondering whether we could use remote_conds member for storing this.
>

This havingQual is later checked for shippability and classified into
pushable and non-pushable quals, which are stored in remote_conds and
local_conds respectively.
Storing it directly in remote_conds and then splitting it does not look
good to me.
Also, remote_conds is a list of RestrictInfo nodes whereas havingQual is not,
so using it to store havingQual does not make sense. It is better to have
a separate member in PgFdwRelationInfo.
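The classification described above can be sketched as follows. This is purely illustrative, not postgres_fdw code: the Qual struct, the uses_local_function field, and is_shippable() are hypothetical stand-ins for the real shippability test; only the split of one qual set into remote_conds and local_conds mirrors the discussion.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QUALS 8

/* Hypothetical stand-in for a HAVING qual. */
typedef struct Qual
{
	int		id;
	bool	uses_local_function;	/* assumed to decide shippability */
} Qual;

/* Stand-ins for PgFdwRelationInfo's remote_conds / local_conds lists. */
typedef struct QualLists
{
	Qual	remote_conds[MAX_QUALS];
	int		n_remote;
	Qual	local_conds[MAX_QUALS];
	int		n_local;
} QualLists;

/* Hypothetical shippability test. */
static bool
is_shippable(const Qual *q)
{
	return !q->uses_local_function;
}

/* Split quals the way the thread describes: shippable HAVING quals go
 * to remote_conds, the rest to local_conds. */
static void
classify_quals(const Qual *quals, int n, QualLists *out)
{
	out->n_remote = out->n_local = 0;
	for (int i = 0; i < n; i++)
	{
		if (is_shippable(&quals[i]))
			out->remote_conds[out->n_remote++] = quals[i];
		else
			out->local_conds[out->n_local++] = quals[i];
	}
}
```

The design point is that havingQual itself is transient input; only the two derived lists are kept, avoiding three representations that could drift apart.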


>  /*
>   * Likewise, copy the relids that are represented by this foreign
> scan. An
> - * upper rel doesn't have relids set, but it covers all the base
> relations
> - * participating in the underlying scan, so use root's all_baserels.
> + * upper rel (but not the other upper rel) doesn't have relids set,
> but it
> + * covers all the base relations participating in the underlying
> scan, so
> + * use root's all_baserels.
>   */
>
> This is correct only for "other" grouping relations. We are yet to
> decide what to do
> for the other upper relations.
>

I have removed this comment change, as the existing comments look good after
making the following changes:


>
> -if (IS_UPPER_REL(rel))
> +if (IS_UPPER_REL(rel) && !IS_OTHER_REL(rel))
> I guess, this condition boils down to rel->reloptkind == RELOPT_UPPER_REL.
> Use
> it that way?
>

Done.

Attached 0002 for this.

Thanks


>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
From dabe9c41402cdc8dc4afd3bb01202de654e43bf4 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke 
Date: Tue, 27 Mar 2018 14:23:00 +0530
Subject: [PATCH 1/2] Remove target from GroupPathExtraData, instead fetch it
 from grouped_rel.

---
 src/backend/optimizer/plan/planner.c | 4 +---
 src/include/nodes/relation.h | 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-27 Thread Jeevan Chalke
On Tue, Mar 27, 2018 at 3:33 AM, Andres Freund  wrote:

> Hi,
>
> On 2018-03-23 17:01:54 +0530, Jeevan Chalke wrote:
> > Attached patch which fixes that.
>
> Thanks, will push. For the future, I'd be more likely to notice if you
> CC me ;)
>

Sure. Thanks.


>
>
> > However, I am not sure whether it is expected to have stable regression
> run
> > with installcheck having local settings.
> > For example, if I have enable_hashagg = false locally, I will definitely
> > see failures.
> >
> > ISTM that I am missing Andres's point here.
>
> I don't think there's a hard and fast rule here. I personally often
> during development disable parallelism because it makes some things
> harder (you can't easily debug crashes with gdb, benchmarks show larger
> variance, ...).


Yep.


>   There doesn't seem to be an equivalent benefit to
> support running e.g. with enable_hashagg = false.
>

OK.
Noted.

Thanks for the explanation.


>
> - Andres
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-26 Thread Andres Freund
Hi,

On 2018-03-23 17:01:54 +0530, Jeevan Chalke wrote:
> Attached patch which fixes that.

Thanks, will push. For the future, I'd be more likely to notice if you
CC me ;)


> However, I am not sure whether it is expected to have stable regression run
> with installcheck having local settings.
> For example, if I have enable_hashagg = false locally, I will definitely
> see failures.
> 
> ISTM that I am missing Andres's point here.

I don't think there's a hard and fast rule here. I personally often
during development disable parallelism because it makes some things
harder (you can't easily debug crashes with gdb, benchmarks show larger
variance, ...).  There doesn't seem to be an equivalent benefit to
support running e.g. with enable_hashagg = false.

- Andres



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-23 Thread Jeevan Chalke
Hi Robert,

On pgsql-committers, Andres reported a concern about a test-case failure
with installcheck under local settings.
(Sorry, I am not subscribed to that mailing list and thus was not able to
reply there.)

Attached patch which fixes that.

However, I am not sure whether the regression run is expected to be stable
under installcheck with local settings.
For example, if I have enable_hashagg = false locally, I will definitely
see failures.

ISTM that I am missing Andres's point here.

Thanks

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 66449694

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index bf8272e..76a8209 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -6,6 +6,8 @@
 SET enable_partitionwise_aggregate TO true;
 -- Enable partitionwise join, which by default is disabled.
 SET enable_partitionwise_join TO true;
+-- Disable parallel plans.
+SET max_parallel_workers_per_gather TO 0;
 --
 -- Tests for list partitioned tables.
 --
@@ -921,6 +923,8 @@ ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0
 ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO (30);
 INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM') FROM generate_series(0, 2) i;
 ANALYZE pagg_tab_ml;
+-- For Parallel Append
+SET max_parallel_workers_per_gather TO 2;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
@@ -1146,7 +1150,6 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
 (12 rows)
 
 -- Parallelism within partitionwise aggregates
-SET max_parallel_workers_per_gather TO 2;
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index f7b5f5a..c60d7d2 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -7,6 +7,8 @@
 SET enable_partitionwise_aggregate TO true;
 -- Enable partitionwise join, which by default is disabled.
 SET enable_partitionwise_join TO true;
+-- Disable parallel plans.
+SET max_parallel_workers_per_gather TO 0;
 
 --
 -- Tests for list partitioned tables.
@@ -206,6 +208,9 @@ ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO
 INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM') FROM generate_series(0, 2) i;
 ANALYZE pagg_tab_ml;
 
+-- For Parallel Append
+SET max_parallel_workers_per_gather TO 2;
+
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
@@ -238,7 +243,6 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
 
 -- Parallelism within partitionwise aggregates
 
-SET max_parallel_workers_per_gather TO 2;
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
 


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-23 Thread Jeevan Chalke
On Thu, Mar 22, 2018 at 10:28 PM, Robert Haas  wrote:

> On Thu, Mar 22, 2018 at 6:15 AM, Jeevan Chalke
>  wrote:
> > Looks cleaner now. Thanks for refactoring it.
> >
> > I have merged these changes and flattened all previous changes into the
> main
> > patch.
>
> Committed 0001-0005.


Thanks Robert.


>   I made a few further modifications.  These were
> mostly cosmetic, but with two exceptions:
>
> 1. I moved one set_cheapest() call to avoid doing that twice for the
> top-level grouped_rel.
>
> 2. I removed the logic to set partition properties for grouped_rels.
> As far as I can see, there's nothing that needs this.  It would be
> important if we wanted subsequent planning stages to be able to do
> partition-wise stuff, e.g. when doing window functions or setops, or
> at higher query levels.  Maybe we'll have that someday; until then, I
> think this is just a waste of cycles.
>

OK.

The changes related to postgres_fdw that allow pushing aggregates to the
foreign server are not yet committed. Due to this, we will end up getting an
error when we have foreign partitions + aggregation.

Attached 0001 patch here (0006 from my earlier patch set), which adds
support for this and thus avoids that error.

Thanks


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
From 1173c609f6c6b2b45c0d1ce05533b840f7885717 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke 
Date: Fri, 23 Mar 2018 14:23:53 +0530
Subject: [PATCH] Teach postgres_fdw to push aggregates for child relations
 too.

GetForeignUpperPaths() now takes an extra void parameter which will
be used to pass any additional details required to create an upper
path at the remote server. However, only grouping can be pushed to the
remote server today, so it passes grouping-specific details, i.e.
GroupPathExtraData, and NULL otherwise.

Since we don't know how to get a partially aggregated result from a
remote server, only full aggregation is pushed to the remote server.
---
 contrib/postgres_fdw/expected/postgres_fdw.out | 132 +
 contrib/postgres_fdw/postgres_fdw.c|  48 ++---
 contrib/postgres_fdw/postgres_fdw.h|   2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  51 ++
 doc/src/sgml/fdwhandler.sgml   |   8 +-
 src/backend/optimizer/plan/createplan.c|   7 +-
 src/backend/optimizer/plan/planner.c   |  29 +++---
 src/backend/optimizer/prep/prepunion.c |   2 +-
 src/include/foreign/fdwapi.h   |   3 +-
 src/include/optimizer/planner.h|   3 +-
 10 files changed, 253 insertions(+), 32 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d6e387..a211aa9 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7852,3 +7852,135 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 (14 rows)
 
 RESET enable_partitionwise_join;
+-- ===
+-- test partitionwise aggregates
+-- ===
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);
+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+-- Create foreign partitions
+CREATE FOREIGN TABLE fpagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'pagg_tab_p1');
+CREATE FOREIGN TABLE fpagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'pagg_tab_p2');;
+CREATE FOREIGN TABLE fpagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'pagg_tab_p3');;
+ANALYZE pagg_tab;
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan with partitionwise aggregates is disabled
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+  QUERY PLAN   
+---
+ Sort
+   Sort Key: fpagg_tab_p1.a
+ 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-22 Thread Robert Haas
On Thu, Mar 22, 2018 at 6:15 AM, Jeevan Chalke
 wrote:
> Looks cleaner now. Thanks for refactoring it.
>
> I have merged these changes and flattened all previous changes into the main
> patch.

Committed 0001-0005.  I made a few further modifications.  These were
mostly cosmetic, but with two exceptions:

1. I moved one set_cheapest() call to avoid doing that twice for the
top-level grouped_rel.

2. I removed the logic to set partition properties for grouped_rels.
As far as I can see, there's nothing that needs this.  It would be
important if we wanted subsequent planning stages to be able to do
partition-wise stuff, e.g. when doing window functions or setops, or
at higher query levels.  Maybe we'll have that someday; until then, I
think this is just a waste of cycles.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-22 Thread Jeevan Chalke
On Thu, Mar 22, 2018 at 10:06 AM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Thu, Mar 22, 2018 at 3:26 AM, Robert Haas 
> wrote:
> >
> > Is there a good reason not to use input_rel->relids as the input to
> > fetch_upper_rel() in all cases, rather than just at subordinate
> > levels?
> >
>
> That would simplify some code in these patches. We have set
> upper_rel->relids to NULL for non-other upper relation since Tom
> expected to use relids to mean something other than scan/join relids.
> With these patch-sets for grouping rels we are using upper_rel->relids
> to the relids of underlying scan/join relation. So it does make sense
> to set relids to input_rel->relids for all the grouping rels whether
> "other" or non-"other" grouping rels.
>
> But with this change, we have to change all the existing code to pass
> input_rel->relids to fetch_upper_rel(). If we don't do that or in
> future somebody calls that function with relids = NULL we will produce
> two relations which are supposed to do the same thing but have
> different relids set. That's because fetch_upper_rel() creates a
> relation if one does not exist whether or not the caller intends to
> create one. We should probably create two functions 1. to build an
> upper relation and 2. to search for it similar to what we have done
> for join relations and base relation. The other possibility is to pass
> a flag to fetch_upper_rel() indicating whether a caller intends to
> create a new relation when one doesn't exist. With this design a
> caller can be sure that an upper relation will not be created when it
> wants to just fetch an existing relation (and error out/assert if it
> doesn't find one.).
>

Like Ashutosh said, splitting fetch_upper_rel() into two functions,
build_upper_rel() and find_upper_rel(), looks better.

However, I am not sure whether setting relids in a top-most grouped rel is
a good idea or not. I remember we need this while working on Aggregate
PushDown, and in [1] Tom Lane opposed the idea of setting the relids in
grouped_rel.

If we want to go with this, then I think it should be done as a separate
stand-alone patch.

[1]
https://www.postgresql.org/message-id/CAFjFpRdUz6h6cmFZFYAngmQAX8Zvo+MZsPXidZ077h=gp9b...@mail.gmail.com


>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-22 Thread Jeevan Chalke
On Thu, Mar 22, 2018 at 3:26 AM, Robert Haas  wrote:

> On Wed, Mar 21, 2018 at 11:33 AM, Jeevan Chalke
>  wrote:
> > Let me try to explain this:
> > 1. GROUPING_CAN_PARTITIONWISE_AGG
> > 2. extra->is_partial_aggregation
> > 3. extra->perform_partial_partitionwise_aggregation
>
> Please find attached an incremental patch that attempts to refactor
> this logic into a simpler form.  What I've done is merged all three of
> the above Booleans into a single state variable called 'patype', which
> can be one of PARTITIONWISE_AGGREGATE_NONE,
> PARTITIONWISE_AGGREGATE_FULL, and PARTITIONWISE_AGGREGATE_PARTIAL.
> When create_ordinary_grouping_paths() is called, extra.patype is the
> value for the parent relation; that function computes a new value and
> passes it down to create_partitionwise_grouping_paths(), which inserts
> into the new 'extra' structure for the child.
>
> Essentially, in your system, extra->is_partial_aggregation and
> extra->perform_partial_partitionwise_aggregation both corresponded to
> whether or not patype == PARTITIONWISE_AGGREGATE_PARTIAL, but the
> former indicated whether the *parent* was doing partition-wise
> aggregation (and thus we needed to generate only partial paths) while
> the latter indicated whether the *current* relation was doing
> partition-wise aggregation (and thus we needed to force creation of
> partially_grouped_rel).  This took me a long time to understand
> because of the way the fields were named; they didn't indicate that
> one was for the parent and one for the current relation.  Meanwhile,
> GROUPING_CAN_PARTITIONWISE_AGG indicated whether partition-wise
> aggregate should be tried at all for the current relation; there was
> no analogous indicator for the parent relation because we can't be
> processing a child at all if the parent didn't decide to do
> partition-wise aggregation.  So to know what was happening for the
> current relation you had to look at GROUPING_CAN_PARTITIONWISE_AGG +
> extra->perform_partial_partitionwise_aggregation, and to know what was
> happening for the parent relation you just looked at
> extra->is_partial_aggregation.  With this proposed refactoring patch,
> there's just one patype value at each level, which at least to me
> seems simpler.  I tried to improve the comments somewhat, too.
>

Looks cleaner now. Thanks for refactoring it.

I have merged these changes and flattened all previous changes into the main
patch.


>
> You have some long lines that it would be good to break, like this:
>
>  child_extra.targetList = (List *) adjust_appendrel_attrs(root,
>
> (Node *) extra->targetList,
>
> nappinfos,
>
> appinfos);
>
> If you put a newline after (List *), the formatting will come out
> nicer -- it will fit within 80 columns.  Please go through the patches
> and make these kinds of changes for lines over 80 columns where
> possible.
>

OK. Done.
I was under the impression that it is not good to split the first line. What
if, in the split version, something gets inserted between the lines while
applying a new patch or during a merge? That would result in something
unexpected. But I am not sure whether that can ever happen.


> I guess we'd better move the GROUPING_CAN_* constants to a header
> file, if they're going to be exposed through GroupPathExtraData.  That
> can go in some refactoring patch.
>

Yep. Moved that into 0003 where we create GroupPathExtraData.


> Is there a good reason not to use input_rel->relids as the input to
> fetch_upper_rel() in all cases, rather than just at subordinate
> levels?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v23.tar.gz
Description: application/gzip


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-21 Thread Ashutosh Bapat
On Thu, Mar 22, 2018 at 3:26 AM, Robert Haas  wrote:
>
> Is there a good reason not to use input_rel->relids as the input to
> fetch_upper_rel() in all cases, rather than just at subordinate
> levels?
>

That would simplify some code in these patches. We have set
upper_rel->relids to NULL for non-other upper relations since Tom
expected to use relids to mean something other than scan/join relids.
With these patch-sets for grouping rels, we are setting upper_rel->relids
to the relids of the underlying scan/join relation. So it does make sense
to set relids to input_rel->relids for all the grouping rels, whether
"other" or non-"other" grouping rels.

But with this change, we have to change all the existing code to pass
input_rel->relids to fetch_upper_rel(). If we don't do that, or if in
future somebody calls that function with relids = NULL, we will produce
two relations which are supposed to do the same thing but have
different relids set. That's because fetch_upper_rel() creates a
relation if one does not exist, whether or not the caller intends to
create one. We should probably create two functions: 1. to build an
upper relation and 2. to search for it, similar to what we have done
for join relations and base relations. The other possibility is to pass
a flag to fetch_upper_rel() indicating whether the caller intends to
create a new relation when one doesn't exist. With this design a
caller can be sure that an upper relation will not be created when it
wants to just fetch an existing relation (and can error out/assert if it
doesn't find one).

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-21 Thread Robert Haas
On Wed, Mar 21, 2018 at 11:33 AM, Jeevan Chalke
 wrote:
> Let me try to explain this:
> 1. GROUPING_CAN_PARTITIONWISE_AGG
> 2. extra->is_partial_aggregation
> 3. extra->perform_partial_partitionwise_aggregation

Please find attached an incremental patch that attempts to refactor
this logic into a simpler form.  What I've done is merged all three of
the above Booleans into a single state variable called 'patype', which
can be one of PARTITIONWISE_AGGREGATE_NONE,
PARTITIONWISE_AGGREGATE_FULL, and PARTITIONWISE_AGGREGATE_PARTIAL.
When create_ordinary_grouping_paths() is called, extra.patype is the
value for the parent relation; that function computes a new value and
passes it down to create_partitionwise_grouping_paths(), which inserts
into the new 'extra' structure for the child.

Essentially, in your system, extra->is_partial_aggregation and
extra->perform_partial_partitionwise_aggregation both corresponded to
whether or not patype == PARTITIONWISE_AGGREGATE_PARTIAL, but the
former indicated whether the *parent* was doing partition-wise
aggregation (and thus we needed to generate only partial paths) while
the latter indicated whether the *current* relation was doing
partition-wise aggregation (and thus we needed to force creation of
partially_grouped_rel).  This took me a long time to understand
because of the way the fields were named; they didn't indicate that
one was for the parent and one for the current relation.  Meanwhile,
GROUPING_CAN_PARTITIONWISE_AGG indicated whether partition-wise
aggregate should be tried at all for the current relation; there was
no analogous indicator for the parent relation because we can't be
processing a child at all if the parent didn't decide to do
partition-wise aggregation.  So to know what was happening for the
current relation you had to look at GROUPING_CAN_PARTITIONWISE_AGG +
extra->perform_partial_partitionwise_aggregation, and to know what was
happening for the parent relation you just looked at
extra->is_partial_aggregation.  With this proposed refactoring patch,
there's just one patype value at each level, which at least to me
seems simpler.  I tried to improve the comments somewhat, too.

You have some long lines that it would be good to break, like this:

 child_extra.targetList = (List *) adjust_appendrel_attrs(root,

(Node *) extra->targetList,
  nappinfos,
  appinfos);

If you put a newline after (List *), the formatting will come out
nicer -- it will fit within 80 columns.  Please go through the patches
and make these kinds of changes for lines over 80 columns where
possible.

I guess we'd better move the GROUPING_CAN_* constants to a header
file, if they're going to be exposed through GroupPathExtraData.  That
can go in some refactoring patch.

Is there a good reason not to use input_rel->relids as the input to
fetch_upper_rel() in all cases, rather than just at subordinate
levels?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-Refactor-partitonwise-aggregate-signalling.patch
Description: Binary data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-21 Thread Jeevan Chalke
On Wed, Mar 21, 2018 at 7:46 PM, Robert Haas  wrote:

> On Wed, Mar 21, 2018 at 8:01 AM, Jeevan Chalke
>  wrote:
> >> In the patch as proposed, create_partial_grouping_paths() can get
> >> called even if GROUPING_CAN_PARTIAL_AGG is not set.  I think that's
> >> wrong.
> >
> > I don't think so. For the parallel case, we do check that. And the
> > partitionwise aggregation check was done inside the
> > can_partitionwise_grouping() function, and the flags were set
> > accordingly. Am I missing something?
>
> Well, one of us is missing something somewhere.  If
> GROUPING_CAN_PARTIAL_AGG means that we're allowed to do partial
> grouping, and if create_partial_grouping_paths() is where partial
> grouping happens, then we should only be calling the latter if the
> former is set.  I mean, how can it make sense to create
> partially-grouped paths if we're not allowed to do partial grouping?
>

Yes, that's true. If we are not allowed to do partial grouping,
partially grouped paths should not be created.

However, what I meant is that if the added partitionwise-related checks
evaluate to true, that implies GROUPING_CAN_PARTIAL_AGG is also set, as it
was checked earlier, and thus no explicit check is needed again.

Anyway, after your refactoring, it becomes more readable now.


>
> > I have tweaked these conditions and posted in a separate patch (0006).
> > However, I have merged all your three patches in one (0005).
>
> OK, thanks.  I wasn't sure I had understood what was going on, so
> thanks for checking it.
>
> Thanks also for keeping 0004-0006 separate here, but I think you can
> flatten them into one patch in the next version.
>

OK. Sure.


>
> >> I think that if the last test in can_partitionwise_grouping were moved
> >> before the previous test, it could be simplified to test only
> >> (extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
> >> *perform_partial_partitionwise_aggregation.
> >
> > I think we can't do it this way. Only if
> > *perform_partial_partitionwise_aggregation is found to be true do we
> > need to check whether partial aggregation itself is possible. If we are
> > going to perform full partitionwise aggregation, the test for
> > can_partial_agg is not needed. Have I misread your comments?
>
> It seems you're correct, because when I change it the tests fail.  I
> don't yet understand why.
>
> Basically, the main patch seems to use three Boolean signaling mechanisms:
>
> 1. GROUPING_CAN_PARTITIONWISE_AGG
> 2. is_partial_aggregation
> 3. perform_partial_partitionwise_aggregation
>
> Stuff I don't understand:
>
> - Why is one of them a Boolean shoved into "flags", even though it's
> not static across the whole hierarchy like the other flags, and the
> other two are separate Booleans?
> - What do they all do, anyway?
>

Let me try to explain this:

1. GROUPING_CAN_PARTITIONWISE_AGG
Tells us whether or not partitionwise grouping and/or aggregation is ever
possible. If it is FALSE, the other two have no meaning and are useless.
Only if it is TRUE do we attempt to create partitionwise paths.
I have kept it in "flags" as it behaves similarly to the other flag
members like can_sort, can_hash, etc. And for a given grouped relation,
whether parent or child, they all work similarly. But yes, for a child
relation we inherit can_sort/can_hash from the parent, as they won't
change, whereas this one needs to be evaluated for every child.
If required, I can move it to a GroupPathExtraData struct.

2. extra->is_partial_aggregation
This boolean is used to identify, at any given time, whether we are
computing a full aggregation or a partial aggregation. It is needed when
doing partial aggregation so that finalization is skipped, and when true
it also tells us to use partially_grouped_rel.

3. extra->perform_partial_partitionwise_aggregation
This boolean, when TRUE, instructs the child that it has to create
partially aggregated paths. It is then transferred to
child_extra->is_partial_aggregation in
create_partitionwise_grouping_paths().

Basically, (3) is required because we want to create partially_grouped_rel
upfront, so that if the child is going to create partially aggregated
paths, it can append them to the parent's partially grouped rel; thus the
rel needs to exist before we even enter child path creation.
Since (3) is only valid if (1) is true, we need to compute (1) upfront too.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-21 Thread Robert Haas
On Wed, Mar 21, 2018 at 8:01 AM, Jeevan Chalke
 wrote:
>> In the patch as proposed, create_partial_grouping_paths() can get
>> called even if GROUPING_CAN_PARTIAL_AGG is not set.  I think that's
>> wrong.
>
> I don't think so. For the parallel case, we do check that. And the
> partitionwise aggregation check was done inside the
> can_partitionwise_grouping() function, and the flags were set accordingly.
> Am I missing something?

Well, one of us is missing something somewhere.  If
GROUPING_CAN_PARTIAL_AGG means that we're allowed to do partial
grouping, and if create_partial_grouping_paths() is where partial
grouping happens, then we should only be calling the latter if the
former is set.  I mean, how can it make sense to create
partially-grouped paths if we're not allowed to do partial grouping?

> I have tweaked these conditions and posted in a separate patch (0006).
> However, I have merged all your three patches in one (0005).

OK, thanks.  I wasn't sure I had understood what was going on, so
thanks for checking it.

Thanks also for keeping 0004-0006 separate here, but I think you can
flatten them into one patch in the next version.

>> I think that if the last test in can_partitionwise_grouping were moved
>> before the previous test, it could be simplified to test only
>> (extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
>> *perform_partial_partitionwise_aggregation.
>
> I think we can't do it this way. Only if
> *perform_partial_partitionwise_aggregation is found to be true do we need
> to check whether partial aggregation itself is possible. If we are going
> to perform full partitionwise aggregation, the test for can_partial_agg is
> not needed. Have I misread your comments?

It seems you're correct, because when I change it the tests fail.  I
don't yet understand why.

Basically, the main patch seems to use three Boolean signaling mechanisms:

1. GROUPING_CAN_PARTITIONWISE_AGG
2. is_partial_aggregation
3. perform_partial_partitionwise_aggregation

Stuff I don't understand:

- Why is one of them a Boolean shoved into "flags", even though it's
not static across the whole hierarchy like the other flags, and the
other two are separate Booleans?
- What do they all do, anyway?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-21 Thread Jeevan Chalke
On Wed, Mar 21, 2018 at 2:04 AM, Robert Haas  wrote:

> On Tue, Mar 20, 2018 at 10:46 AM, Jeevan Chalke
>  wrote:
> > I have added all these three patches in the attached patch-set and
> rebased
> > my changes over it.
> >
> > However, I have not yet made this patch-set dependent on UPPERREL_TLIST
> > changes you have proposed on another mail-thread and thus it has 0001
> patch
> > refactoring the scanjoin issue.
> > 0002, 0003 and 0004 are your patches added in this patchset.
> > 0005 and 0006 are further refactoring patches. 0006 adds a
> > GroupPathExtraData which stores mostly child variant data and costs.
> > 0007 is main partitionwise aggregation patch which is then rebased
> > accordingly.
> > 0008 contains testcase and 0009 contains FDW changes.
>
> Committed my refactoring patches (your 0002-0004).
>

Thanks Robert.


>
> Regarding apply_scanjoin_target_to_paths in 0001 and 0007, it seems
> like what happens is: we first build an Append path for the topmost
> scan/join rel.  That uses paths from the individual relations that
> don't necessarily produce the final scan/join target.  Then we mutate
> those relations in place during partition-wise aggregate so that they
> now do produce the final scan/join target and generate some more paths
> using the results.  So there's an ordering dependency, and the same
> pathlist represents different things at different times.  That is, I
> suppose, not technically any worse than what we're doing for the
> scan/join rel's pathlist in general, but here there's the additional
> complexity that the paths get used both before and after being
> mutated.  The UPPERREL_TLIST proposal would clean this up, although I
> realize that has unresolved issues.
>
> In create_partial_grouping_paths, the loop that does "for (i = 0; i <
> 2; i++)" is not exactly what I had in mind when I said that we should
> use two loops.  I did not mean a loop with two iterations.  I meant
> adding a loop like foreach(lc, input_rel->pathlist) in each place
> where we currently have a loop like
> foreach(input_rel->partial_pathlist).


The path-creation logic for partial_pathlist and pathlist was identical,
and thus I thought of just looping over it twice, switching the pathlist,
so that we have minimal code to maintain. But yes, I agree that it adds
additional complexity.

> See 0001, attached.

Looks great. Thanks.


>
> Don't write if (a) Assert(b) but rather Assert(!a || b).  See 0002,
> attached.
>

OK. Noted.


> In the patch as proposed, create_partial_grouping_paths() can get
> called even if GROUPING_CAN_PARTIAL_AGG is not set.  I think that's
> wrong.


I don't think so. For the parallel case, we do check that. And the
partitionwise aggregation check was done inside the
can_partitionwise_grouping() function, and the flags were set accordingly.
Am I missing something?


>   If can_partial_agg() isn't accurately determining whether
> partial aggregation is possible,


I think it does determine that accurately.
The condition if (!parse->hasAggs && parse->groupClause == NIL)
only holds for DISTINCT queries, which we are not handling here anyway; for
partitionwise aggregation it won't be true, otherwise it would be a
degenerate grouping case.


> and as Ashutosh and I have been
> discussing, there's room for improvement in that area, then that's a
> topic for some other set of patches.  Also, the test in
> create_ordinary_grouping_paths for whether or not to call
> create_partial_grouping_paths() is super-complicated and uncommented.
> I think a simpler approach is to allow create_partial_grouping_paths()
> the option of returning NULL.  See 0003, attached.
>

Thanks for simplifying it.

However, after this simplification, we were unnecessarily creating
non-parallel partial aggregation paths for the root input rel.
Consider a case where we need partial aggregation from a child: at
root-level entry, extra->is_partial_aggregation = 0, since the parent is
still doing full aggregation, but
perform_partial_partitionwise_aggregation is true, which tells the child to
perform partial partitionwise aggregation. In this case,
cheapest_total_path will be set, and thus we would go ahead and create
partial aggregate paths for the parent rel, which is not needed.

I have tweaked these conditions and posted in a separate patch (0006).
However, I have merged all your three patches in one (0005).


>
> make_grouping_rel() claims that "For now, all aggregated paths are
> added to the (GROUP_AGG, NULL) upperrel", but this is false: we no
> longer have only one grouped upper rel.
>

Done.


>
> I'm having a heck of a time understanding what is_partial_aggregation
> and perform_partial_partitionwise_aggregation are supposed to be
> doing.


As you correctly said, is_partial_aggregation denotes that we are doing
ONLY partial aggregation at this level of the partitioning hierarchy,
whereas perform_partial_partitionwise_aggregation is used to instruct the
child

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-20 Thread Robert Haas
On Tue, Mar 20, 2018 at 10:46 AM, Jeevan Chalke
 wrote:
> I have added all these three patches in the attached patch-set and rebased
> my changes over it.
>
> However, I have not yet made this patch-set dependent on UPPERREL_TLIST
> changes you have proposed on another mail-thread and thus it has 0001 patch
> refactoring the scanjoin issue.
> 0002, 0003 and 0004 are your patches added in this patchset.
> 0005 and 0006 are further refactoring patches. 0006 adds a
> GroupPathExtraData which stores mostly child variant data and costs.
> 0007 is main partitionwise aggregation patch which is then rebased
> accordingly.
> 0008 contains testcase and 0009 contains FDW changes.

Committed my refactoring patches (your 0002-0004).

Regarding apply_scanjoin_target_to_paths in 0001 and 0007, it seems
like what happens is: we first build an Append path for the topmost
scan/join rel.  That uses paths from the individual relations that
don't necessarily produce the final scan/join target.  Then we mutate
those relations in place during partition-wise aggregate so that they
now do produce the final scan/join target and generate some more paths
using the results.  So there's an ordering dependency, and the same
pathlist represents different things at different times.  That is, I
suppose, not technically any worse than what we're doing for the
scan/join rel's pathlist in general, but here there's the additional
complexity that the paths get used both before and after being
mutated.  The UPPERREL_TLIST proposal would clean this up, although I
realize that has unresolved issues.

In create_partial_grouping_paths, the loop that does "for (i = 0; i <
2; i++)" is not exactly what I had in mind when I said that we should
use two loops.  I did not mean a loop with two iterations.  I meant
adding a loop like foreach(lc, input_rel->pathlist) in each place
where we currently have a loop like
foreach(input_rel->partial_pathlist).  See 0001, attached.

Don't write if (a) Assert(b) but rather Assert(!a || b).  See 0002, attached.

In the patch as proposed, create_partial_grouping_paths() can get
called even if GROUPING_CAN_PARTIAL_AGG is not set.  I think that's
wrong.  If can_partial_agg() isn't accurately determining whether
partial aggregation is possible, and as Ashutosh and I have been
discussing, there's room for improvement in that area, then that's a
topic for some other set of patches.  Also, the test in
create_ordinary_grouping_paths for whether or not to call
create_partial_grouping_paths() is super-complicated and uncommented.
I think a simpler approach is to allow create_partial_grouping_paths()
the option of returning NULL.  See 0003, attached.

make_grouping_rel() claims that "For now, all aggregated paths are
added to the (GROUP_AGG, NULL) upperrel", but this is false: we no
longer have only one grouped upper rel.

I'm having a heck of a time understanding what is_partial_aggregation
and perform_partial_partitionwise_aggregation are supposed to be
doing.  It seems like is_partial_aggregation means that we should ONLY
do partial aggregation, which is not exactly what the name implies.
It also seems like perform_partial_partitionwise_aggregation and
is_partial_aggregation differ only in that they apply to the current
level and to the child level respectively; can't we merge these
somehow so that we don't need both of them?

I think that if the last test in can_partitionwise_grouping were moved
before the previous test, it could be simplified to test only
(extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
*perform_partial_partitionwise_aggregation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0003-Allow-create_partial_grouping_paths-to-return-NULL.patch
Description: Binary data


0002-Fix-Assert-style.patch
Description: Binary data


0001-Remove-loop-from-0-to-1.patch
Description: Binary data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-20 Thread Jeevan Chalke
Hi,

On Mon, Mar 19, 2018 at 10:56 PM, Robert Haas  wrote:

> On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
>  wrote:
> > Ok. That looks good.
>
> Here's an updated version.  In this version, based on a voice
> discussion with Ashutosh and Jeevan, I adjusted 0001 to combine it
> with an earlier idea of splitting Gather/Gather Merge path generation
> out of the function that creates partially aggregated paths.  The idea
> here is that create_ordinary_grouping_paths() could first call
> create_partial_grouping_paths(), then add additional paths which might
> be partial or non-partial by invoking the partition-wise aggregate
> logic, then call gather_grouping_paths() and set_cheapest() to
> finalize the partially grouped rel.  Also, I added draft commit
> messages.
>

I have added all these three patches in the attached patch-set and rebased
my changes over it.

However, I have not yet made this patch-set dependent on UPPERREL_TLIST
changes you have proposed on another mail-thread and thus it has 0001 patch
refactoring the scanjoin issue.
0002, 0003 and 0004 are your patches added in this patchset.
0005 and 0006 are further refactoring patches. 0006 adds a
GroupPathExtraData which stores mostly child variant data and costs.
0007 is main partitionwise aggregation patch which is then rebased
accordingly.
0008 contains testcase and 0009 contains FDW changes.

Let me know if I missed any point to consider while rebasing.

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v21.tar.gz
Description: application/gzip


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-19 Thread Ashutosh Bapat
On Mon, Mar 19, 2018 at 11:15 PM, Robert Haas  wrote:
> On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
>  wrote:
>>> This patch also renames can_parallel_agg to
>>> can_partial_agg and removes the parallelism-specific bits from it.
>>
>> I think we need to update the comments in this function to use phrase
>> "partial aggregation" instead of "parallel aggregation". And I think
>> we need to change the conditions as well. For example if
>> parse->groupClause == NIL, why can't we do partial aggregation? This
>> is the classical case when we will need patial aggregation. Probably
>> we should test this with Jeevan's patches for partition-wise aggregate
>> to see if it considers partition-wise aggregate or not.
>
> I think the case where we have neither any aggregates nor a grouping
> clause is where we are doing SELECT DISTINCT.

That's a case which will also benefit from partial partition-wise
grouping, where each partition produces its own distinct values, thus
reducing the number of rows that flow through an Append node (or, better,
over the network), with the finalization step removing duplicates. In
fact, what we are attempting here is partition-wise grouping, and partial
aggregation happens to be a requirement for doing that if there are
aggregates. If there are no aggregates, we still benefit from
partition-wise grouping. So, can_partial_agg should only tell whether
we can calculate partial aggregates. If there are aggregates present
and can_partial_agg is false, we cannot attempt partition-wise
grouping. But if there are no aggregates and can_partial_agg is false,
that shouldn't prohibit us from using partition-wise grouping.

> Something like SELECT
> COUNT(*) FROM ... is not this case because that has an aggregate.
>
> I'm sort of on the fence as to whether and how to update the comments.
> I agree that it's a little strange to leave the comments here
> referencing parallel aggregation when the function has been renamed to
> is_partial_agg(), but a simple search-and-replace doesn't necessarily
> improve the situation very much.

Hmm, agreed. And as you have mentioned downthread, it's not part of this
patchset to attempt that. Maybe I will try providing a patch to update the
comments once we have committed PWA.

> Most notably, hasNonSerial is
> irrelevant for partial but non-parallel aggregation, but we still have
> the test because we haven't done the work to really do the right thing
> here, which is to separately track whether we can do parallel partial
> aggregation (either hasNonPartial or hasNonSerial is a problem) and
> non-parallel partial aggregation (only hasNonPartial is a problem).
> This needs a deeper reassessment, but I don't think that can or should
> be something we try to do right now.

I think this bit is easy to mention in the comments. We could always
say that since partial aggregation currently reuses the serialization and
deserialization machinery when calculating partial aggregates, we require
the aggregates to be serializable (hasNonSerial must be false) even though
that step is not needed in the non-parallel case. A future improvement that
avoids serialization and deserialization in partial aggregation when no
parallel query is involved should remove this condition from here and treat
it as a requirement for parallel aggregation only.

>
>> OR When parse->groupingSets is true, I can see why we can't use
>> parallel query, but we can still compute partial aggregates. This
>> condition doesn't hurt since partition-wise aggregation bails out when
>> there are grouping sets, so it's not that harmful here.
>
> I haven't thought deeply about what will break when GROUPING SETS are
> in use, but it's not the purpose of this patch set to make them work
> where they didn't before.  The point of hoisting the first two tests
> out of this function was just to avoid doing repeated work when
> partition-wise aggregate is in use.  Those two tests could conceivably
> produce different results for different children, whereas the
> remaining tests won't give different answers.  Let's not get
> distracted by the prospect of improving the tests.  I suspect that's
> not anyway so simple to achieve as what you seem to be speculating
> here...

+1.


>
> I'm fine with using a structure to bundle details that need to be sent
> to the FDW, but why should the FDW need to know about
> can_sort/can_hash?  I suppose if it needs to know about
> can_partial_agg then it doesn't really cost anything to pass through
> all the flags, but I doubt whether the FDW has any use for the others.
> Anyway, if that's the goal, let's just make sure that the set of
> things we're passing that way are exactly the set of things that we
> think the FDW might need.

I am speculating that an FDW or a custom planner hook that does some
local processing, or hints something to the foreign server, could use
those flags. But I agree that unless we see such a requirement we
shouldn't expose those in the structure. It will get difficult to
remove 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-19 Thread Robert Haas
On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
 wrote:
>> This patch also renames can_parallel_agg to
>> can_partial_agg and removes the parallelism-specific bits from it.
>
> I think we need to update the comments in this function to use phrase
> "partial aggregation" instead of "parallel aggregation". And I think
> we need to change the conditions as well. For example if
> parse->groupClause == NIL, why can't we do partial aggregation? This
> is the classical case when we will need partial aggregation. Probably
> we should test this with Jeevan's patches for partition-wise aggregate
> to see if it considers partition-wise aggregate or not.

I think the case where we have neither any aggregates nor a grouping
clause is where we are doing SELECT DISTINCT.  Something like SELECT
COUNT(*) FROM ... is not this case because that has an aggregate.

I'm sort of on the fence as to whether and how to update the comments.
I agree that it's a little strange to leave the comments here
referencing parallel aggregation when the function has been renamed to
is_partial_agg(), but a simple search-and-replace doesn't necessarily
improve the situation very much.  Most notably, hasNonSerial is
irrelevant for partial but non-parallel aggregation, but we still have
the test because we haven't done the work to really do the right thing
here, which is to separately track whether we can do parallel partial
aggregation (either hasNonPartial or hasNonSerial is a problem) and
non-parallel partial aggregation (only hasNonPartial is a problem).
This needs a deeper reassessment, but I don't think that can or should
be something we try to do right now.

> OR When parse->groupingSets is true, I can see why we can't use
> parallel query, but we can still compute partial aggregates. This
> condition doesn't hurt since partition-wise aggregation bails out when
> there are grouping sets, so it's not that harmful here.

I haven't thought deeply about what will break when GROUPING SETS are
in use, but it's not the purpose of this patch set to make them work
where they didn't before.  The point of hoisting the first two tests
out of this function was just to avoid doing repeated work when
partition-wise aggregate is in use.  Those two tests could conceivably
produce different results for different children, whereas the
remaining tests won't give different answers.  Let's not get
distracted by the prospect of improving the tests.  I suspect that's
not anyway so simple to achieve as what you seem to be speculating
here...

>> I am sort of unclear whether we need/want GroupPathExtraData at all.
>> What determines whether something gets passed via GroupPathExtraData
>> or just as a separate argument?  If we have a rule that stuff that is
>> common to all child grouped rels goes in there and other stuff
>> doesn't, or stuff postgres_fdw needs goes in there and other stuff
>> doesn't, then that might be OK.  But I'm not sure that there is such a
>> rule in the v20 patches.
>
> We have a single FDW hook for all the upper relations and that hook
> can not accept grouping specific arguments. Either we need a separate
> FDW hook for grouping OR we need some way of passing upper relation
> specific information down to an FDW. I think some FDWs and extensions
> will be happy if we provide them readymade decisions for can_sort,
> can_hash, can_partial_agg etc. It will be good if they don't have to
> translate the grouping target and havingQual for every child twice,
> once for core and second time in the FDW. In all it looks like we need
> some structure to hold that information so that we can pass it down
> the hook. I am fine with two structures one variable and other
> invariable. An upper operation can have one of them or both.

I'm fine with using a structure to bundle details that need to be sent
to the FDW, but why should the FDW need to know about
can_sort/can_hash?  I suppose if it needs to know about
can_partial_agg then it doesn't really cost anything to pass through
all the flags, but I doubt whether the FDW has any use for the others.
Anyway, if that's the goal, let's just make sure that the set of
things we're passing that way are exactly the set of things that we
think the FDW might need.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-19 Thread Robert Haas
On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
 wrote:
> Ok. That looks good.

Here's an updated version.  In this version, based on a voice
discussion with Ashutosh and Jeevan, I adjusted 0001 to combine it
with an earlier idea of splitting Gather/Gather Merge path generation
out of the function that creates partially aggregated paths.  The idea
here is that create_ordinary_gather_paths() could first call
create_partial_grouping_paths(), then add additional paths which might
be partial or non-partial by invoking the partition-wise aggregate
logic, then call gather_grouping_paths() and set_cheapest() to
finalize the partially grouped rel.  Also, I added draft commit
messages.

With this patch set applied, the key bit of logic in
create_ordinary_grouping_paths() ends up looking like this:

    if (grouped_rel->consider_parallel && input_rel->partial_pathlist != NIL &&
        (flags & GROUPING_CAN_PARTIAL_AGG) != 0)
    {
        partially_grouped_rel =
            create_partial_grouping_paths(root,
                                          grouped_rel,
                                          input_rel,
                                          gd,
                                          can_sort,
                                          can_hash,
                                          &agg_final_costs);
        gather_grouping_paths(root, partially_grouped_rel);
        set_cheapest(partially_grouped_rel);
    }

I imagine that what the main partition-wise aggregate patch would do
is (1) change the conditions under which
create_partial_grouping_paths() gets called, (2) postpone
gather_grouping_paths() and set_cheapest() until after partition-wise
aggregate had been done, doing them only if partially_grouped_rel !=
NULL.  Partition-wise aggregate will need to happen before
add_paths_to_grouping_rel(), though, so that the latter function can
try a FinalizeAggregate node on top of an Append added by
partition-wise aggregate.

This is a bit strange, because it will mean that partition-wise
aggregate will be attempted BEFORE adding ordinary aggregate paths to
grouped_rel but AFTER adding them to partially_grouped_rel.  We could
fix that by splitting add_paths_to_grouping_rel() into two functions,
one of which performs full aggregation directly and the other of which
tries finishing partial aggregation.  I'm unsure that's a good idea
though: it would mean that we have very similar logic in two different
functions that could get out of sync as a result of future code
changes, and it's not really fixing any problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0003-Don-t-pass-the-grouping-target-around-unnecessarily.patch
Description: Binary data


0002-Determine-grouping-strategies-in-create_grouping_pat.patch
Description: Binary data


0001-Defer-creation-of-partially-grouped-relation-until-i.patch
Description: Binary data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-16 Thread Ashutosh Bapat
On Fri, Mar 16, 2018 at 3:19 AM, Robert Haas  wrote:
> On Thu, Mar 15, 2018 at 2:46 PM, Robert Haas  wrote:
>> I wonder if we could simplify things by copying more information from
>> the parent grouping rel to the child grouping rels.
>
> On further review, it seems like a better idea is to generate the
> partial grouping relations from the grouping relations to which they
> correspond.  Attached is a series of proposed further refactoring
> patches.

Ok. That looks good.

>
> 0001 moves the creation of partially_grouped_rel downwards.  Instead
> of happening in create_grouping_paths(), it gets moved downward to
> add_paths_to_partial_grouping_rel(), which is renamed
> create_partial_grouping_paths() and now returns a pointer to new
> RelOptInfo.  This seems like a better design than what we've got now:
> it avoids creating the partially grouped relation if we don't need it,
> and it looks more like the other upper planner functions
> (create_grouping_paths, create_ordered_paths, etc.) which all create
> and return a new relation.

I liked that.

>
> 0002 moves the determination of which grouping strategies are possible
> upwards.  It represents them as a 'flags' variable with bits for
> GROUPING_CAN_USE_SORT, GROUPING_CAN_USE_HASH, and
> GROUPING_CAN_PARTIAL_AGG.  These are set in create_grouping_paths()
> and passed down to create_ordinary_grouping_paths().  The idea is that
> the flags value would be passed down to the partition-wise aggregate
> code which in turn would call create_ordinary_grouping_paths() for the
> child grouping relations, so that the relevant determinations are made
> only at the top level.

+1.

> This patch also renames can_parallel_agg to
> can_partial_agg and removes the parallelism-specific bits from it.

I think we need to update the comments in this function to use phrase
"partial aggregation" instead of "parallel aggregation". And I think
we need to change the conditions as well. For example if
parse->groupClause == NIL, why can't we do partial aggregation? This
is the classical case when we will need partial aggregation. Probably
we should test this with Jeevan's patches for partition-wise aggregate
to see if it considers partition-wise aggregate or not.

OR When parse->groupingSets is true, I can see why we can't use
parallel query, but we can still compute partial aggregates. This
condition doesn't hurt since partition-wise aggregation bails out when
there are grouping sets, so it's not that harmful here.

> To
> compensate for this, create_ordinary_grouping_paths() now tests the
> removed conditions instead.  This is all good stuff for partition-wise
> aggregate, since the grouped_rel->consider_parallel &&
> input_rel->partial_pathlist != NIL conditions can vary on a per-child
> basis but the rest of the stuff can't.  In some subsequent patch, the
> test should be pushed down inside create_partial_grouping_paths()
> itself, so that this function can handle both partial and non-partial
> paths as mentioned in the preceding paragraph.

I think can_parallel_agg() combines two sets of conditions: whether
partial aggregation is possible and whether parallel aggregation is
possible. can_partial_agg() should get the first set and we should
retain can_parallel_agg() for the second. We may then split
can_parallel_agg() into variant and invariant conditions, i.e. the
conditions which change with input_rel and grouped_rel and those that
don't.



>
> - create_partial_grouping_paths() is still doing
> get_agg_clause_costs() for the partial grouping target, which (I
> think) only needs to be done once.  Possibly we could handle that by
> having create_grouping_paths() do that work whenever it sets
> GROUPING_CAN_PARTIAL_AGG and pass the value downward.  You might
> complain that it won't get used unless either there are partial paths
> available for the input rel OR partition-wise aggregate is used --
> there's no point in partially aggregating a non-partial path at the
> top level.  We could just accept that as not a big deal, or maybe we
> can figure out how to make it conditional so that we only do it when
> either the input_rel has a partial path list or we have child rels.
> Or we could do as you did in your patches and save it when we compute
> it first, reusing it on each subsequent call.  Or maybe there's some
> other idea.

I am good with anything as long as we avoid repeated computation.

>
> I am sort of unclear whether we need/want GroupPathExtraData at all.
> What determines whether something gets passed via GroupPathExtraData
> or just as a separate argument?  If we have a rule that stuff that is
> common to all child grouped rels goes in there and other stuff
> doesn't, or stuff postgres_fdw needs goes in there and other stuff
> doesn't, then that might be OK.  But I'm not sure that there is such a
> rule in the v20 patches.

We have a single FDW hook for all the upper relations and that hook
can not accept grouping specific 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-16 Thread Ashutosh Bapat
On Fri, Mar 16, 2018 at 12:16 AM, Robert Haas  wrote:
>
> - partial_costs_set.  The comment in compute_group_path_extra_data
> doesn't look good.  It says "Set partial aggregation costs if we are
> going to calculate partial aggregates in make_grouping_rels()", but
> what it means is something more like "agg_partial_costs and
> agg_final_costs are not valid yet".  I also wonder if there's a way
> that we can figure out in advance whether we're going to need to do
> this and just do it at the appropriate place in the code, as opposed
> to doing it lazily.  Even if there were rare cases where we did it
> unnecessarily I'm not sure that would be any big deal.

I struggled with this one. In case of multi-level partitioning we will
compute fully aggregated results per partition at the levels whose
partition keys are covered by the GROUP BY clause, but beyond those
levels we will compute partial aggregates. When none of the levels can
use parallelism, only the levels which compute partial aggregates need
these costs to be calculated. We would need to walk the partitioning
hierarchy before even starting aggregation to decide whether we will
need partial aggregation further down. Instead of adding that walker,
I thought it better to use this flag. But more on this in my reply to
your next mail.

>
> I wonder if we could simplify things by copying more information from
> the parent grouping rel to the child grouping rels.  It seems to me
> for example that the way you're computing consider_parallel for the
> child relations is kind of pointless.  The parallel-safety of the
> grouping_target can't vary across children, nor can that of the
> havingQual; the only thing that can change is whether the input rel
> has consider_parallel set.  You could cater to that by having
> GroupPathExtraData do something like extra.grouping_is_parallel_safe =
> target_parallel_safe && is_parallel_safe(root, havingQual) and then
> set each child's consider parallel flag to
> input_rel->consider_parallel && extra.grouping_is_parallel_safe.

I am actually confused by the current code itself. What parallel
workers compute is the partial target, which is different from the
full target. The partial target only contains expressions from the
GROUP BY clause and partial aggregate nodes; it does not contain any
expressions built from full aggregates. When the partial aggregate
nodes are parallel safe but the expressions using the full aggregates
are not, the current code will not allow parallel aggregation to take
place whereas it should. That looks like an optimization we are
missing today.

That bug aside, the point is that the target of the grouped relation
and that of the partially grouped relation are different, giving rise
to the possibility that one is parallel safe and the other is not. In
fact, for the partially grouped relation we shouldn't check the
parallel safety of the havingQual at all, since the havingQual is only
applied over the fully aggregated result. That seems to be another
missing optimization. Or I am missing something here.

In case of partition-wise aggregates some levels may be performing
only partial aggregation and if partial aggregation is parallel safe,
we should allow those levels to run parallel partial aggregation.
Checking parallel safety of only grouped relation doesn't help here.
But since the optimization is already missing right now, I think this
patch shouldn't bother about it.

>
> Similarly, right now the way the patch sets the reltargets for
> grouping rels and partially grouping rels is a bit complex.
> make_grouping_rels() calls make_partial_grouping_target() separately
> for each partial grouping rel, but for non-partial grouping rels it
> gets the translated tlist as an argument.  Could we instead consider
> always building the tlist by translation from the parent, that is, a
> child grouped rel's tlist is the translation of the parent
> grouped_rel's tlist, and the child partially grouped rel's tlist is
> the translation of the parent partially_grouped_rel's tlist?  If you
> could both make that work and find a different place to compute the
> partial agg costs, make_grouping_rels() would get a lot simpler or
> perhaps go away entirely.

Hmm, that's a thought. While translating we allocate new nodes,
whereas make_partial_grouping_target() reuses the same nodes from the
full target. For a partially grouped child relation this means that we
will allocate new nodes for the partial target as well, and setrefs
will then spend cycles matching node trees instead of matching
pointers. But I think we can take that hit if it saves us some
complexity in the code.

>
> I don't like this condition which appears in that function:
>
> if (extra->try_parallel_aggregation || force_partial_agg ||
> (extra->partitionwise_grouping &&
>  extra->partial_partitionwise_grouping))
>
> The problem with that is that it's got to exactly match the criteria
> for whether we're going to need the 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-16 Thread Ashutosh Bapat
On Thu, Mar 15, 2018 at 9:19 PM, Robert Haas  wrote:
> On Thu, Mar 15, 2018 at 9:46 AM, Jeevan Chalke
>  wrote:
>> Hmm.. you are right. Done.
>
> I don't see a reason to hold off on committing 0002 and 0003, so I've
> done that now; since they are closely related changes, I pushed them
> as a single commit.  It probably could've just been included in the
> main patch, but it's fine.

Thanks.

>
> I don't much like the code that 0001 refactors and am not keen to
> propagate it into more places.  I've separately proposed patches to
> restructure that code in
> http://postgr.es/m/ca+tgmoakt5gmahbpwgqrr2nadfomaonoxyowhrdvfgws34t...@mail.gmail.com
> and if we end up deciding to adopt that approach then I think this
> patch will also need to create rels for UPPERREL_TLIST.  I suspect
> that approach would also remove the need for 0004, as that case would
> also end up being handled in a different way.  However, the jury is
> still out on whether or not the approach I've proposed there is any
> good.  Feel free to opine over on that thread.

Will take a look at that patch next week.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-16 Thread Ashutosh Bapat
On Thu, Mar 15, 2018 at 7:46 PM, Robert Haas  wrote:
> On Thu, Mar 15, 2018 at 6:08 AM, Ashutosh Bapat
>  wrote:
>> In current create_grouping_paths() (without any of your patches
>> applied) we first create partial paths in partially grouped rel and
>> then add parallel path to grouped rel using those partial paths. Then
>> we hand over this to FDW and extension hooks, which may add partial
>> paths, which might throw away a partial path used to create a parallel
>> path in grouped rel causing a segfault. I think this bug exists since
>> we introduced parallel aggregation or upper relation refactoring
>> whichever happened later. Introduction of partially grouped rel has
>> just made it visible.
>
> I don't think there's really a problem here; or at least if there is,
> I'm not seeing it.  If an FDW wants to introduce partially grouped
> paths, it should do so when it is called for
> UPPERREL_PARTIAL_GROUP_AGG from within
> add_paths_to_partial_grouping_rel.  If it wants to introduce fully
> grouped paths, it should do so when it is called for
> UPPERREL_GROUP_AGG from within create_grouping_paths.  By the time the
> latter call is made, it's too late to add partially grouped paths; if
> the FDW does, that's a bug in the FDW.

Right.

>
> Admittedly, this means that commit
> 3bf05e096b9f8375e640c5d7996aa57efd7f240c subtly changed the API
> contract for FDWs.  Before that, an FDW that wanted to support partial
> aggregation would have needed to add partially grouped paths to
> UPPERREL_GROUP_AGG when called for that relation; whereas now it would
> need to add them to UPPERREL_PARTIAL_GROUP_AGG when called for that
> relation.

And when an FDW added partial paths to the grouped rel
(UPPERREL_GROUP_AGG) it might throw away the partial paths gathered by
the core code. 3bf05e096b9f8375e640c5d7996aa57efd7f240c has fixed that
bug by allowing FDW to add partial paths to UPPERREL_PARTIAL_GROUP_AGG
before adding gather paths to the UPPERREL_GROUP_AGG. But I agree that
it has subtly changed that API and we need to add this to the
documentation (maybe we should have added that in the commit which
introduced partially grouped relation).

> This doesn't actually falsify any documentation, though,
> because this oddity wasn't documented before.  Possibly more
> documentation could stand to be written in this area, but that's not
> the topic of this thread.

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-15 Thread Robert Haas
On Thu, Mar 15, 2018 at 2:46 PM, Robert Haas  wrote:
> I wonder if we could simplify things by copying more information from
> the parent grouping rel to the child grouping rels.

On further review, it seems like a better idea is to generate the
partial grouping relations from the grouping relations to which they
correspond.  Attached is a series of proposed further refactoring
patches.

0001 moves the creation of partially_grouped_rel downwards.  Instead
of happening in create_grouping_paths(), it gets moved downward to
add_paths_to_partial_grouping_rel(), which is renamed
create_partial_grouping_paths() and now returns a pointer to new
RelOptInfo.  This seems like a better design than what we've got now:
it avoids creating the partially grouped relation if we don't need it,
and it looks more like the other upper planner functions
(create_grouping_paths, create_ordered_paths, etc.) which all create
and return a new relation.  One possible objection to this line of
attack is that Jeevan's
0006-Implement-partitionwise-aggregation-paths-for-partit.patch patch
adds an additional Boolean argument to that function so that it can be
called twice, once for partial paths and a second time for non-partial
paths.  But it looks to me like we should instead just add separate
handling to this function for the pathlist in each place where we're
currently handling the partial_pathlist.  That's more like what we do
elsewhere and avoids complicating the code with a bunch of
conditionals.  To make the generation of partially_grouped_rel from
grouped_rel work cleanly, this also sets grouped_rel's reltarget.

0002 moves the determination of which grouping strategies are possible
upwards.  It represents them as a 'flags' variable with bits for
GROUPING_CAN_USE_SORT, GROUPING_CAN_USE_HASH, and
GROUPING_CAN_PARTIAL_AGG.  These are set in create_grouping_paths()
and passed down to create_ordinary_grouping_paths().  The idea is that
the flags value would be passed down to the partition-wise aggregate
code which in turn would call create_ordinary_grouping_paths() for the
child grouping relations, so that the relevant determinations are made
only at the top level.  This patch also renames can_parallel_agg to
can_partial_agg and removes the parallelism-specific bits from it.  To
compensate for this, create_ordinary_grouping_paths() now tests the
removed conditions instead.  This is all good stuff for partition-wise
aggregate, since the grouped_rel->consider_parallel &&
input_rel->partial_pathlist != NIL conditions can vary on a per-child
basis but the rest of the stuff can't.  In some subsequent patch, the
test should be pushed down inside create_partial_grouping_paths()
itself, so that this function can handle both partial and non-partial
paths as mentioned in the preceding paragraph.

0003 is a cleanup patch.  It removes the grouping target as a separate
argument from a bunch of places that no longer need it given that 0001
sets grouped_rel->reltarget.

I think 0001 and 0002 together get us pretty close to having the stuff
that should be done for every child rel
(create_ordinary_grouping_paths) disentangled from the stuff that
should be done only once (create_grouping_paths).  There are a couple
of exceptions:

- create_partial_grouping_paths() is still doing
get_agg_clause_costs() for the partial grouping target, which (I
think) only needs to be done once.  Possibly we could handle that by
having create_grouping_paths() do that work whenever it sets
GROUPING_CAN_PARTIAL_AGG and pass the value downward.  You might
complain that it won't get used unless either there are partial paths
available for the input rel OR partition-wise aggregate is used --
there's no point in partially aggregating a non-partial path at the
top level.  We could just accept that as not a big deal, or maybe we
can figure out how to make it conditional so that we only do it when
either the input_rel has a partial path list or we have child rels.
Or we could do as you did in your patches and save it when we compute
it first, reusing it on each subsequent call.  Or maybe there's some
other idea.

- These patches don't do anything about
create_partial_grouping_paths() and create_ordinary_grouping_paths()
directly referencing parse->targetList and parse->havingQual.  I think
that we could add those as additional arguments to
create_ordinary_grouping_paths().  create_grouping_paths() would pass
the values taken from "parse" and partition-wise join would pass down
translated versions.

I am sort of unclear whether we need/want GroupPathExtraData at all.
What determines whether something gets passed via GroupPathExtraData
or just as a separate argument?  If we have a rule that stuff that is
common to all child grouped rels goes in there and other stuff
doesn't, or stuff postgres_fdw needs goes in there and other stuff
doesn't, then that might be OK.  But I'm not sure that there is such a
rule in the v20 patches.

-- 
Robert Haas
EnterpriseDB: 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-15 Thread Robert Haas
On Thu, Mar 15, 2018 at 11:49 AM, Robert Haas  wrote:
> I'm going to go spend some time looking at 0005 next.  It looks to me
> like it's generally going in a very promising direction, but I need to
> study the details more.

On further study this patch is doing a number of things, some of which
seem like better ideas than others.  Splitting out the degenerate
grouping case looks like a great idea.  I've committed a patch to do
this loosely based on 0005, but I whacked it around quite a bit.  I
think the result is cleaner than what you had.

As far as the stuff in GroupPathExtraData extra is concerned:

- can_hash and can_sort look good; we can precompute them once and
reuse them for every grouping relation.  Cool.

- can_partial_agg looks a bit pointless.  You're not going to save
many CPU cycles by computing a value that is derived from two Booleans
and storing the result in another Boolean variable.

- partial_costs_set.  The comment in compute_group_path_extra_data
doesn't look good.  It says "Set partial aggregation costs if we are
going to calculate partial aggregates in make_grouping_rels()", but
what it means is something more like "agg_partial_costs and
agg_final_costs are not valid yet".  I also wonder if there's a way
that we can figure out in advance whether we're going to need to do
this and just do it at the appropriate place in the code, as opposed
to doing it lazily.  Even if there were rare cases where we did it
unnecessarily I'm not sure that would be any big deal.

- agg_partial_costs and agg_final_costs themselves seem OK.

- consider_parallel is only used in make_grouping_rels and could be
replaced with a local variable.  Its comment in relation.h doesn't
make a lot of sense either, as it is not used to double-check
anything.

- The remaining fields vary across the partitioning hierarchy, and it
seems a little strange to store them in this structure alongside the
pre-computed stuff that doesn't change.  I'm not quite sure what to do
about that; obviously passing around 15-20 arguments to a function
isn't too desirable either.

I wonder if we could simplify things by copying more information from
the parent grouping rel to the child grouping rels.  It seems to me
for example that the way you're computing consider_parallel for the
child relations is kind of pointless.  The parallel-safety of the
grouping_target can't vary across children, nor can that of the
havingQual; the only thing that can change is whether the input rel
has consider_parallel set.  You could cater to that by having
GroupPathExtraData do something like extra.grouping_is_parallel_safe =
target_parallel_safe && is_parallel_safe(root, havingQual) and then
set each child's consider parallel flag to
input_rel->consider_parallel && extra.grouping_is_parallel_safe.

Similarly, right now the way the patch sets the reltargets for
grouping rels and partially grouping rels is a bit complex.
make_grouping_rels() calls make_partial_grouping_target() separately
for each partial grouping rel, but for non-partial grouping rels it
gets the translated tlist as an argument.  Could we instead consider
always building the tlist by translation from the parent, that is, a
child grouped rel's tlist is the translation of the parent
grouped_rel's tlist, and the child partially grouped rel's tlist is
the translation of the parent partially_grouped_rel's tlist?  If you
could both make that work and find a different place to compute the
partial agg costs, make_grouping_rels() would get a lot simpler or
perhaps go away entirely.

I don't like this condition which appears in that function:

    if (extra->try_parallel_aggregation || force_partial_agg ||
        (extra->partitionwise_grouping &&
         extra->partial_partitionwise_grouping))

The problem with that is that it's got to exactly match the criteria
for whether we're going to need the partial_grouping_rel.  If it's
true when we are not using partial paths, then you've missed an
optimization; in the reverse case, we'll probably crash or fail to
consider paths we should have considered.  It is not entirely
straightforward to verify that this test is correct.
add_paths_to_partial_grouping_rel() gets called if
extra->try_parallel_aggregation is true or if
extra->is_partial_aggregation is true, but the condition doesn't test
extra->is_partial_aggregation at all. The other way that we can end up
using partially_grouped_rel is if create_partitionwise_grouping_paths
is called, but it just silently fails to do anything if we have no
partially_grouped_rel.  Putting all that together, do the conditions
under which a partially_grouped_rel gets created match the conditions
under which we want to have one?  Beats me!  Moreover, even if it's
correct now, I think that the chances that the next person who
modifies this code will manage to keep it correct are not great.  I
think we need to create the partial grouping rel somewhere in the code
that's closer to where it's actually 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-15 Thread Robert Haas
On Thu, Mar 15, 2018 at 9:46 AM, Jeevan Chalke
 wrote:
> Hmm.. you are right. Done.

I don't see a reason to hold off on committing 0002 and 0003, so I've
done that now; since they are closely related changes, I pushed them
as a single commit.  It probably could've just been included in the
main patch, but it's fine.

I don't much like the code that 0001 refactors and am not keen to
propagate it into more places.  I've separately proposed patches to
restructure that code in
http://postgr.es/m/ca+tgmoakt5gmahbpwgqrr2nadfomaonoxyowhrdvfgws34t...@mail.gmail.com
and if we end up deciding to adopt that approach then I think this
patch will also need to create rels for UPPERREL_TLIST.  I suspect
that approach would also remove the need for 0004, as that case would
also end up being handled in a different way.  However, the jury is
still out on whether or not the approach I've proposed there is any
good.  Feel free to opine over on that thread.

I'm going to go spend some time looking at 0005 next.  It looks to me
like it's generally going in a very promising direction, but I need to
study the details more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-15 Thread Robert Haas
On Thu, Mar 15, 2018 at 6:08 AM, Ashutosh Bapat
 wrote:
> In current create_grouping_paths() (without any of your patches
> applied) we first create partial paths in partially grouped rel and
> then add parallel path to grouped rel using those partial paths. Then
> we hand over this to FDW and extension hooks, which may add partial
> paths, which might throw away a partial path used to create a parallel
> path in grouped rel causing a segfault. I think this bug exists since
> we introduced parallel aggregation or upper relation refactoring
> whichever happened later. Introduction of partially grouped rel has
> just made it visible.

I don't think there's really a problem here; or at least if there is,
I'm not seeing it.  If an FDW wants to introduce partially grouped
paths, it should do so when it is called for
UPPERREL_PARTIAL_GROUP_AGG from within
add_paths_to_partial_grouping_rel.  If it wants to introduce fully
grouped paths, it should do so when it is called for
UPPERREL_GROUP_AGG from within create_grouping_paths.  By the time the
latter call is made, it's too late to add partially grouped paths; if
the FDW does, that's a bug in the FDW.

Admittedly, this means that commit
3bf05e096b9f8375e640c5d7996aa57efd7f240c subtly changed the API
contract for FDWs.  Before that, an FDW that wanted to support partial
aggregation would have needed to add partially grouped paths to
UPPERREL_GROUP_AGG when called for that relation; whereas now it would
need to add them to UPPERREL_PARTIAL_GROUP_AGG when called for that
relation. This doesn't actually falsify any documentation, though,
because this oddity wasn't documented before.  Possibly more
documentation could stand to be written in this area, but that's not
the topic of this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-15 Thread Jeevan Chalke
On Thu, Mar 15, 2018 at 3:38 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
> The patchset needs a rebase.  I have rebased it on the latest head
> and made the following changes.
>
> Argument force_partial_agg is added after output arguments to
> make_grouping_rels(). Moved it before output arguments to keep input and
> output
> arguments separate.
>
> Also moved the comment about creating partial aggregate paths before full
> aggregate paths from make_grouping_rels() to
> populate_grouping_rels_with_paths().
>
> In create_grouping_paths() moved call to try_degenerate_grouping_paths()
> before
> computing extra grouping information.
>

Thanks for these changes.


>
> Some more comment changes in the attached patch set.
>
> + *
> + * With partitionwise aggregates, we may have childrel's pathlist
> empty
> + * if we are doing partial aggregation. Thus do this only if
> childrel's
> + * pathlist is not NIL.
>   */
> -if (childrel->cheapest_total_path->param_info == NULL)
> +if (childrel->pathlist != NIL &&
> +childrel->cheapest_total_path->param_info == NULL)
>  accumulate_append_subpath(childrel->cheapest_total_path,
> &subpaths, NULL);
> I thought we got rid of this code. Why has it come back? Maybe the comment
> should point to the function where this case happens.
>

Oops. My mistake. I added it back while working on your comments, and at
that time we were creating partially_grouped_rel unconditionally, so I was
getting a segfault here.

But now we create partially_grouped_rel only when needed, thanks to your
refactoring work. Thus this guard is no longer needed; removed in the
attached patch set.


> In populate_grouping_rels_with_paths(), we need to pass extra to
> extension hook create_upper_paths_hook similar to what
> add_paths_to_joinrel() does.
>

Hmm.. you are right. Done.


>
> Also, we aren't passing is_partial_agg to FDW hook, so it won't know
> whether to compute partial or full aggregates. I think the check
> 5296 /* Partial aggregates are not supported. */
> 5297 if (extra->partial_partitionwise_grouping)
> 5298 return;
> in add_foreign_grouping_paths() is wrong. It's checking whether the
> children of the given relation will produce partial aggregates or not.
> But it is supposed to check whether the given relation should produce
> partial aggregates or not. I think we need to include is_partial_agg
> in GroupPathExtraData so that it gets passed to FDWs. If we do so, we
> need to add a comment in the prologue of GroupPathExtraData to
> disambiguate usage of is_partial_agg and
> partial_partitionwise_grouping.
>

Sorry, I misinterpreted this.
I have reworked this and added is_partial_aggregation to GroupPathExtraData.


> In current create_grouping_paths() (without any of your patches
> applied) we first create partial paths in partially grouped rel and
> then add parallel path to grouped rel using those partial paths. Then
> we hand over this to FDW and extension hooks, which may add partial
> paths, which might throw away a partial path used to create a parallel
> path in grouped rel causing a segfault. I think this bug exists since
> we introduced parallel aggregation or upper relation refactoring
> whichever happened later. Introduction of partially grouped rel has
> just made it visible.
>

Yes. That's why I needed to create partitionwise aggregation paths before
we create any normal paths.
Yeah, but if this issue needs to be taken care of, it should be a separate
patch.

However, as noticed by Ashutosh Bapat, after the refactoring we no longer
merely try to create partitionwise paths; rather, we create them whenever
possible. So, in line with create_grouping_paths(), I have renamed it to
create_partitionwise_grouping_paths() in the attached patch-set.

Thanks


> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>


-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v20.tar.gz
Description: application/gzip


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-15 Thread Ashutosh Bapat
On Wed, Mar 14, 2018 at 7:51 PM, Jeevan Chalke
 wrote:
>
>
> On Tue, Mar 13, 2018 at 7:43 PM, Jeevan Chalke
>  wrote:
>>
>> Hi,
>>
>> The patch-set is complete now, but there is still scope for some comment
>> improvements due to all this refactoring. I will work on it. I also need to
>> update some documentation, indentation, etc., and will post those changes in
>> the next patch-set. But meanwhile, this patch-set is ready for review.
>
>
> Attached complete patch-set.
>
> Fixed all remaining documentation, indentation and comments.
> Also incorporated few comments improvements provided off-list by Ashutosh
> Bapat.
>

The patchset needs a rebase.  I have rebased it on the latest head
and made the following changes.

Argument force_partial_agg is added after output arguments to
make_grouping_rels(). Moved it before output arguments to keep input and output
arguments separate.

Also moved the comment about creating partial aggregate paths before full
aggregate paths from make_grouping_rels() to
populate_grouping_rels_with_paths().

In create_grouping_paths() moved call to try_degenerate_grouping_paths() before
computing extra grouping information.

Some more comment changes in the attached patch set.

+ *
+ * With partitionwise aggregates, we may have childrel's pathlist empty
+ * if we are doing partial aggregation. Thus do this only if childrel's
+ * pathlist is not NIL.
  */
-if (childrel->cheapest_total_path->param_info == NULL)
+if (childrel->pathlist != NIL &&
+childrel->cheapest_total_path->param_info == NULL)
 accumulate_append_subpath(childrel->cheapest_total_path,
  &subpaths, NULL);
I thought we got rid of this code. Why has it come back? Maybe the comment
should point to the function where this case happens.

In populate_grouping_rels_with_paths(), we need to pass extra to
extension hook create_upper_paths_hook similar to what
add_paths_to_joinrel() does.

Also, we aren't passing is_partial_agg to FDW hook, so it won't know
whether to compute partial or full aggregates. I think the check
5296 /* Partial aggregates are not supported. */
5297 if (extra->partial_partitionwise_grouping)
5298 return;
in add_foreign_grouping_paths() is wrong. It's checking whether the
children of the given relation will produce partial aggregates or not.
But it is supposed to check whether the given relation should produce
partial aggregates or not. I think we need to include is_partial_agg
in GroupPathExtraData so that it gets passed to FDWs. If we do so, we
need to add a comment in the prologue of GroupPathExtraData to
disambiguate usage of is_partial_agg and
partial_partitionwise_grouping.

In current create_grouping_paths() (without any of your patches
applied) we first create partial paths in partially grouped rel and
then add parallel path to grouped rel using those partial paths. Then
we hand over this to FDW and extension hooks, which may add partial
paths, which might throw away a partial path used to create a parallel
path in grouped rel causing a segfault. I think this bug exists since
we introduced parallel aggregation or upper relation refactoring
whichever happened later. Introduction of partially grouped rel has
just made it visible.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


partition-wise-agg-v19.tar.gz
Description: GNU Zip compressed data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-14 Thread Jeevan Chalke
On Tue, Mar 13, 2018 at 7:43 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

> Hi,
>
> The patch-set is complete now, but there is still scope for some comment
> improvements due to all this refactoring. I will work on it. I also need
> to update some documentation, indentation, etc., and will post those
> changes in the next patch-set. But meanwhile, this patch-set is ready for
> review.
>

Attached complete patch-set.

Fixed all remaining documentation, indentation and comments.
Also incorporated few comments improvements provided off-list by Ashutosh
Bapat.

Thanks
-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v18.tar.gz
Description: application/gzip


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-13 Thread Jeevan Chalke
Hi,

In this new patch-set, I have resolved all the reported comments/issues.

The changes done by Ashutosh Bapat to split create_grouping_paths() into
separate functions, so that the partitionwise aggregate code can use them,
were based on my partitionwise aggregate changes. Since those were
essentially refactoring changes, I have refactored them out separately,
before any partitionwise changes (see
0005-Split-create_grouping_paths-and-Add-GroupPathExtraDa.patch), and then
rebased all partitionwise changes, including all fixes, on top of that.

The patch-set is complete now, but there is still scope for some comment
improvements due to all this refactoring. I will work on it. I also need
to update some documentation, indentation, etc., and will post those
changes in the next patch-set. But meanwhile, this patch-set is ready for
review.


On Tue, Mar 13, 2018 at 9:12 AM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Mon, Mar 12, 2018 at 7:49 PM, Jeevan Chalke
>  wrote:
> >
> >
> > On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat
> >  wrote:
> >>
> >> On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
> >>
> We don't need UpperRelationKind member in that structure. That will be
> provided by the RelOptInfo passed.
>
> The problem here is the extra information required for grouping is not
> going to be the same for that needed for window aggregate and
> certainly not for ordering. If we try to jam everything in the same
> structure, it will become large with many members useless for a given
> operation. A reader will not have an idea about which of them are
> useful and which of them are not. So, instead we should try some
> polymorphism. I think we can pass a void * to GetForeignUpperPaths()
> and corresponding FDW hook knows what to cast it to based on the
> UpperRelationKind passed.
>

Yep. Done this way.


>
> BTW, the patch has added an argument to GetForeignUpperPaths() but has
> not documented the change in API. If we go the route of polymorphism,
> we will need to document the mapping between UpperRelationKind and the
> type of structure passed in.
>

Oops. Will do that in the next patchset.

Thanks for pointing it out; I missed this in the first place.

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v17.tar.gz
Description: application/gzip


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-12 Thread Ashutosh Bapat
On Mon, Mar 12, 2018 at 7:49 PM, Jeevan Chalke
 wrote:
>
>
> On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat
>  wrote:
>>
>> On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
>>
>> GroupPathExtraData now completely absorbs members from and replaces
>> OtherUpperPathExtraData. This means that we have to consider a way to
>> pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
>> haven't tried it in this patch.
>
>
> Initially, I was passing OtherUpperPathExtraData to FDW. But now we need to
> pass GroupPathExtraData.
>
> However, since GetForeignUpperPaths() is a generic function for all upper
> relations, we might think of renaming this struct to UpperPathExtraData and
> adding an UpperRelationKind member to it, which would be used to
> distinguish the passed-in extra data. But since we currently have extra
> data only for grouping, I chose not to do that here. Someone may choose
> this approach later, when needed.

We don't need UpperRelationKind member in that structure. That will be
provided by the RelOptInfo passed.

The problem here is the extra information required for grouping is not
going to be the same for that needed for window aggregate and
certainly not for ordering. If we try to jam everything in the same
structure, it will become large with many members useless for a given
operation. A reader will not have an idea about which of them are
useful and which of them are not. So, instead we should try some
polymorphism. I think we can pass a void * to GetForeignUpperPaths()
and corresponding FDW hook knows what to cast it to based on the
UpperRelationKind passed.

BTW, the patch has added an argument to GetForeignUpperPaths() but has
not documented the change in API. If we go the route of polymorphism,
we will need to document the mapping between UpperRelationKind and the
type of structure passed in.

>
>>
>> With this patch there's a failure in partition_aggregation where the
>> patch is creating paths with MergeAppend with GatherMerge underneath.
>> I think this is related to the call
>> add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
>> true. But I didn't investigate it further.
>
>
> I fixed it. We need to pass is_partial_agg instead of
> extra->partial_partitionwise_grouping while calling
> add_paths_to_partial_grouping_rel() in case of parallelism.

Thanks for investigation and the fix.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-12 Thread Jeevan Chalke
On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
>
> GroupPathExtraData now completely absorbs members from and replaces
> OtherUpperPathExtraData. This means that we have to consider a way to
> pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
> haven't tried it in this patch.
>

Initially, I was passing OtherUpperPathExtraData to FDW. But now we need to
pass GroupPathExtraData.

However, since GetForeignUpperPaths() is a generic function for all upper
relations, we might think of renaming this struct to UpperPathExtraData and
adding an UpperRelationKind member to it, which would be used to
distinguish the passed-in extra data. But since we currently have extra
data only for grouping, I chose not to do that here. Someone may choose
this approach later, when needed.


> With this patch there's a failure in partition_aggregation where the
> patch is creating paths with MergeAppend with GatherMerge underneath.
> I think this is related to the call
> add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
> true. But I didn't investigate it further.
>

I fixed it. We need to pass is_partial_agg instead of
extra->partial_partitionwise_grouping while calling
add_paths_to_partial_grouping_rel() in case of parallelism.


> With those two things remaining I am posting this patch, so that
> Jeevan Chalke can merge this patch into his patches and also merge
> some of his changes related to mine and Robert's changes. Let me know
> if this refactoring looks good.
>

Will rebase my changes tomorrow.


>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 87ea18c..ed16f7d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -353,7 +353,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 			 UpperRelationKind stage,
 			 RelOptInfo *input_rel,
 			 RelOptInfo *output_rel,
-			 OtherUpperPathExtraData *extra);
+			 GroupPathExtraData *group_extra);
 
 /*
  * Helper functions
@@ -429,7 +429,7 @@ static void add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 static void add_foreign_grouping_paths(PlannerInfo *root,
 		   RelOptInfo *input_rel,
 		   RelOptInfo *grouped_rel,
-		   OtherUpperPathExtraData *extra);
+		   GroupPathExtraData *group_extra);
 static void apply_server_options(PgFdwRelationInfo *fpinfo);
 static void apply_table_options(PgFdwRelationInfo *fpinfo);
 static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
@@ -5235,7 +5235,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 static void
 postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 			 RelOptInfo *input_rel, RelOptInfo *output_rel,
-			 OtherUpperPathExtraData *extra)
+			 GroupPathExtraData *group_extra)
 {
 	PgFdwRelationInfo *fpinfo;
 
@@ -5255,7 +5255,7 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 	fpinfo->pushdown_safe = false;
 	output_rel->fdw_private = fpinfo;
 
-	add_foreign_grouping_paths(root, input_rel, output_rel, extra);
+	add_foreign_grouping_paths(root, input_rel, output_rel, group_extra);
 }
 
 /*
@@ -5268,7 +5268,7 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 static void
 add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		   RelOptInfo *grouped_rel,
-		   OtherUpperPathExtraData *extra)
+		   GroupPathExtraData *group_extra)
 {
 	Query	   *parse = root->parse;
 	PgFdwRelationInfo *ifpinfo = input_rel->fdw_private;
@@ -5288,16 +5288,16 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	 * Store passed-in target and havingQual in fpinfo. If its a foreign
 	 * partition, then path target and HAVING quals fetched from the root are
 	 * not correct as Vars within it won't match with this child relation.
-	 * However, server passed them through extra and thus fetch from it.
+	 * However, server passed them through group_extra and thus fetch from it.
 	 */
-	if (extra)
+	if (group_extra)
 	{
 		/* Partial aggregates are not supported. */
-		if (extra->isPartialAgg)
+		if (group_extra->partial_partitionwise_grouping)
 			return;
 
-		fpinfo->grouped_target = extra->relTarget;
-		fpinfo->havingQual = extra->havingQual;
+		fpinfo->grouped_target = group_extra->target;
+		fpinfo->havingQual = group_extra->havingQual;
 	}
 	else
 	{
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1611975..9a34bf9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -109,43 +109,6 @@ typedef struct
 	int		   *tleref_to_colnum_map;
 } 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-12 Thread Jeevan Chalke
On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
>  wrote:
> > On Thu, Mar 8, 2018 at 7:31 PM, Robert Haas 
> wrote:
> >>
> >> This kind of goes along with the suggestion I made yesterday to
> >> introduce a new function, which at the time I proposed calling
> >> initialize_grouping_rel(), to set up new grouped or partially grouped
> >> relations.  By doing that it would be easier to ensure the
> >> initialization is always done in a consistent way but only for the
> >> relations we actually need.  But maybe we should call it
> >> fetch_grouping_rel() instead.  The idea would be that instead of
> >> calling fetch_upper_rel() we would call fetch_grouping_rel() when it
> >> is a question of the grouped or partially grouped relation.  It would
> >> either return the existing relation or initialize a new one for us.  I
> >> think that would make it fairly easy to initialize only the ones we're
> >> going to need.
> >
> > Hmm. I am working on refactoring the code to do something like this.
> >
>
> Here's a patch doing the same. I have split create_grouping_paths() into
> three functions: (1) try_degenerate_grouping_paths(), to handle degenerate
> grouping paths; (2) make_grouping_rels(), to create the grouping rels (the
> partially grouped rel and the grouped rel), which also sets some
> properties in GroupPathExtraData; and (3)
> populate_grouping_rels_with_paths(), to populate the grouping rels with
> paths. With those changes, I have been able to get rid of partially
> grouped rels when they are not necessary. But I haven't tried to get rid
> of grouped_rels when they are not needed.
>
> GroupPathExtraData now completely absorbs members from and replaces
> OtherUpperPathExtraData. This means that we have to consider a way to
> pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
> haven't tried it in this patch.
>
> With this patch there's a failure in partition_aggregation where the
> patch is creating paths with MergeAppend with GatherMerge underneath.
> I think this is related to the call
> add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
> true. But I didn't investigate it further.
>
> With those two things remaining I am posting this patch, so that
> Jeevan Chalke can merge this patch into his patches and also merge
> some of his changes related to mine and Robert's changes. Let me know
> if this refactoring looks good.
>

Thanks Ashutosh for the refactoring patch.
I will rebase my changes and will also resolve the above two issues you
have reported.


>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-12 Thread Ashutosh Bapat
On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
 wrote:
> On Thu, Mar 8, 2018 at 7:31 PM, Robert Haas  wrote:
>>
>> This kind of goes along with the suggestion I made yesterday to
>> introduce a new function, which at the time I proposed calling
>> initialize_grouping_rel(), to set up new grouped or partially grouped
>> relations.  By doing that it would be easier to ensure the
>> initialization is always done in a consistent way but only for the
>> relations we actually need.  But maybe we should call it
>> fetch_grouping_rel() instead.  The idea would be that instead of
>> calling fetch_upper_rel() we would call fetch_grouping_rel() when it
>> is a question of the grouped or partially grouped relation.  It would
>> either return the existing relation or initialize a new one for us.  I
>> think that would make it fairly easy to initialize only the ones we're
>> going to need.
>
> Hmm. I am working on refactoring the code to do something like this.
>

Here's a patch doing the same. I have split create_grouping_paths() into
three functions: (1) try_degenerate_grouping_paths(), to handle degenerate
grouping paths; (2) make_grouping_rels(), to create the grouping rels (the
partially grouped rel and the grouped rel), which also sets some properties
in GroupPathExtraData; and (3) populate_grouping_rels_with_paths(), to
populate the grouping rels with paths. With those changes, I have been able
to get rid of partially grouped rels when they are not necessary. But I
haven't tried to get rid of grouped_rels when they are not needed.

GroupPathExtraData now completely absorbs members from and replaces
OtherUpperPathExtraData. This means that we have to consider a way to
pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
haven't tried it in this patch.

With this patch there's a failure in partition_aggregation where the
patch is creating paths with MergeAppend with GatherMerge underneath.
I think this is related to the call
add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
true. But I didn't investigate it further.

With those two things remaining I am posting this patch, so that
Jeevan Chalke can merge this patch into his patches and also merge
some of his changes related to mine and Robert's changes. Let me know
if this refactoring looks good.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
commit c882a54585b93b8976d2d9a25f55d18ed27d69c1
Author: Ashutosh Bapat 
Date:   Thu Mar 8 15:33:51 2018 +0530

Split create_grouping_paths()

Separate the code in create_grouping_paths() into two functions: the
first creates the grouping relations (the partially grouped rel and the
grouped rel); the second populates them with paths. These two functions
are then called from create_grouping_paths() and
try_partitionwise_grouping().

As part of this, the degenerate grouping case is separated into a
function of its own (try_degenerate_grouping_paths()), to be called only
from create_grouping_paths().

Ashutosh Bapat.

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a5a049f..1611975 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -123,12 +123,27 @@ typedef struct
  */
 typedef struct
 {
+	/* Data which remains constant once set. */
 	bool		can_hash;
 	bool		can_sort;
 	bool		can_partial_agg;
 	bool		partial_costs_set;
 	AggClauseCosts agg_partial_costs;
 	AggClauseCosts agg_final_costs;
+
+	/*
+	 * Data which depends upon the input and grouping relations and hence may
+	 * change across partitioning hierarchy
+	 */
+	bool		consider_parallel;	/* It probably remains same across
+	 * partitioning hierarchy. But double
+	 * check.
+	 */
+	bool		try_parallel_aggregation;
+	bool		partitionwise_grouping;
+	bool		partial_partitionwise_grouping;
+	Node	   *havingQual;
+	PathTarget *target;
 } GroupPathExtraData;
 
 /* Local functions */
@@ -161,9 +176,7 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
 	  RelOptInfo *input_rel,
 	  PathTarget *target,
 	  const AggClauseCosts *agg_costs,
-	  grouping_sets_data *gd,
-	  GroupPathExtraData *extra,
-	  OtherUpperPathExtraData *child_data);
+	  grouping_sets_data *gd);
 static void consider_groupingsets_paths(PlannerInfo *root,
 			RelOptInfo *grouped_rel,
 			Path *path,
@@ -240,14 +253,31 @@ static void try_partitionwise_grouping(PlannerInfo *root,
 		   RelOptInfo *input_rel,
 		   RelOptInfo *grouped_rel,
 		   RelOptInfo *partially_grouped_rel,
-		   PathTarget *target,
 		   const AggClauseCosts *agg_costs,
 		   grouping_sets_data *gd,
-		   GroupPathExtraData *extra,
-		   Node *havingQual,
-		   bool forcePartialAgg);
+		   GroupPathExtraData *extra);
 static bool group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target,

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-09 Thread Ashutosh Bapat
On Thu, Mar 8, 2018 at 7:31 PM, Robert Haas  wrote:
>
> This kind of goes along with the suggestion I made yesterday to
> introduce a new function, which at the time I proposed calling
> initialize_grouping_rel(), to set up new grouped or partially grouped
> relations.  By doing that it would be easier to ensure the
> initialization is always done in a consistent way but only for the
> relations we actually need.  But maybe we should call it
> fetch_grouping_rel() instead.  The idea would be that instead of
> calling fetch_upper_rel() we would call fetch_grouping_rel() when it
> is a question of the grouped or partially grouped relation.  It would
> either return the existing relation or initialize a new one for us.  I
> think that would make it fairly easy to initialize only the ones we're
> going to need.

Hmm. I am working on refactoring the code to do something like this.

> On a related note, I'm not sure that this code is correct:
>
> +   if (!isPartialAgg)
> +   {
> +   grouped_rel->part_scheme = input_rel->part_scheme;
> +   grouped_rel->nparts = nparts;
> +   grouped_rel->boundinfo = input_rel->boundinfo;
> +   grouped_rel->part_rels = part_rels;
> +   }
>
> It's not obvious to me why this should be done only when
> !isPartialAgg.  The comments claim that the partially grouped child
> rels can't be considered partitions of the top-level partitially
> grouped rel, but it seems to me that we could consider them that way.
> Maybe I'm missing something.

When we are performing partial aggregation, the GROUP BY clause does not
contain the partition keys. This means that the targetlists of the grouped
relation and the partially grouped relation do not contain bare partition
keys, so for a relation sitting on top of such a (partially) grouped
relation the partition key does not exist. Hence we can't consider the
grouped or partially grouped relation as partitioned.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-08 Thread Jeevan Chalke
On Thu, Mar 8, 2018 at 7:49 PM, Robert Haas  wrote:

> On Thu, Mar 8, 2018 at 9:15 AM, Jeevan Chalke
>  wrote:
> > I am not sure why we don't set reltarget into the grouped_rel too.
> >
> > But if we do so, as we did for partially_grouped_rel, then it will be a
> > lot easier for partitionwise aggregate, as then we don't have to pass
> > target to functions creating paths like create_append_path. We now need
> > to update generate_gather_paths() to take target too, as it is now
> > being called on grouped_rel, in which reltarget is not set.
> >
> > But yes, if there is any specific reason we can't do so, then I think
> > the same as Ashutosh said. I wasn't aware of such a reason, though.
>
> I see no problem with setting reltarget for the grouped_rel.  Before
> we added partially_grouped_rel, that rel computed paths with two
> different targets: partial paths had the partial grouping target, and
> non-partial paths had the ordinary grouping target.  But that's fixed
> now.
>

OK.
Will update my changes accordingly.
If we set reltarget into the grouped_rel now, then I don't need one of the
refactoring patches, namely the one passing target to the path creation
functions.


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-08 Thread Robert Haas
On Thu, Mar 8, 2018 at 9:15 AM, Jeevan Chalke
 wrote:
> I am not sure why we don't set reltarget on the grouped_rel too.
>
> But if we do so, as we did for partially_grouped_rel, then it will be a lot
> easier for partitionwise aggregation, as we then don't have to pass the
> target to the functions creating paths, like create_append_path(). We now
> need to update generate_gather_paths() to take a target too, as it is now
> being called on grouped_rel, in which reltarget is not set.
>
> But yes, if there is any specific reason we can't do so, then I think the
> same as Ashutosh said. I wasn't aware of any such reason though.

I see no problem with setting reltarget for the grouped_rel.  Before
we added partially_grouped_rel, that rel computed paths with two
different targets: partial paths had the partial grouping target, and
non-partial paths had the ordinary grouping target.  But that's fixed
now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-08 Thread Jeevan Chalke
On Thu, Mar 8, 2018 at 1:15 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Wed, Mar 7, 2018 at 8:07 PM, Jeevan Chalke
>  wrote:
> Here are some more review comments, esp. on the
> try_partitionwise_grouping() function. BTW, the name of that function
> doesn't go in sync with enable_partitionwise_aggregation (which, btw, is
> in sync with the enable_fooagg GUCs), but it does go in sync with
> create_grouping_paths(). It looks like we have already conflated
> aggregation and grouping: e.g. enable_hashagg may affect path creation
> when there is no aggregation involved, i.e. only grouping is involved,
> but create_grouping_paths() will create paths when there is no grouping
> involved. Generally it looks like the function names use "grouping" to
> mean both aggregation and grouping, but the GUCs use "aggregation" to
> mean both of those. So, the naming in this patch looks consistent with
> the current naming conventions.
>
> +grouped_rel->part_scheme = input_rel->part_scheme;
> +grouped_rel->nparts = nparts;
> +grouped_rel->boundinfo = input_rel->boundinfo;
> +grouped_rel->part_rels = part_rels;
>
> You need to set part_exprs, which will provide the partition keys for
> this partitioned relation. I think we should include all the part_exprs
> of input_rel which are part of the GROUP BY clause. Since any other
> expressions in part_exprs are not part of the GROUP BY clause, they
> cannot appear in the targetlist without an aggregate on top. So they
> can't be part of the partition keys of the grouped relation.
>
> In create_grouping_paths() we fetch both the partially as well as the
> fully grouped rel for the given input relation. But in case of partial
> aggregation, we don't need the fully grouped rel, since we are not
> computing full aggregates for the children. Since fetch_upper_rel()
> creates a relation when one doesn't exist, we are unnecessarily creating
> fully grouped rels in this case. For thousands of partitions that's a lot
> of memory wasted.
>
> I see a similar issue with create_grouping_paths() when we are computing
> only full aggregates (either because partial aggregation is not possible
> or because parallelism is not possible). In that case, we unconditionally
> create partially grouped rels. That too would waste a lot of memory.
>
> I think that partially_grouped_rel, when required, is partitioned
> irrespective of whether we do full aggregation per partition or not. So,
> we should have its part_rels and other partitioning properties set. I
> agree that right now we won't use this information anywhere. It may be
> useful in case we devise a way to use partially_grouped_rel directly,
> without using grouped_rel, for planning beyond grouping, which seems
> unlikely.
>
> +
> +/*
> + * Parallel aggregation requires partial target, so compute it
> here
> + * and translate all vars. For partial aggregation, we need it
> + * anyways.
> + */
> +partial_target = make_partial_grouping_target(root, target,
> +  havingQual);
>
> Don't we have this available in partially_grouped_rel?
>
> That shows one asymmetry that Robert's refactoring has introduced. We
> don't set the reltarget of grouped_rel but do set the reltarget of
> partially_grouped_rel. If the reltarget of grouped_rel changes across
> paths (the reason why we have not set it in grouped_rel), shouldn't the
> reltarget of partially grouped paths change accordingly?
>

I am not sure why we don't set reltarget on the grouped_rel too.

But if we do so, as we did for partially_grouped_rel, then it will be a lot
easier for partitionwise aggregation, as we then don't have to pass the
target to the functions creating paths, like create_append_path(). We now
need to update generate_gather_paths() to take a target too, as it is now
being called on grouped_rel, in which reltarget is not set.

But yes, if there is any specific reason we can't do so, then I think the
same as Ashutosh said. I wasn't aware of any such reason though.


> +
> +/*
> + * group_by_has_partkey
> + *
> + * Returns true, if all the partition keys of the given relation are part
> of
> + * the GROUP BY clauses, false otherwise.
> + */
> +static bool
> +group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target,
> + List *groupClause)
>
> We could modify this function to return the list of part_exprs which are
> part of the GROUP BY clause and use that as the partition keys of the
> grouped_rel if required. If the GROUP BY clause doesn't have all the
> partition keys, the function would return a NULL list.
>
> Right now, in case of full aggregation, partially_grouped_rel is populated
> with the partial paths created by adding partial aggregation to the
> partial paths of the input relation. But we are not trying to create
> partial paths by (parallel) appending the (non)partial paths from the
> child partially_grouped_rels. Have we thought about that? Would such paths
> have different shapes from the ones that we create now, and will they be
> better?

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-08 Thread Robert Haas
On Thu, Mar 8, 2018 at 2:45 AM, Ashutosh Bapat
 wrote:
> +grouped_rel->part_scheme = input_rel->part_scheme;
> +grouped_rel->nparts = nparts;
> +grouped_rel->boundinfo = input_rel->boundinfo;
> +grouped_rel->part_rels = part_rels;
>
> You need to set the part_exprs which will provide partition keys for this
> partitioned relation. I think, we should include all the part_exprs of
> input_rel which are part of GROUP BY clause. Since any other expressions in
> part_exprs are not part of GROUP BY clause, they can not appear in the
> targetlist without an aggregate on top. So they can't be part of the partition
> keys of the grouped relation.
>
> In create_grouping_paths() we fetch both the partially as well as the
> fully grouped rel for the given input relation. But in case of partial
> aggregation, we don't need the fully grouped rel since we are not
> computing full aggregates for the children. Since fetch_upper_rel()
> creates a relation when one doesn't exist, we are unnecessarily creating
> fully grouped rels in this case. For thousands of partitions that's a lot
> of memory wasted.
>
> I see a similar issue with create_grouping_paths() when we are computing
> only full aggregates (either because partial aggregation is not possible
> or because parallelism is not possible). In that case, we unconditionally
> create partially grouped rels. That too would waste a lot of memory.

This kind of goes along with the suggestion I made yesterday to
introduce a new function, which at the time I proposed calling
initialize_grouping_rel(), to set up new grouped or partially grouped
relations.  By doing that it would be easier to ensure the
initialization is always done in a consistent way but only for the
relations we actually need.  But maybe we should call it
fetch_grouping_rel() instead.  The idea would be that instead of
calling fetch_upper_rel() we would call fetch_grouping_rel() when it
is a question of the grouped or partially grouped relation.  It would
either return the existing relation or initialize a new one for us.  I
think that would make it fairly easy to initialize only the ones we're
going to need.
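A minimal sketch of that lazy-creation idea (hypothetical names chosen for
illustration; this is not the actual PostgreSQL API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fetch_grouping_rel(): return the grouped (or partially
 * grouped) rel if it already exists, otherwise initialize it in exactly
 * one place, so initialization is always done consistently. */
typedef struct GroupedRel { int kind; int initialized; } GroupedRel;

enum { REL_GROUPED = 0, REL_PARTIALLY_GROUPED = 1, NUM_REL_KINDS };

static GroupedRel *rel_cache[NUM_REL_KINDS];

static GroupedRel *fetch_grouping_rel(int kind)
{
    if (rel_cache[kind] == NULL)
    {
        /* Created lazily, so a caller that never needs e.g. the fully
         * grouped rel never pays for it -- the memory concern upthread. */
        GroupedRel *rel = malloc(sizeof(GroupedRel));
        rel->kind = kind;
        rel->initialized = 1;
        rel_cache[kind] = rel;
    }
    return rel_cache[kind];
}
```

Repeated calls hand back the same rel, and the kinds that are never
requested are never built.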

Also, I don't think we should be paranoid about memory usage here.
It's good to avoid creating new rels that are obviously not needed,
not only because of memory consumption but also because of the CPU
consumption involved, but I don't want to contort the code to squeeze
every last byte of memory out of this.

On a related note, I'm not sure that this code is correct:

+   if (!isPartialAgg)
+   {
+   grouped_rel->part_scheme = input_rel->part_scheme;
+   grouped_rel->nparts = nparts;
+   grouped_rel->boundinfo = input_rel->boundinfo;
+   grouped_rel->part_rels = part_rels;
+   }

It's not obvious to me why this should be done only when
!isPartialAgg.  The comments claim that the partially grouped child
rels can't be considered partitions of the top-level partially
grouped rel, but it seems to me that we could consider them that way.
Maybe I'm missing something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-07 Thread Ashutosh Bapat
On Wed, Mar 7, 2018 at 8:07 PM, Jeevan Chalke
 wrote:
Here are some more review comments, esp. on the
try_partitionwise_grouping() function. BTW, the name of that function
doesn't go in sync with enable_partitionwise_aggregation (which, btw, is
in sync with the enable_fooagg GUCs), but it does go in sync with
create_grouping_paths(). It looks like we have already conflated
aggregation and grouping: e.g. enable_hashagg may affect path creation
when there is no aggregation involved, i.e. only grouping is involved,
but create_grouping_paths() will create paths when there is no grouping
involved. Generally it looks like the function names use "grouping" to
mean both aggregation and grouping, but the GUCs use "aggregation" to
mean both of those. So, the naming in this patch looks consistent with
the current naming conventions.

+grouped_rel->part_scheme = input_rel->part_scheme;
+grouped_rel->nparts = nparts;
+grouped_rel->boundinfo = input_rel->boundinfo;
+grouped_rel->part_rels = part_rels;

You need to set part_exprs, which will provide the partition keys for this
partitioned relation. I think we should include all the part_exprs of
input_rel which are part of the GROUP BY clause. Since any other expressions
in part_exprs are not part of the GROUP BY clause, they cannot appear in the
targetlist without an aggregate on top. So they can't be part of the
partition keys of the grouped relation.

In create_grouping_paths() we fetch both the partially as well as the fully
grouped rel for the given input relation. But in case of partial
aggregation, we don't need the fully grouped rel since we are not computing
full aggregates for the children. Since fetch_upper_rel() creates a relation
when one doesn't exist, we are unnecessarily creating fully grouped rels in
this case. For thousands of partitions that's a lot of memory wasted.

I see a similar issue with create_grouping_paths() when we are computing
only full aggregates (either because partial aggregation is not possible or
because parallelism is not possible). In that case, we unconditionally
create partially grouped rels. That too would waste a lot of memory.

I think that partially_grouped_rel, when required, is partitioned irrespective
of whether we do full aggregation per partition or not. So, we should have
its part_rels and other partitioning properties set. I agree that right now
we won't use this information anywhere. It may be useful in case we devise a
way to use partially_grouped_rel directly, without using grouped_rel, for
planning beyond grouping, which seems unlikely.

+
+/*
+ * Parallel aggregation requires partial target, so compute it here
+ * and translate all vars. For partial aggregation, we need it
+ * anyways.
+ */
+partial_target = make_partial_grouping_target(root, target,
+  havingQual);

Don't we have this available in partially_grouped_rel?

That shows one asymmetry that Robert's refactoring has introduced. We don't
set the reltarget of grouped_rel but do set the reltarget of
partially_grouped_rel. If the reltarget of grouped_rel changes across paths
(the reason why we have not set it in grouped_rel), shouldn't the reltarget
of partially grouped paths change accordingly?

+
+/*
+ * group_by_has_partkey
+ *
+ * Returns true, if all the partition keys of the given relation are part of
+ * the GROUP BY clauses, false otherwise.
+ */
+static bool
+group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target,
+ List *groupClause)

We could modify this function to return the list of part_exprs which are part
of the GROUP BY clause and use that as the partition keys of the grouped_rel
if required. If the GROUP BY clause doesn't have all the partition keys, the
function would return a NULL list.
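The suggested variant could look roughly like this toy C sketch, with
integers standing in for expression nodes and all names invented for
illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Return the partition-key expressions that appear in the GROUP BY
 * clause, or NULL when any key is missing.  A non-NULL result could then
 * serve as the partition keys of the grouped rel; NULL means the grouped
 * rel cannot be treated as partitioned on the same keys. */
static const int *group_by_part_exprs(const int *partkeys, int nkeys,
                                      const int *groupexprs, int ngroup,
                                      int *out, int *nout)
{
    *nout = 0;
    for (int k = 0; k < nkeys; k++)
    {
        int found = 0;

        for (int g = 0; g < ngroup; g++)
            if (groupexprs[g] == partkeys[k]) { found = 1; break; }
        if (!found)
            return NULL;        /* some partition key is not grouped on */
        out[(*nout)++] = partkeys[k];
    }
    return out;                 /* all partition keys are grouped on */
}
```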

Right now, in case of full aggregation, partially_grouped_rel is populated
with the partial paths created by adding partial aggregation to the partial
paths of the input relation. But we are not trying to create partial paths by
(parallel) appending the (non)partial paths from the child
partially_grouped_rels. Have we thought about that? Would such paths have
different shapes from the ones that we create now, and will they be better?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-07 Thread Jeevan Chalke
On Wed, Mar 7, 2018 at 1:45 AM, Robert Haas  wrote:

> On Tue, Mar 6, 2018 at 5:31 AM, Jeevan Chalke
>  wrote:
> > This is in-lined with enable_hashagg GUC. Do you think
> > enable_partitionwise_aggregate seems better? But it will be not
> consistent
> > with other GUC names like enable_hashagg then.
>
> Well, if I had my way, enable_hashagg would be spelled
> enable_hash_aggregate, too, but I wasn't involved in the project that
> long ago.  100% consistency is hard to achieve here; the perfect
> parallel of enable_hashagg would be enable_partitionwiseagg, but then
> it would be inconsistent with enable_partitionwise_join unless we
> renamed it to enable_partitionwisejoin, which I would rather not do.
> I think the way the enable_blahfoo names were done was kinda
> shortsighted -- it works OK as long as blahfoo is pretty short, like
> mergejoin or hashagg or whatever, but if you have more or longer words
> then I think it's hard to see where the word boundaries are without
> any punctuation.  And if you start abbreviating then you end up with
> things like enable_pwagg which are not very easy to understand.  So I
> favor spelling everything out and punctuating it.
>

Understood, and that makes sense.
Updated.


>
> > So the code for creating partially aggregated partial paths and
> > partially aggregated non-partial paths is the same, except that partial
> > paths go into the partial_pathlist whereas non-partial ones go into the
> > pathlist of partially_grouped_rel. Thus, calling
> > add_paths_to_partial_grouping_rel() when isPartialAgg = true seems
> > correct. Also, as we have decided, this function is responsible for
> > creating all partially aggregated paths, both partial and non-partial.
> >
> > Am I missing something?
>
> Hmm.  I guess not.  I think I didn't read this code well enough
> previously.  Please find attached proposed incremental patches (0001
> and 0002) which hopefully make the code in this area a bit clearer.
>

Yep. Thanks for these patches.
I have merged these changes into my main (0007) patch now.


>
> >> +   /*
> >> +* If there are any fully aggregated partial paths present,
> >> may be because
> >> +* of parallel Append over partitionwise aggregates, we must
> stick
> >> a
> >> +* Gather or Gather Merge path atop the cheapest partial path.
> >> +*/
> >> +   if (grouped_rel->partial_pathlist)
> >>
> >> This comment is copied from someplace where the code does what the
> >> comment says, but here it doesn't do any such thing.
> >
> > Well, these comments are not present anywhere else but here. With
> > Parallel Append and Partitionwise Aggregate, it is now possible to have
> > fully aggregated partial paths. And thus we need to stick a Gather
> > and/or Gather Merge atop the cheapest partial path. And I believe the
> > code does the same. Am I missing something?
>
> I misread the code.  Sigh.  I should have waited until today to send
> that email and taken time to study it more carefully.  But I still
> don't think it's completely correct.  It will not consider using a
> pre-sorted path; the only strategies it can consider are cheapest path
> + Gather and cheapest path + explicit Sort (even if the cheapest path
> is already correctly sorted!) + Gather Merge.  It should really do
> something similar to what add_paths_to_partial_grouping_rel() already
> does: first call generate_gather_paths() and then, if the cheapest
> partial path is not already correctly sorted, also try an explicit
> Sort + Gather Merge.  In fact, it looks like we can actually reuse
> that logic exactly.  See attached 0003 incremental patch; this changes
> the outputs of one of your regression tests, but the new plan looks
> better.
>

This seems like a refactoring patch, so I have added it as a separate patch
(0005) in the patch-set.
Changes related to the PWA patch are merged accordingly too.

Attached is a new patch-set with these changes merged, fixing the review
comments from Ashutosh Bapat, along with his GroupPathExtraData changes patch.


> Some other notes:
>
> There's a difference between performing partial aggregation in the
> same process and performing it in a different process.  hasNonPartial
> tells us that we can't perform partial aggregation *at all*;
> hasNonSerial only tells us that partial and final aggregation must
> happen in the same process.  This patch could possibly take advantage
> of partial aggregation even when hasNonSerial is set.  Finalize
> Aggregate -> Append -> N copies of { Partial Aggregate -> Whatever }
> is OK with hasNonSerial = true as long as hasNonPartial = false.  Now,
> the bad news is that for this to actually work we'd need to define new
> values of AggSplit, like AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL and
> AGGSPLIT_FINAL = AGGSPLITOP_COMBINE, and I'm not sure how much
> complexity that adds.  However, if we're not going to do that, I think
> we'd better at last add some comments about it suggesting 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-07 Thread Jeevan Chalke
On Tue, Mar 6, 2018 at 4:59 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi Jeevan,
> I am back reviewing this. Here are some comments.
>
> @@ -1415,7 +1413,8 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
>   * the unparameterized Append path we are constructing for the
> parent.
>   * If not, there's no workable unparameterized path.
>   */
> -if (childrel->cheapest_total_path->param_info == NULL)
> +if (childrel->pathlist != NIL &&
> +childrel->cheapest_total_path->param_info == NULL)
>  accumulate_append_subpath(childrel->cheapest_total_path,
>   &subpaths, NULL);
>  else
> @@ -1683,6 +1682,13 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
>  RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
>  Path   *subpath;
>
> +if (childrel->pathlist == NIL)
> +{
> +/* failed to make a suitable path for this child */
> +subpaths_valid = false;
> +break;
> +}
> +
> When can childrel->pathlist be NIL?
>

Done. Sorry, it was a leftover from an earlier trial and is not needed now;
removed.


>
> diff --git a/src/backend/optimizer/plan/createplan.c
> b/src/backend/optimizer/plan/createplan.c
> index 9ae1bf3..f90626c 100644
> --- a/src/backend/optimizer/plan/createplan.c
> +++ b/src/backend/optimizer/plan/createplan.c
> @@ -1670,7 +1670,15 @@ create_sort_plan(PlannerInfo *root, SortPath
> *best_path, int flags)
>  subplan = create_plan_recurse(root, best_path->subpath,
>flags | CP_SMALL_TLIST);
>
> -plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
> NULL);
> +/*
> + * In find_ec_member_for_tle(), child EC members are ignored if they
> don't
> + * belong to the given relids. Thus, if this sort path is based on a
> child
> + * relation, we must pass the relids of it. Otherwise, we will end-up
> into
> + * an error requiring pathkey item.
> + */
> +plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
> +   IS_OTHER_REL(best_path->subpath->parent)
> ?
> +   best_path->path.parent->relids : NULL);
>
>  copy_generic_path_info(&plan->plan, (Path *) best_path);
>
> Please separate this small adjustment in a patch of its own, with some
> explanation of why we need it i.e. now this function can see SortPaths from
> child (other) relations.
>

I am not sure whether it is good to split this out of the main patch. The
main patch exposes this requirement, so it seems better to have these
changes in the main patch itself.
However, I have no issue with extracting it into a separate small patch. Let
me know your views.


>
> +if (child_data)
> +{
> +/* Must be other rel as all child relations are marked OTHER_RELs
> */
> +Assert(IS_OTHER_REL(input_rel));
>
> I think we should check IS_OTHER_REL() and Assert(child_data). That way we
> know
> that the code in the if block is executed for OTHER relation.
>

Done.


>
> -if ((root->hasHavingQual || parse->groupingSets) &&
> +if (!child_data && (root->hasHavingQual || parse->groupingSets) &&
>
> Degenerate grouping will never see child relations, so instead of checking
> for
> child_data, Assert (!IS_OTHER_REL()) inside this block. Add a comment there
> explaining the assertion.
>

Done.


>
> + *
> + * If we are performing grouping for a child relation, fetch can_sort
> from
> + * the child_data to avoid re-calculating same.
>   */
> -can_sort = (gd && gd->rollups != NIL)
> -|| grouping_is_sortable(parse->groupClause);
> +can_sort = child_data ? child_data->can_sort : ((gd &&
> gd->rollups != NIL) ||
> +
> grouping_is_sortable(parse->groupClause));
>
> Instead of adding a conditional here, we can compute these values before
> create_grouping_paths() is called from grouping_planner() and then pass
> them
> down to try_partitionwise_grouping(). I have attached a patch which
> refactors
> this code. Please see if this refactoring is useful. In the attached
> patch, I
> have handled can_sort, can_hash and partial aggregation costs. More on the
> last
> component below.
>
>
>  /*
>   * Figure out whether a PartialAggregate/Finalize Aggregate execution
> @@ -3740,10 +3789,8 @@ create_grouping_paths(PlannerInfo *root,
>   * partial paths for partially_grouped_rel; that way, later code can
>   * easily consider both parallel and non-parallel approaches to
> grouping.
>   */
> -if (try_parallel_aggregation)
> +if (!child_data && !(agg_costs->hasNonPartial ||
> agg_costs->hasNonSerial))
>  {
> -PathTarget *partial_grouping_target;
> -
> [... clipped ...]
> +get_agg_clause_costs(root, havingQual,
>   AGGSPLIT_FINAL_DESERIAL,
> - &agg_final_costs);

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-06 Thread Ashutosh Bapat
On Tue, Mar 6, 2018 at 7:52 PM, Jeevan Chalke
 wrote:

>
>
> Changes look good to me and refactoring will be useful for partitionwise
> patches.
>
> However, would it be good to add agg_costs into the GroupPathExtraData
> too?
> Also, can we pass this to add_partial_paths_to_grouping_rel() and
> add_paths_to_grouping_rel(), to avoid passing can_sort, can_hash and the
> cost-related details to them individually?

I think so too.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-06 Thread Robert Haas
On Tue, Mar 6, 2018 at 5:31 AM, Jeevan Chalke
 wrote:
> This is in-lined with enable_hashagg GUC. Do you think
> enable_partitionwise_aggregate seems better? But it will be not consistent
> with other GUC names like enable_hashagg then.

Well, if I had my way, enable_hashagg would be spelled
enable_hash_aggregate, too, but I wasn't involved in the project that
long ago.  100% consistency is hard to achieve here; the perfect
parallel of enable_hashagg would be enable_partitionwiseagg, but then
it would be inconsistent with enable_partitionwise_join unless we
renamed it to enable_partitionwisejoin, which I would rather not do.
I think the way the enable_blahfoo names were done was kinda
shortsighted -- it works OK as long as blahfoo is pretty short, like
mergejoin or hashagg or whatever, but if you have more or longer words
then I think it's hard to see where the word boundaries are without
any punctuation.  And if you start abbreviating then you end up with
things like enable_pwagg which are not very easy to understand.  So I
favor spelling everything out and punctuating it.

> So the code for creating partially aggregated partial paths and partially
> aggregated non-partial paths is the same, except that partial paths go into
> the partial_pathlist whereas non-partial ones go into the pathlist of
> partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
> when isPartialAgg = true seems correct. Also, as we have decided, this
> function is responsible for creating all partially aggregated paths, both
> partial and non-partial.
>
> Am I missing something?

Hmm.  I guess not.  I think I didn't read this code well enough
previously.  Please find attached proposed incremental patches (0001
and 0002) which hopefully make the code in this area a bit clearer.

>> +   /*
>> +* If there are any fully aggregated partial paths present,
>> may be because
>> +* of parallel Append over partitionwise aggregates, we must stick
>> a
>> +* Gather or Gather Merge path atop the cheapest partial path.
>> +*/
>> +   if (grouped_rel->partial_pathlist)
>>
>> This comment is copied from someplace where the code does what the
>> comment says, but here it doesn't do any such thing.
>
> Well, these comments are not present anywhere else but here. With
> Parallel Append and Partitionwise Aggregate, it is now possible to have
> fully aggregated partial paths. And thus we need to stick a Gather and/or
> Gather Merge atop the cheapest partial path. And I believe the code does
> the same. Am I missing something?

I misread the code.  Sigh.  I should have waited until today to send
that email and taken time to study it more carefully.  But I still
don't think it's completely correct.  It will not consider using a
pre-sorted path; the only strategies it can consider are cheapest path
+ Gather and cheapest path + explicit Sort (even if the cheapest path
is already correctly sorted!) + Gather Merge.  It should really do
something similar to what add_paths_to_partial_grouping_rel() already
does: first call generate_gather_paths() and then, if the cheapest
partial path is not already correctly sorted, also try an explicit
Sort + Gather Merge.  In fact, it looks like we can actually reuse
that logic exactly.  See attached 0003 incremental patch; this changes
the outputs of one of your regression tests, but the new plan looks
better.
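The strategy described above can be modeled in a toy C sketch (invented
names, not planner code): always consider Gather over the cheapest partial
path, and add an explicit Sort below Gather Merge only when that path is not
already sorted as required.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the path-building choice: generate_gather_paths() covers
 * the plain Gather case (and Gather Merge over presorted paths); an
 * explicit Sort + Gather Merge is tried only when the cheapest partial
 * path is not already correctly sorted. */
static int choose_gather_strategies(int cheapest_is_presorted,
                                    const char *strategies[2])
{
    int n = 0;

    strategies[n++] = "Gather";             /* always considered */
    if (cheapest_is_presorted)
        strategies[n++] = "GatherMerge";    /* no Sort needed on top */
    else
        strategies[n++] = "Sort+GatherMerge";
    return n;
}
```

The point of the toy is only the conditional: a presorted cheapest partial
path should never get a redundant explicit Sort.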

Some other notes:

There's a difference between performing partial aggregation in the
same process and performing it in a different process.  hasNonPartial
tells us that we can't perform partial aggregation *at all*;
hasNonSerial only tells us that partial and final aggregation must
happen in the same process.  This patch could possibly take advantage
of partial aggregation even when hasNonSerial is set.  Finalize
Aggregate -> Append -> N copies of { Partial Aggregate -> Whatever }
is OK with hasNonSerial = true as long as hasNonPartial = false.  Now,
the bad news is that for this to actually work we'd need to define new
values of AggSplit, like AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL and
AGGSPLIT_FINAL = AGGSPLITOP_COMBINE, and I'm not sure how much
complexity that adds.  However, if we're not going to do that, I think
we'd better at least add some comments about it suggesting that someone
might want to do something about it in the future.
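For reference, here is a sketch of the AggSplit bits and the proposed
additions (bit values paraphrased from nodes/nodes.h as of that era;
AGGSPLIT_INITIAL and AGGSPLIT_FINAL are only the hypothetical values
suggested above, not committed code):

```c
#include <assert.h>

/* Existing AggSplit bits (paraphrased from nodes/nodes.h): */
#define AGGSPLITOP_COMBINE     0x01  /* substitute combinefn for transfn */
#define AGGSPLITOP_SKIPFINAL   0x02  /* skip finalfn, return state as-is */
#define AGGSPLITOP_SERIALIZE   0x04  /* apply serialfn to output */
#define AGGSPLITOP_DESERIALIZE 0x08  /* apply deserialfn to input */

enum AggSplit
{
    AGGSPLIT_SIMPLE = 0,
    AGGSPLIT_INITIAL_SERIAL = AGGSPLITOP_SKIPFINAL | AGGSPLITOP_SERIALIZE,
    AGGSPLIT_FINAL_DESERIAL = AGGSPLITOP_COMBINE | AGGSPLITOP_DESERIALIZE,

    /* Hypothetical additions: partial/final split within the SAME
     * process, so no (de)serialization bits are set, and a hasNonSerial
     * aggregate would still qualify. */
    AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL,
    AGGSPLIT_FINAL = AGGSPLITOP_COMBINE
};

/* An aggregate lacking serial/deserial support can use a split mode only
 * if that mode needs no serialization. */
static int usable_without_serial(enum AggSplit mode)
{
    return (mode & (AGGSPLITOP_SERIALIZE | AGGSPLITOP_DESERIALIZE)) == 0;
}
```

This makes the point concrete: Finalize Aggregate -> Append -> Partial
Aggregate never crosses a process boundary, so only hasNonPartial, not
hasNonSerial, would rule it out under the suggested modes.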

I think that, in general, it's a good idea to keep the number of times
that create_grouping_paths() does something which is conditional on
whether child_data is NULL to a minimum.  I haven't looked at what
Ashutosh tried to do there so I don't know whether it's good or bad,
but I like the idea, if we can do it cleanly.

It strikes me that we might want to consider refactoring things so
that create_grouping_paths() takes the grouping_rel and
partial_grouping_rel as input arguments.  Right now, the
initialization of the child grouping and partial-grouping rels is
partly in try_partitionwise_aggregate(), 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-06 Thread Jeevan Chalke
On Tue, Mar 6, 2018 at 4:59 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi Jeevan,
> I am back reviewing this. Here are some comments.
>
> @@ -1415,7 +1413,8 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
>   * the unparameterized Append path we are constructing for the
> parent.
>   * If not, there's no workable unparameterized path.
>   */
> -if (childrel->cheapest_total_path->param_info == NULL)
> +if (childrel->pathlist != NIL &&
> +childrel->cheapest_total_path->param_info == NULL)
>  accumulate_append_subpath(childrel->cheapest_total_path,
>   &subpaths, NULL);
>  else
> @@ -1683,6 +1682,13 @@ add_paths_to_append_rel(PlannerInfo *root,
> RelOptInfo *rel,
>  RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
>  Path   *subpath;
>
> +if (childrel->pathlist == NIL)
> +{
> +/* failed to make a suitable path for this child */
> +subpaths_valid = false;
> +break;
> +}
> +
> When can childrel->pathlist be NIL?
>
> diff --git a/src/backend/optimizer/plan/createplan.c
> b/src/backend/optimizer/plan/createplan.c
> index 9ae1bf3..f90626c 100644
> --- a/src/backend/optimizer/plan/createplan.c
> +++ b/src/backend/optimizer/plan/createplan.c
> @@ -1670,7 +1670,15 @@ create_sort_plan(PlannerInfo *root, SortPath
> *best_path, int flags)
>  subplan = create_plan_recurse(root, best_path->subpath,
>flags | CP_SMALL_TLIST);
>
> -plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
> NULL);
> +/*
> + * In find_ec_member_for_tle(), child EC members are ignored if they
> don't
> + * belong to the given relids. Thus, if this sort path is based on a
> child
> + * relation, we must pass the relids of it. Otherwise, we will end-up
> into
> + * an error requiring pathkey item.
> + */
> +plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
> +   IS_OTHER_REL(best_path->subpath->parent)
> ?
> +   best_path->path.parent->relids : NULL);
>
>  copy_generic_path_info(&plan->plan, (Path *) best_path);
>
> Please separate this small adjustment in a patch of its own, with some
> explanation of why we need it i.e. now this function can see SortPaths from
> child (other) relations.
>
> +if (child_data)
> +{
> +/* Must be other rel as all child relations are marked OTHER_RELs
> */
> +Assert(IS_OTHER_REL(input_rel));
>
> I think we should check IS_OTHER_REL() and Assert(child_data). That way we
> know
> that the code in the if block is executed for OTHER relation.
>
> -if ((root->hasHavingQual || parse->groupingSets) &&
> +if (!child_data && (root->hasHavingQual || parse->groupingSets) &&
>
> Degenerate grouping will never see child relations, so instead of checking
> for
> child_data, Assert (!IS_OTHER_REL()) inside this block. Add a comment there
> explaining the assertion.
>
> + *
> + * If we are performing grouping for a child relation, fetch can_sort
> from
> + * the child_data to avoid re-calculating same.
>   */
> -can_sort = (gd && gd->rollups != NIL)
> -|| grouping_is_sortable(parse->groupClause);
> +can_sort = child_data ? child_data->can_sort : ((gd &&
> gd->rollups != NIL) ||
> +
> grouping_is_sortable(parse->groupClause));
>
> Instead of adding a conditional here, we can compute these values before
> create_grouping_paths() is called from grouping_planner() and then pass
> them
> down to try_partitionwise_grouping(). I have attached a patch which
> refactors
> this code. Please see if this refactoring is useful. In the attached
> patch, I
> have handled can_sort, can_hash and partial aggregation costs. More on the
> last
> component below.
>

The changes look good to me, and the refactoring will be useful for the
partitionwise patches.

However, would it be good to add agg_costs into GroupPathExtraData
too?
Also, can we pass it to add_partial_paths_to_grouping_rel() and
add_paths_to_grouping_rel() to avoid passing can_sort, can_hash, and the
cost-related details to them individually?


>
>
>  /*
>   * Figure out whether a PartialAggregate/Finalize Aggregate execution
> @@ -3740,10 +3789,8 @@ create_grouping_paths(PlannerInfo *root,
>   * partial paths for partially_grouped_rel; that way, later code can
>   * easily consider both parallel and non-parallel approaches to
> grouping.
>   */
> -if (try_parallel_aggregation)
> +if (!child_data && !(agg_costs->hasNonPartial ||
> agg_costs->hasNonSerial))
>  {
> -PathTarget *partial_grouping_target;
> -
> [... clipped ...]
> +get_agg_clause_costs(root, havingQual,
>   AGGSPLIT_FINAL_DESERIAL,
> - _final_costs);

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-06 Thread Jeevan Chalke
On Tue, Mar 6, 2018 at 2:29 AM, Robert Haas  wrote:

> On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke
>  wrote:
> > However, to perform Gather or Gather Merge once we have all partial paths
> > ready, and to avoid too many existing code rearrangement, I am calling
> > try_partitionwise_grouping() before we do any aggregation/grouping on
> whole
> > relation. By doing this, we will be having all partial paths in
> > partially_grouped_rel and then existing code will do required
> finalization
> > along with any Gather or Gather Merge, if required.
> >
> > Please have a look over attached patch-set and let me know if it needs
> > further changes.
>
> This does look better.
>

Thank you, Robert.


>
> +  enable_partitionwise_agg
> (boolean)
>
> Please don't abbreviate "aggregate" to "agg".
>

This is in line with the enable_hashagg GUC. Do you think
enable_partitionwise_aggregate
would be better? It would then be inconsistent with other GUC names like
enable_hashagg, though.


>
> -   /* Build final grouping paths */
> -   add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
> -
> partially_grouped_rel, agg_costs,
> -
> _final_costs, gd, can_sort, can_hash,
> - dNumGroups,
> (List *) parse->havingQual);
> +   if (isPartialAgg)
> +   {
> +   Assert(agg_partial_costs && agg_final_costs);
> +   add_paths_to_partial_grouping_rel(root, input_rel,
> +
>partially_grouped_rel,
> +
>agg_partial_costs,
> +
>gd, can_sort, can_hash,
> +
>false, true);
> +   }
> +   else
> +   {
> +   double  dNumGroups;
> +
> +   /* Estimate number of groups. */
> +   dNumGroups = get_number_of_groups(root,
> +
>cheapest_path->rows,
> +
>gd,
> +
>child_data ? make_tlist_from_pathtarget(target) :
> parse->targetList);
> +
> +   /* Build final grouping paths */
> +   add_paths_to_grouping_rel(root, input_rel, grouped_rel,
> target,
> +
> partially_grouped_rel, agg_costs,
> +
> agg_final_costs, gd, can_sort, can_hash,
> +
> dNumGroups, (List *) havingQual);
> +   }
>
> This looks strange.  Why do we go down two completely different code
> paths here?


It is because when isPartialAgg = true we need to create partially
aggregated non-partial paths, which should be added to
partially_grouped_rel->pathlist. And when isPartialAgg = false, we are
creating fully aggregated paths, which go into grouped_rel->pathlist.


>   It seems to me that the set of paths we add to the
> partial_pathlist shouldn't depend at all on isPartialAgg.  I might be
> confused, but it seems to me that any aggregate path we construct that
> is going to run in parallel must necessarily be partial, because even
> if each group will occur only in one partition, it might still occur
> in multiple workers for that partition, so finalization would be
> needed.


That's true. We create partially aggregated partial paths for this and
keep them in partially_grouped_rel->partial_pathlist.


>   On the other hand, for non-partial paths, we can add then to
> partially_grouped_rel when isPartialAgg = true and to grouped_rel when
> isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE
> vs. AGGSPLIT_INITIAL_SERIAL.


Yes. As explained above, they go into the pathlist of the respective rel.
However, the PathTarget is different too: we need partial_pathtarget when
isPartialAgg = true, and we also need agg_partial_costs.


> But that doesn't appear to be what this
> is doing.
>

So the code for creating partially aggregated partial paths and partially
aggregated non-partial paths is the same, except that partial paths go into
the partial_pathlist whereas non-partial ones go into the pathlist of
partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
when isPartialAgg = true seems correct. Also, as we have decided, this
function is responsible for creating all partially aggregated paths,
including both partial and non-partial ones.

Am I missing something?


>
> +   /*
> +* If there are any fully aggregated partial paths present,
> may be because
> +* of parallel Append over partitionwise aggregates, we must stick
> a
> +* Gather or Gather Merge path atop the cheapest partial path.
> +*/
> +   if (grouped_rel->partial_pathlist)
>
> This comment is copied from someplace where the code does what the
> comment says, but here it doesn't do any such thing.
>

Well, these comments are not present anywhere other than this place. With
Parallel Append and partitionwise aggregates, it is now possible to have
fully aggregated partial paths. And thus we need to stick a Gather
and/or Gather Merge atop the cheapest partial path. I believe the code does
exactly that. Am I missing something?


>
> More tomorrow...
>
> --
> Robert Haas
> EnterpriseDB: 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-05 Thread Robert Haas
On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke
 wrote:
> However, to perform Gather or Gather Merge once we have all partial paths
> ready, and to avoid too many existing code rearrangement, I am calling
> try_partitionwise_grouping() before we do any aggregation/grouping on whole
> relation. By doing this, we will be having all partial paths in
> partially_grouped_rel and then existing code will do required finalization
> along with any Gather or Gather Merge, if required.
>
> Please have a look over attached patch-set and let me know if it needs
> further changes.

This does look better.

+  enable_partitionwise_agg (boolean)

Please don't abbreviate "aggregate" to "agg".

-   /* Build final grouping paths */
-   add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-
partially_grouped_rel, agg_costs,
-
_final_costs, gd, can_sort, can_hash,
- dNumGroups,
(List *) parse->havingQual);
+   if (isPartialAgg)
+   {
+   Assert(agg_partial_costs && agg_final_costs);
+   add_paths_to_partial_grouping_rel(root, input_rel,
+
   partially_grouped_rel,
+
   agg_partial_costs,
+
   gd, can_sort, can_hash,
+
   false, true);
+   }
+   else
+   {
+   double  dNumGroups;
+
+   /* Estimate number of groups. */
+   dNumGroups = get_number_of_groups(root,
+
   cheapest_path->rows,
+
   gd,
+
   child_data ? make_tlist_from_pathtarget(target) :
parse->targetList);
+
+   /* Build final grouping paths */
+   add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+
partially_grouped_rel, agg_costs,
+
agg_final_costs, gd, can_sort, can_hash,
+
dNumGroups, (List *) havingQual);
+   }

This looks strange.  Why do we go down two completely different code
paths here?  It seems to me that the set of paths we add to the
partial_pathlist shouldn't depend at all on isPartialAgg.  I might be
confused, but it seems to me that any aggregate path we construct that
is going to run in parallel must necessarily be partial, because even
if each group will occur only in one partition, it might still occur
in multiple workers for that partition, so finalization would be
needed.  On the other hand, for non-partial paths, we can add then to
partially_grouped_rel when isPartialAgg = true and to grouped_rel when
isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE
vs. AGGSPLIT_INITIAL_SERIAL.  But that doesn't appear to be what this
is doing.
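The arrangement suggested here can be sketched as a small routing decision. This is a toy model with invented names (ToyAggRoute, route_agg_path), not actual planner code — it only illustrates that the same path-building logic would run in both cases, with just the destination rel and the AggSplit mode differing:

```c
/* Toy sketch of the suggested arrangement: non-partial aggregate paths go
 * to partially_grouped_rel with INITIAL_SERIAL splitting when doing
 * partial aggregation, else to grouped_rel with SIMPLE splitting.
 * ToyAggRoute and route_agg_path are invented names for illustration. */
#include <assert.h>
#include <string.h>

typedef enum { TOY_AGGSPLIT_SIMPLE, TOY_AGGSPLIT_INITIAL_SERIAL } ToyAggSplit;

typedef struct {
    const char *dest_rel;   /* which upper rel receives the new path */
    ToyAggSplit aggsplit;   /* how the Agg node is split */
} ToyAggRoute;

static ToyAggRoute route_agg_path(int is_partial_agg)
{
    ToyAggRoute r;

    if (is_partial_agg) {
        /* partially aggregated path: needs later finalization */
        r.dest_rel = "partially_grouped_rel";
        r.aggsplit = TOY_AGGSPLIT_INITIAL_SERIAL;
    } else {
        /* fully aggregated path: complete as-is */
        r.dest_rel = "grouped_rel";
        r.aggsplit = TOY_AGGSPLIT_SIMPLE;
    }
    return r;
}
```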

+   /*
+* If there are any fully aggregated partial paths present,
may be because
+* of parallel Append over partitionwise aggregates, we must stick a
+* Gather or Gather Merge path atop the cheapest partial path.
+*/
+   if (grouped_rel->partial_pathlist)

This comment is copied from someplace where the code does what the
comment says, but here it doesn't do any such thing.

More tomorrow...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-05 Thread Jeevan Chalke
On Sat, Mar 3, 2018 at 12:12 AM, Robert Haas  wrote:

> On Thu, Mar 1, 2018 at 4:52 PM, Robert Haas  wrote:
> > This is not a full review, but I'm out of time for right now.
>
> Another thing I see here now is that create_grouping_paths() and
> create_child_grouping_paths() are extremely similar.  Isn't there some
> way we can refactor things so that we can reuse the code instead of
> duplicating it?
>

Yes, I observed the same after our redesign.

To avoid code duplication, I am now calling create_grouping_paths() for the
child relations too.

However, to perform the Gather or Gather Merge step once we have all
partial paths ready, and to avoid rearranging too much existing code, I am
calling try_partitionwise_grouping() before we do any aggregation/grouping
on the whole relation. By doing this, we will have all partial paths in
partially_grouped_rel, and the existing code will then do the required
finalization along with any Gather or Gather Merge, if required.

Please have a look at the attached patch set and let me know if it needs
further changes.

Thanks


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v15.tar.gz
Description: application/gzip


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-05 Thread Jeevan Chalke
On Fri, Mar 2, 2018 at 3:22 AM, Robert Haas  wrote:

> On Thu, Mar 1, 2018 at 5:34 AM, Jeevan Chalke
>  wrote:
> > Attached new patchset after rebasing my changes over these changes and on
> > latest HEAD.
>
> +* We have already created a Gather or Gather Merge path atop
> cheapest
> +* partial path. Thus the partial path referenced by the
> Gather node needs
> +* to be preserved as adding new partial paths in same rel may
> delete this
> +* referenced path. To do this we need to clear the
> partial_pathlist from
> +* the partially_grouped_rel as we may add partial paths again
> while doing
> +* partitionwise aggregation. Keeping older partial path intact
> seems
> +* reasonable too as it might possible that the final path
> chosen which is
> +* using it wins, but the underneath partial path is not the
> cheapest one.
>
> This isn't a good design.  You shouldn't create a Gather or Gather
> Merge node until all partial paths have been added.  I mean, the point
> is to put a Gather node on top of the cheapest path, not the path that
> is currently the cheapest but might not actually be the cheapest once
> we've added them all.
>

To be honest, I didn't know that we should not generate Gather or Gather
Merge until we have all possible partial paths in place. I realized it
recently while debugging an issue reported by Rajkumar off-list. While
working on that fix, what I observed is:
- I have a cheapest partial path with cost, say, 10; a Gather on it
increases the cost to 11.
- Later I add a partial path with cost, say, 9, but a Gather on it results
in a total cost of 12.
This means the first Gather path is the cheapest one even though the
partial path underneath it is not, and unfortunately that partial path got
removed when my new partial path was added to the partial_pathlist.

Due to this, I thought it better to keep both paths valid, and to avoid
deleting the earlier cheapest partial path I chose to reset
partially_grouped_rel->partial_pathlist.

But yes, per the comment in generate_gather_paths() and as you said, we
should add Gather or Gather Merge only after we are done with all partial
path creation. Sorry for not knowing this before.
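The 10/11-versus-9/12 situation described above can be reproduced with a toy costing model. The setup and per-row figures below are invented purely to mirror the example — they are not PostgreSQL's real Gather costing:

```c
/* Toy model: a Gather's cost depends on more than the partial path's own
 * cost (e.g. how many rows must be transferred to the leader), so the
 * Gather over the current cheapest partial path can beat a Gather over a
 * later, cheaper partial path.  All constants are invented. */
#include <assert.h>

typedef struct {
    double total_cost;   /* cost of the partial path itself */
    double rows;         /* rows it produces */
} ToyPartialPath;

/* assumed overhead: fixed setup cost plus a per-row transfer charge */
static double toy_gather_cost(ToyPartialPath p)
{
    const double setup = 0.5;
    const double per_row = 0.005;

    return p.total_cost + setup + per_row * p.rows;
}
```

With a = {cost 10, 100 rows} the Gather costs 11, while the later, cheaper b = {cost 9, 500 rows} yields a Gather costing 12; discarding a when b arrives would therefore lose the cheapest Gather, which is why Gather should only be built once the partial pathlist is complete.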


>
> +add_gather_or_gather_merge(PlannerInfo *root,
>
> Please stop picking generic function names for functions that have
> very specific purposes.  I don't really think that you need this to be
> a separate function at all, but it it is certainly NOT a
> general-purpose function for adding a Gather or Gather Merge node.
>

OK. Got it now.


>
> +/*
> + * Collect statistics about aggregates for estimating costs of
> + * performing aggregation in parallel or in partial, if not
> already
> + * done. We use same cost for all the children as they will be
> same
> + * anyways.
> + */
>
> If it only needs to be done once, do we really have to have it inside
> the loop?  I see that you're using the varno-translated
> partial_target->exprs and target->exprs, but if the specific varnos
> don't matter, why not just use the untranslated version of the targets
> before entering the loop?  And if the specific varnos do matter, then
> presumably you need to do it every time.
>

Yes. It can be pulled outside the loop.


>
> This is not a full review, but I'm out of time for right now.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-02 Thread Robert Haas
On Thu, Mar 1, 2018 at 4:52 PM, Robert Haas  wrote:
> This is not a full review, but I'm out of time for right now.

Another thing I see here now is that create_grouping_paths() and
create_child_grouping_paths() are extremely similar.  Isn't there some
way we can refactor things so that we can reuse the code instead of
duplicating it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-01 Thread Jeevan Chalke
On Mon, Feb 26, 2018 at 8:03 PM, Robert Haas  wrote:

>
> Committed after incorporating your other fixes and updating the
> optimizer README.
>

Attached new patchset after rebasing my changes over these changes and on
latest HEAD.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Thanks

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v14.tar.gz
Description: GNU Zip compressed data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-27 Thread Robert Haas
On Tue, Feb 27, 2018 at 4:29 AM, Jeevan Chalke
 wrote:
> Hi Robert,
> I had a look over his provided testcase and observed that when we create a
> Gather Merge path over a cheapest partial path by sorting it explicitly as
> generate_gather_paths won't consider it, we accidentally used cheapest
> partial path from the input_rel to create a Gather Merge; instead we need a
> cheapest partial path from the partially_grouped_rel.
>
> Attached fix_aggref_in_non-agg_error.patch fixing this.

Oops.  Thanks, committed.

> test_for_aggref_in_non-agg_error.patch has a testcase reported by Rajkumar
> which I have added in a aggregates.sql.

Didn't commit this; I think that's overkill.

> While doing so, I have observed few cleanup changes, added those in
> misc_cleanup.patch.

Committed those.

> While re-basing my partitionwise aggregate changes, I observed that when we
> want to create partial aggregation paths for a child partition, we don't
> need to add Gather or Gather Merge on top of it as we first want to append
> them all and then want to stick a gather on it. So it will be better to have
> that code part in a separate function so that we can call it from required
> places.
>
> I have attached patch (create_non_partial_paths.patch) for it including all
> above fix.

I don't like that very much.  For one thing, the name
create_non_partial_paths() is not very descriptive at all.  For
another thing, it somewhat renders add_paths_to_partial_grouping_rel()
a misnomer, as that function then adds only partial paths.  I think
what you should just do is have the main patch add a test for
rel->reloptkind == RELOPT_UPPER_REL; if true, add the Gather paths; if
not, skip it.  Then it will be skipped for RELOPT_OTHER_UPPER_REL
which is what we want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-26 Thread Robert Haas
On Mon, Feb 26, 2018 at 6:38 AM, Jeevan Chalke
 wrote:
> One idea I thought about is to memcpy the struct once we have set all
> required fields for grouped_rel so that we don't have to do similar stuff
> for partially_grouped_rel.

I think that would be a poor idea.  We want to copy a few specific
fields, not everything, and copying those fields is cheap, because
they are just simple assignment statements.  I think memcpy()'ing the
whole structure would be using a sledgehammer to solve a problem for
which a scalpel is more suited.

> Paths in pathlist are added by add_path(). Though we have paths is pathlist
> is sorted with the cheapest total path, we generally use
> RelOptInfo->cheapest_total_path instead of using first entry, unlike partial
> paths. But here you use the first entry like partial paths case. Will it
> better to use cheapest total path from partially_grouped_rel? This will
> require calling set_cheapest on partially_grouped_rel before we call this
> function.

Hmm, I guess that seems like a reasonable approach, although I am not
sure it matters much either way.

> Attached top-up patch doing this along with few indentation fixes.

I don't see much point to the change in generate_gather_paths -- that
line is only 77 characters long.

Committed after incorporating your other fixes and updating the
optimizer README.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-26 Thread Jeevan Chalke
Hi Robert,

On Fri, Feb 23, 2018 at 2:53 AM, Robert Haas  wrote:

> On Thu, Feb 8, 2018 at 8:05 AM, Jeevan Chalke
>  wrote:
> > In this attached version, I have rebased my changes over new design of
> > partially_grouped_rel. The preparatory changes of adding
> > partially_grouped_rel are in 0001.
>
> I spent today hacking in 0001; results attached.  The big change from
> your version is that this now uses generate_gather_paths() to add
> Gather/Gather Merge nodes (except in the case where we sort by group
> pathkeys and then Gather Merge) rather than keeping all of the bespoke
> code.  That turned up to be a bit less elegant than I would have liked
> -- I had to an override_rows argument to generate_gather_paths to make
> it work.  But overall I think this is still a big improvement, since
> it lets us share code instead of duplicating it.  Also, it potentially
> lets us add partially-aggregated but non-parallel paths into
> partially_grouped_rel->pathlist and that should Just Work; they will
> get the Finalize Aggregate step but not the Gather.  With your
> arrangement that wouldn't work.
>
> Please review/test.
>

I have reviewed and tested the patch, and here are a couple of points:

 /*
- * If the input rel belongs to a single FDW, so does the grouped rel.
+ * If the input rel belongs to a single FDW, so does the grouped rel.
Same
+ * for the partially_grouped_rel.
  */
 grouped_rel->serverid = input_rel->serverid;
 grouped_rel->userid = input_rel->userid;
 grouped_rel->useridiscurrent = input_rel->useridiscurrent;
 grouped_rel->fdwroutine = input_rel->fdwroutine;
+partially_grouped_rel->serverid = input_rel->serverid;
+partially_grouped_rel->userid = input_rel->userid;
+partially_grouped_rel->useridiscurrent = input_rel->useridiscurrent;
+partially_grouped_rel->fdwroutine = input_rel->fdwroutine;

In my earlier mail, where I posted a patch for these partially grouped rel
changes, I forgot to raise my question about this.
I was unclear about the above changes and thus passed grouped_rel whenever
we wanted to fetch the relevant details for partially_grouped_rel.

One idea I thought about is to memcpy the struct once we have set all
required fields for grouped_rel so that we don't have to do similar stuff
for partially_grouped_rel.

---

+ * Insert a Sort node, if required.  But there's no point in
+ * sorting anything but the cheapest path.
  */
-if (root->group_pathkeys)
+if (!pathkeys_contained_in(root->group_pathkeys,
path->pathkeys))
+{
+if (path != linitial(partially_grouped_rel->pathlist))
+continue;

Paths in the pathlist are added by add_path(). Though the pathlist is kept
sorted with the cheapest total path first, we generally use
RelOptInfo->cheapest_total_path instead of the first entry, unlike for
partial paths. But here you use the first entry, as in the partial-paths
case. Would it be better to use the cheapest total path from
partially_grouped_rel? This would require calling set_cheapest() on
partially_grouped_rel before we call this function.
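For illustration, here is a toy model of that convention — assuming, as in PostgreSQL's set_cheapest(), that cheapest_total_path is the cheapest *unparameterized* path by total cost, while the sorted pathlist's head entry may be parameterized. These are simplified structures, not planner code:

```c
/* Toy model of the pathlist convention: add_path() keeps the list sorted
 * by total cost, but the head entry may be a parameterized path, so code
 * normally consults cheapest_total_path (the cheapest unparameterized
 * path, as set_cheapest() computes it) rather than the list head. */
#include <stddef.h>
#include <assert.h>

typedef struct {
    double total_cost;
    int    parameterized;   /* nonzero if the path requires outer rels */
} ToyPath;

/* mimics set_cheapest(): pick the cheapest unparameterized path */
static const ToyPath *toy_set_cheapest(const ToyPath *pathlist, size_t n)
{
    const ToyPath *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (pathlist[i].parameterized)
            continue;
        if (best == NULL || pathlist[i].total_cost < best->total_cost)
            best = &pathlist[i];
    }
    return best;
}
```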

Attached is a top-up patch doing this, along with a few indentation fixes.

The rest of the changes look good to me.

Once this gets in, I will rebase my other patches accordingly.

And, thanks for committing 0006.

Thanks


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a0..f6b0208 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2453,7 +2453,8 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * we must do something.)
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+	  bool override_rows)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e4f9bd4..e8f6cc5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6101,7 +6101,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 			 */
 			if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
 			{
-if (path != linitial(partially_grouped_rel->pathlist))
+if (path != partially_grouped_rel->cheapest_total_path)
 	continue;
 path = (Path *) create_sort_path(root,
  grouped_rel,
@@ -6186,9 +6186,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 */
 		if (partially_grouped_rel->pathlist)
 		{
-			Path	   *path;
-
-			path = (Path *) 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-22 Thread Robert Haas
On Thu, Feb 8, 2018 at 8:05 AM, Jeevan Chalke
 wrote:
> In this attached version, I have rebased my changes over new design of
> partially_grouped_rel. The preparatory changes of adding
> partially_grouped_rel are in 0001.

I spent today hacking on 0001; results attached.  The big change from
your version is that this now uses generate_gather_paths() to add
Gather/Gather Merge nodes (except in the case where we sort by group
pathkeys and then Gather Merge) rather than keeping all of the bespoke
code.  That turned out to be a bit less elegant than I would have liked
-- I had to add an override_rows argument to generate_gather_paths to
make it work.  But overall I think this is still a big improvement, since
it lets us share code instead of duplicating it.  Also, it potentially
lets us add partially-aggregated but non-parallel paths into
partially_grouped_rel->pathlist and that should Just Work; they will
get the Finalize Aggregate step but not the Gather.  With your
arrangement that wouldn't work.

Please review/test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


partially-grouped-rel-rmh.patch
Description: Binary data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-21 Thread Robert Haas
On Thu, Feb 8, 2018 at 8:05 AM, Jeevan Chalke
 wrote:
> 0003 - 0006 are refactoring patches as before.

I have committed 0006 with some modifications.  In particular, [1] I
revised the comments and formatting; [2] I made cost_merge_append()
add cpu_tuple_cost * APPEND_CPU_COST_MULTIPLIER in lieu of, rather
than in addition to, cpu_operator_cost; and [3] I modified the
regression test so that the overall plan shape didn't change.

[2] was proposed upthread, but not adopted.  I had the same thought
while reading the patch (having forgotten the previous discussion) and
that seemed like a good enough reason to do it according to the
previous proposal.  If there is a good reason to think MergeAppend
needs that extra cost increment to be fairly-costed, I don't see it on
this thread.
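As a back-of-the-envelope check of [2], the two per-tuple charges compare as follows, assuming the default cost settings (cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025) and an APPEND_CPU_COST_MULTIPLIER of 0.5 — treat these constants as assumptions for illustration, not authoritative values:

```c
/* Compares the per-tuple MergeAppend charge under the two proposals in
 * [2]: adding cpu_tuple_cost * APPEND_CPU_COST_MULTIPLIER on top of
 * cpu_operator_cost versus charging it in lieu of cpu_operator_cost.
 * The constants below are assumed defaults, for illustration only. */
#include <assert.h>

/* per-tuple charge if the new term is added on top of cpu_operator_cost */
static double merge_append_per_tuple_additive(void)
{
    const double cpu_tuple_cost = 0.01;
    const double cpu_operator_cost = 0.0025;
    const double append_multiplier = 0.5;

    return cpu_operator_cost + cpu_tuple_cost * append_multiplier;
}

/* per-tuple charge with the committed "in lieu of" form */
static double merge_append_per_tuple_in_lieu(void)
{
    const double cpu_tuple_cost = 0.01;
    const double append_multiplier = 0.5;

    return cpu_tuple_cost * append_multiplier;
}
```

Under these assumptions the committed form charges MergeAppend 0.005 per tuple rather than 0.0075, i.e. cheaper than the additive form but still above the bare cpu_operator_cost comparison charge.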

[3] was also remarked upon upthread -- Ashutosh mentioned that the
change in plan shape was "sad" but there was no further discussion of
the matter.  I also found it sad; hence the change.  This is, by the
way, an interesting illustration of how partition-wise join could
conceivably lose.  Up until now I've thought that it seemed to be a
slam dunk to always win or at least break even, but if you've got a
relatively unselective join, such that the output is much larger than
either input, then doing the join partition-wise means putting all of
the output rows through an Append node, whereas doing it the normal
way means putting only the input rows through Append nodes.  If the
smaller number of rows being joined at one time doesn't help -- e.g.
all of the inner rows across all partitions fit in a tiny little hash
table -- then we're just feeding more rows through the Append for no
gain.  Not a common case, perhaps, but not impossible.
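The row-count argument above can be made concrete with a toy calculation (invented numbers): the partition-wise plan joins first and then Appends the join output, while the normal plan Appends the inputs and joins afterwards.

```c
/* Toy arithmetic for the "partition-wise join can lose" observation: with
 * an unselective join, the partition-wise plan pushes all *output* rows
 * through its Append, while the normal plan pushes only the *input* rows
 * through Appends.  All numbers are invented for illustration. */
#include <assert.h>

typedef struct {
    double inner_rows;   /* total inner input rows across partitions */
    double outer_rows;   /* total outer input rows across partitions */
    double join_rows;    /* total join output rows */
} ToyJoinShape;

/* rows crossing Append nodes when we join first, then Append the results */
static double rows_through_append_partitionwise(ToyJoinShape s)
{
    return s.join_rows;
}

/* rows crossing Append nodes when we Append the inputs, then join */
static double rows_through_append_normal(ToyJoinShape s)
{
    return s.inner_rows + s.outer_rows;
}
```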

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-18 Thread Ashutosh Bapat
Commit 2fb1abaeb016aeb45b9e6d0b81b7a7e92bb251b9 changed
enable_partition_wise_join to enable_partitionwise_join. This patch,
too, should use enable_partitionwise_agg instead of
enable_partition_wise_agg.



Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-14 Thread Jeevan Chalke
On Wed, Feb 14, 2018 at 12:17 PM, Rafia Sabih 
wrote:

> On Tue, Feb 13, 2018 at 6:21 PM, Jeevan Chalke <
> jeevan.cha...@enterprisedb.com> wrote:
>
>>
>> I see that partition-wise aggregate plan too uses parallel index, am I
>> missing something?
>>
>>
> You're right, I missed that, oops.
>
>>
>>> Q18 takes some 390 secs with patch and some 147 secs without it.
>>>
>>
>> This looks strange. This patch set does not touch parallel or seq scan as
>> such. I am not sure why this is happening. All these three queries explain
>> plan shows much higher execution time for parallel/seq scan.
>>
>> Yeah strange it is.
>

Off-list, I asked Rafia to give me access to the perf machine where she
is doing this benchmarking, to see what's going wrong.
Thanks, Rafia, for the details.

What I have observed is that there are two source trees, one with HEAD and
the other with HEAD+PWA. However, the configure switches were different.
The HEAD+PWA tree has CFLAGS="-ggdb3 -O0" CXXFLAGS="-ggdb3 -O0" in addition
to the switches used for the other tree; i.e., HEAD+PWA is configured with
debugging enabled and optimization disabled, which accounts for the
slowness.

I have run EXPLAIN for these three queries on both source trees built with
exactly the same configure switches, and I don't find any slowness with the
PWA patch set.

Thus, it would be good if you re-ran the benchmark with the same configure
switches on both source trees and shared the results.

Thanks



> However, do you see similar behaviour with patches applied,
>> "enable_partition_wise_agg = on" and "enable_partition_wise_agg = off" ?
>>
>
> I tried that for query 18, with patch and  enable_partition_wise_agg =
> off, query completes in some 270 secs. You may find the explain analyse
> output for it in the attached file. I noticed that on head the query plan
> had parallel hash join however with patch and no partition-wise agg it is
> using nested loop joins. This might be the issue.
>
>>
>> Also, does rest of the queries perform better with partition-wise
>> aggregates?
>>
>>
> As far as this setting goes, there wasn't any other query using
> partition-wise-agg, so, no.
>
> BTW, just an FYI, this experiment is on scale factor 20.
>
> --
> Regards,
> Rafia Sabih
> EnterpriseDB: http://www.enterprisedb.com/
>


-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-13 Thread Rafia Sabih
On Tue, Feb 13, 2018 at 6:21 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

>
> I see that partition-wise aggregate plan too uses parallel index, am I
> missing something?
>
>
You're right, I missed that, oops.

>
>> Q18 takes some 390 secs with patch and some 147 secs without it.
>>
>
> This looks strange. This patch set does not touch parallel or seq scan as
> such. I am not sure why this is happening. All these three queries explain
> plan shows much higher execution time for parallel/seq scan.
>
> Yeah strange it is.

> However, do you see similar behaviour with patches applied,
> "enable_partition_wise_agg = on" and "enable_partition_wise_agg = off" ?
>

I tried that for query 18: with the patch and enable_partition_wise_agg =
off, the query completes in some 270 secs. You may find the EXPLAIN ANALYZE
output for it in the attached file. I noticed that on head the query plan
had a parallel hash join; however, with the patch and no partition-wise
aggregation it uses nested-loop joins. This might be the issue.

>
> Also, does rest of the queries perform better with partition-wise
> aggregates?
>
>
As far as this setting goes, there wasn't any other query using
partition-wise-agg, so, no.

BTW, just an FYI, this experiment is on scale factor 20.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


18_pwa_off.out
Description: Binary data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-13 Thread Jeevan Chalke
On Tue, Feb 13, 2018 at 12:37 PM, Rafia Sabih 
wrote:

>
> I was testing this patch for TPC-H benchmarking and came across following
> results,
>

Thanks Rafia for testing this with TPC-H benchmarking.


>
> Q1 completes in 229 secs with patch and in 66 secs without it. It looks
> like with this patch the time of parallel seq scan itself is elevated for
> some of the partitions. Notice for partitions, lineitem_3, lineitem_7,
> lineitem_10, and linietem_5 it is some 13 secs which was somewhere around 5
> secs on head.
>

> Q6 completes in some 7 secs with patch and it takes 4 secs without it.
> This is mainly caused because with the new parallel append, the parallel
> operator below it (parallel index scan in this case) is not used, however,
> on head it was the append of all the parallel index scans, which was saving
> quite some time.
>

I see that the partition-wise aggregate plan uses a parallel index scan
too; am I missing something?


>
> Q18 takes some 390 secs with patch and some 147 secs without it.
>

This looks strange. This patch set does not touch parallel or seq scan as
such. I am not sure why this is happening. All these three queries explain
plan shows much higher execution time for parallel/seq scan.

However, do you see similar behaviour with patches applied,
"enable_partition_wise_agg = on" and "enable_partition_wise_agg = off" ?

Also, do the rest of the queries perform better with partition-wise
aggregates?


>
> The experimental setup for these tests is as follows,
> work_mem = 500MB
> shared_buffers = 10GB
> effective_cache_size = 4GB
> seq_page_cost = random+page_cost = 0.01
> enable_partition_wise_join = off
>
> Partitioning info:
> Total 10 partitions on tables - lineitem and orders each with partitioning
> key being l_orderkey and o_orderkey respectively.
>
> Please find the attached file for explain analyse outputs of each of the
> reported query.
> --
> Regards,
> Rafia Sabih
> EnterpriseDB: http://www.enterprisedb.com/
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-08 Thread Jeevan Chalke
Hi,

In this attached version, I have rebased my changes over the new design of
partially_grouped_rel. The preparatory changes adding
partially_grouped_rel are in 0001.

Also, to minimize duplication of the finalization code, I have refactored it
into two separate functions, finalize_sorted_partial_agg_path() and
finalize_hashed_partial_agg_path(). I needed to create these two functions as
the current path creation order is like:
Sort Agg Path
Sort Agg Path - Parallel Aware (Finalization needed here)
Hash Agg Path
Hash Agg Path - Parallel Aware (Finalization needed here)
If we club those finalizations together, then the path creation order will
change, which may result in changes to existing plans.
Let me know if that's OK; I will merge them together as they are distinct
anyway. These changes are part of 0002.

0003 - 0006 are refactoring patches as before.

0007 is the main patch per the new design. I have removed
create_partition_agg_paths() altogether as the finalization code is reused.
Also, I renamed preferFullAgg to forcePartialAgg, as we forcibly need a
partial path from the nested level if the parent is doing a partial
aggregation. add_single_path_to_append_rel() no longer exists, and
there is also no need to pass OtherUpperPathExtraData to
add_paths_to_append_rel().

0008 - 0009, testcase and postgres_fdw changes.

Please have a look at the new changes and let me know if I missed any.

Thanks


On Fri, Feb 2, 2018 at 7:29 PM, Robert Haas  wrote:

> On Fri, Feb 2, 2018 at 8:25 AM, Jeevan Chalke
>  wrote:
> >> The problem is that create_partition_agg_paths() is doing *exactly* the
> >> same thing that add_paths_to_grouping_rel() is already doing inside
> >> the blocks that say if (grouped_rel->partial_pathlist).  We don't need
> >> two copies of that code.  Both of those places expect to take a
> >> partial path that has been partially aggregated and produce a
> >> non-partial path that is fully aggregated.  We do not need or want two
> >> copies of that code.
> >
> > OK. Got it.
> >
> > Will try to find a common place for them and will also check how it goes
> > with your suggested design change.
> >
> >> Here's another way to look at it.  We have four kinds of things.
> >>
> >> 1. Partially aggregated partial paths
> >> 2. Partially aggregated non-partial paths
> >> 3. Fully aggregated partial paths
> >> 4. Fully aggregated non-partial paths
>
> So in the new scheme I'm proposing, you've got a partially_grouped_rel
> and a grouped_rel.  So all paths of type #1 go into
> partially_grouped_rel->partial_pathlist, paths of type #2 go into
> partially_grouped_rel->pathlist, type #3 (if we have any) goes into
> grouped_rel->partial_pathlist, and type #4 goes into
> grouped_rel->pathlist.
>
> > add_paths_to_append_rel() -> generate_mergeappend_paths() does not consider
> > partial_pathlist. Thus we will never see MergeAppend over parallel scan
> > given by partial_pathlist. And thus plan like:
> > -> Gather Merge
> >   -> MergeAppend
> > is not possible with current HEAD.
> >
> > Are you suggesting we should implement that here? I think that itself is a
> > separate task.
>
> Oh, I didn't realize that wasn't working already.  I agree that it's a
> separate task from this patch, but it's really too bad that it doesn't
> already work.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v13.tar.gz
Description: GNU Zip compressed data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-02 Thread Robert Haas
On Fri, Feb 2, 2018 at 8:25 AM, Jeevan Chalke
 wrote:
>> The problem is that create_partition_agg_paths() is doing *exactly* the
>> same thing that add_paths_to_grouping_rel() is already doing inside
>> the blocks that say if (grouped_rel->partial_pathlist).  We don't need
>> two copies of that code.  Both of those places expect to take a
>> partial path that has been partially aggregated and produce a
>> non-partial path that is fully aggregated.  We do not need or want two
>> copies of that code.
>
> OK. Got it.
>
> Will try to find a common place for them and will also check how it goes
> with your suggested design change.
>
>> Here's another way to look at it.  We have four kinds of things.
>>
>> 1. Partially aggregated partial paths
>> 2. Partially aggregated non-partial paths
>> 3. Fully aggregated partial paths
>> 4. Fully aggregated non-partial paths

So in the new scheme I'm proposing, you've got a partially_grouped_rel
and a grouped_rel.  So all paths of type #1 go into
partially_grouped_rel->partial_pathlist, paths of type #2 go into
partially_grouped_rel->pathlist, type #3 (if we have any) goes into
grouped_rel->partial_pathlist, and type #4 goes into
grouped_rel->pathlist.

> add_paths_to_append_rel() -> generate_mergeappend_paths() does not consider
> partial_pathlist. Thus we will never see MergeAppend over parallel scan
> given by partial_pathlist. And thus plan like:
> -> Gather Merge
>   -> MergeAppend
> is not possible with current HEAD.
>
> Are you suggesting we should implement that here? I think that itself is a
> separate task.

Oh, I didn't realize that wasn't working already.  I agree that it's a
separate task from this patch, but it's really too bad that it doesn't
already work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-02 Thread Jeevan Chalke
On Fri, Feb 2, 2018 at 1:41 AM, Robert Haas  wrote:

> On Thu, Feb 1, 2018 at 8:59 AM, Jeevan Chalke
>  wrote:
> > I wrote a patch for this (on current HEAD) and attached separately here.
> > Please have a look.
>
> Yes, this is approximately what I had in mind, though it needs more
> work (e.g. it doesn't remove the clearing of the grouped_rel's
> partial_pathlist, which should no longer be necessary; also, it needs
> substantial comment updates).
>

That was just a quick patch to make sure this is what you meant.
Yes, it needs some more work, as you suggested, along with comment updates.


>
> >> 1. Parallel partition-wise aggregate, grouping key doesn't contain
> >> partition key.  This should just be a matter of adding additional
> >> Append paths to partially_grouped_rel->partial_pathlist.  The existing
> >> code already knows how to stick a Gather and FinalizeAggregate step on
> >> top of that, and I don't see why that logic would need any
> >> modification or addition.  An Append of child partial-grouping paths
> >> should be producing the same output as a partial grouping of an
> >> Append, except that the former case might produce more separate groups
> >> that need to be merged; but that should be OK: we can just throw all
> >> the paths into the same path list and let the cheapest one win.
> >
> > For any partial aggregation we need to add finalization step after we are
> > done with the APPEND i.e. post add_paths_to_append_rel(). Given that we need
> > to replicate the logic of sticking Gather and FinalizeAggregate step at
> > later stage. This is what exactly done in create_partition_agg_paths().
> > Am I missing something here?
>
> The problem is that create_partition_agg_paths() is doing *exactly* the
> same thing that add_paths_to_grouping_rel() is already doing inside
> the blocks that say if (grouped_rel->partial_pathlist).  We don't need
> two copies of that code.  Both of those places expect to take a
> partial path that has been partially aggregated and produce a
> non-partial path that is fully aggregated.  We do not need or want two
> copies of that code.
>

OK. Got it.

Will try to find a common place for them and will also check how it goes
with your suggested design change.


>
> Here's another way to look at it.  We have four kinds of things.
>
> 1. Partially aggregated partial paths
> 2. Partially aggregated non-partial paths
> 3. Fully aggregated partial paths
> 4. Fully aggregated non-partial paths
>
> The current code only ever generates paths of type #1 and #4; this
> patch will add paths of type #2 as well, and maybe also type #3.  But
> the way you've got it, the existing paths of type #1 go into the
> grouping_rel's partial_pathlist, and the new paths of type #1 go into
> the OtherUpperPathExtraData's partial_paths list.  Maybe there's a
> good reason why we should keep them separate, but I'm inclined to
> think they should all be going into the same list.
>

The new paths are specific to partition-wise aggregates, and I thought it
better to keep them separate, without interfering with the grouped_rel's
pathlist/partial_pathlist. And as you said, I didn't find a better place
than its own structure.


> >> Overall, what I'm trying to argue for here is making this feature look
> >> less like its own separate thing and more like part of the general
> >> mechanism we've already got: partial paths would turn into regular
> >> paths via generate_gather_paths(), and partially aggregated paths
> >> would turn into fully aggregated paths by adding FinalizeAggregate.
> >> The existing special case that allows us to build a non-partial, fully
> >> aggregated path from a partial, partially-aggregated path would be
> >> preserved.
> >>
> >> I think this would probably eliminate some other problems I see in the
> >> existing design as well.  For example, create_partition_agg_paths
> >> doesn't consider using Gather Merge, but that might be a win.
> >
> > Append path is always non-sorted and has no pathkeys. Thus Gather Merge over
> > an Append path seems infeasible, isn't it?
>
> We currently never generate an Append path with pathkeys, but we do
> generate MergeAppend paths with pathkeys, as in the following example:
>
> rhaas=# create table foo (a int, b text) partition by range (a);
> CREATE TABLE
> rhaas=# create index on foo (a);
> CREATE INDEX
> rhaas=# create table foo1 partition of foo for values from (0) to (100);
> CREATE TABLE
> rhaas=# create table foo2 partition of foo for values from (100) to (200);
> CREATE TABLE
> rhaas=# select * from foo foo order by a;
>  a | b
> ---+---
> (0 rows)
> rhaas=# explain select * from foo foo order by a;
>                               QUERY PLAN
> ---------------------------------------------------------------------------
>  Merge Append  (cost=0.32..145.47 rows=2540 width=36)
>    Sort Key: foo.a
>    ->  Index Scan using foo1_a_idx on foo1 foo  

Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-01 Thread Robert Haas
On Thu, Feb 1, 2018 at 8:59 AM, Jeevan Chalke
 wrote:
> I wrote a patch for this (on current HEAD) and attached separately here.
> Please have a look.

Yes, this is approximately what I had in mind, though it needs more
work (e.g. it doesn't remove the clearing of the grouped_rel's
partial_pathlist, which should no longer be necessary; also, it needs
substantial comment updates).

>> 1. Parallel partition-wise aggregate, grouping key doesn't contain
>> partition key.  This should just be a matter of adding additional
>> Append paths to partially_grouped_rel->partial_pathlist.  The existing
>> code already knows how to stick a Gather and FinalizeAggregate step on
>> top of that, and I don't see why that logic would need any
>> modification or addition.  An Append of child partial-grouping paths
>> should be producing the same output as a partial grouping of an
>> Append, except that the former case might produce more separate groups
>> that need to be merged; but that should be OK: we can just throw all
>> the paths into the same path list and let the cheapest one win.
>
> For any partial aggregation we need to add finalization step after we are
> done with the APPEND i.e. post add_paths_to_append_rel(). Given that we need
> to replicate the logic of sticking Gather and FinalizeAggregate step at
> later stage. This is what exactly done in create_partition_agg_paths().
> Am I missing something here?

The problem is that create_partition_agg_paths() is doing *exactly* the
same thing that add_paths_to_grouping_rel() is already doing inside
the blocks that say if (grouped_rel->partial_pathlist).  We don't need
two copies of that code.  Both of those places expect to take a
partial path that has been partially aggregated and produce a
non-partial path that is fully aggregated.  We do not need or want two
copies of that code.

Here's another way to look at it.  We have four kinds of things.

1. Partially aggregated partial paths
2. Partially aggregated non-partial paths
3. Fully aggregated partial paths
4. Fully aggregated non-partial paths

The current code only ever generates paths of type #1 and #4; this
patch will add paths of type #2 as well, and maybe also type #3.  But
the way you've got it, the existing paths of type #1 go into the
grouping_rel's partial_pathlist, and the new paths of type #1 go into
the OtherUpperPathExtraData's partial_paths list.  Maybe there's a
good reason why we should keep them separate, but I'm inclined to
think they should all be going into the same list.

>> Overall, what I'm trying to argue for here is making this feature look
>> less like its own separate thing and more like part of the general
>> mechanism we've already got: partial paths would turn into regular
>> paths via generate_gather_paths(), and partially aggregated paths
>> would turn into fully aggregated paths by adding FinalizeAggregate.
>> The existing special case that allows us to build a non-partial, fully
>> aggregated path from a partial, partially-aggregated path would be
>> preserved.
>>
>> I think this would probably eliminate some other problems I see in the
>> existing design as well.  For example, create_partition_agg_paths
>> doesn't consider using Gather Merge, but that might be a win.
>
> Append path is always non-sorted and has no pathkeys. Thus a Gather Merge over
> an Append path seems infeasible, doesn't it?

We currently never generate an Append path with pathkeys, but we do
generate MergeAppend paths with pathkeys, as in the following example:

rhaas=# create table foo (a int, b text) partition by range (a);
CREATE TABLE
rhaas=# create index on foo (a);
CREATE INDEX
rhaas=# create table foo1 partition of foo for values from (0) to (100);
CREATE TABLE
rhaas=# create table foo2 partition of foo for values from (100) to (200);
CREATE TABLE
rhaas=# select * from foo foo order by a;
 a | b
---+---
(0 rows)
rhaas=# explain select * from foo foo order by a;
                              QUERY PLAN
---------------------------------------------------------------------------
 Merge Append  (cost=0.32..145.47 rows=2540 width=36)
   Sort Key: foo.a
   ->  Index Scan using foo1_a_idx on foo1 foo  (cost=0.15..63.20
rows=1270 width=36)
   ->  Index Scan using foo2_a_idx on foo2 foo_1  (cost=0.15..63.20
rows=1270 width=36)
(4 rows)

Actually, in this example, the MergeAppend could be safely converted
into an Append, because the partitions are in bound order, and
somebody already proposed a patch for that.

The point is that we want to be able to get plans like this:

Finalize GroupAggregate
-> Gather Merge
  -> MergeAppend
-> Partial GroupAggregate
  -> Parallel Index Scan on t1
-> Partial GroupAggregate
  -> Parallel Index Scan on t2
-> Partial GroupAggregate
  -> Parallel Index Scan on t3

If we only consider Gather, not Gather Merge, when turning a partially
aggregated partial path into a non-partial path, then we end up 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-02-01 Thread Jeevan Chalke
On Thu, Feb 1, 2018 at 1:11 AM, Robert Haas  wrote:

> On Mon, Jan 29, 2018 at 3:42 AM, Jeevan Chalke
>  wrote:
> > Attached new patch set and rebased it on latest HEAD.
>
> I strongly dislike add_single_path_to_append_rel.  It adds branches
> and complexity to code that is already very complex.  Most
> importantly, why are we adding paths to fields in
> OtherUpperPathExtraData *extra instead of adding them to the path list
> of some RelOptInfo?  If we had an appropriate RelOptInfo to which we
> could add the paths, then we could make this simpler.
>
> If I understand correctly, the reason you're doing it this way is
> because we have no place to put partially-aggregated, non-partial
> paths.  If we only needed to worry about the parallel query case, we
> could just build an append of partially-aggregated paths for each
> child and stick it into the grouped rel's partial pathlist, just as we
> already do for regular parallel aggregation.  There's no reason why
> add_paths_to_grouping_rel() needs to care about the difference between a
> Partial Aggregate on top of whatever and an Append each branch of
> which is a Partial Aggregate on top of whatever.  However, this won't
> work for non-partial paths, because add_paths_to_grouping_rel() needs
> to put paths into the grouped rel's pathlist -- and we can't mix
> together partially-aggregated paths and fully-aggregated paths in the
> same path list.
>

Yes.


>
> But, really, the way we're using grouped_rel->partial_pathlist right
> now is an awful hack.  What I'm thinking we could do is introduce a
> new UpperRelationKind called UPPERREL_PARTIAL_GROUP_AGG, coming just
> before UPPERREL_GROUP_AGG.  Without partition-wise aggregate,
> partially_grouped_rel's pathlist would always be NIL, and its partial
> pathlist would be constructed using the logic in
> add_partial_paths_to_grouping_rel, which would need renaming.  Then,
> add_paths_to_grouping_rel would use paths from input_rel when doing
> non-parallel aggregation and paths from partially_grouped_rel when
> doing parallel aggregation.  This would eliminate the ugly
> grouped_rel->partial_pathlist = NIL assignment at the bottom of
> create_grouping_paths(), because the grouped_rel's partial_pathlist
> would never have been (bogusly) populated in the first place, and
> hence would not need to be reset.  All of these changes could be made
> via a preparatory patch.
>

I wrote a patch for this (on current HEAD) and attached separately here.
Please have a look.

I still do not fully understand how we are going to pass those to
add_paths_to_append_rel(). I need to look at it more deeply, though.


> Then the main patch needs to worry about four cases:
>
> 1. Parallel partition-wise aggregate, grouping key doesn't contain
> partition key.  This should just be a matter of adding additional
> Append paths to partially_grouped_rel->partial_pathlist.  The existing
> code already knows how to stick a Gather and FinalizeAggregate step on
> top of that, and I don't see why that logic would need any
> modification or addition.  An Append of child partial-grouping paths
> should be producing the same output as a partial grouping of an
> Append, except that the former case might produce more separate groups
> that need to be merged; but that should be OK: we can just throw all
> the paths into the same path list and let the cheapest one win.
>

For any partial aggregation we need to add a finalization step after we are
done with the Append, i.e. after add_paths_to_append_rel(). Given that, we
need to replicate the logic of sticking Gather and FinalizeAggregate steps
on top at a later stage. This is exactly what is done in
create_partition_agg_paths(). Am I missing something here?


> 2. Parallel partition-wise aggregate, grouping key contains partition
> key.  For the most part, this is no different from case #1.  We won't
> have groups spanning different partitions in this case, but we might
> have groups spanning different workers, so we still need a
> FinalizeAggregate step.  As an exception, Gather -> Parallel Append ->
> [non-partial Aggregate path] would give us a way of doing aggregation
> in parallel without a separate Finalize step.  I'm not sure if we want
> to consider that to be in scope for this patch.  If we do, then we'd
> add the Parallel Append path to grouped_rel->partial_pathlist.  Then,
> we could stick Gather (Merge) on top if it to produce a path for
> grouped_rel->pathlist using generate_gather_paths(); alternatively, it
> can be used by upper planning steps -- something we currently can't
> ever make work with parallel aggregation.
>
> 3. Non-parallel partition-wise aggregate, grouping key contains
> partition key.  Build Append paths from the children of grouped_rel
> and add them to grouped_rel->pathlist.
>

Yes.


>
> 4. Non-parallel partition-wise aggregate, grouping key doesn't contain
> partition key.  Build Append paths from the children of
> 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-31 Thread Robert Haas
On Mon, Jan 29, 2018 at 3:42 AM, Jeevan Chalke
 wrote:
> Attached new patch set and rebased it on latest HEAD.

I strongly dislike add_single_path_to_append_rel.  It adds branches
and complexity to code that is already very complex.  Most
importantly, why are we adding paths to fields in
OtherUpperPathExtraData *extra instead of adding them to the path list
of some RelOptInfo?  If we had an appropriate RelOptInfo to which we
could add the paths, then we could make this simpler.

If I understand correctly, the reason you're doing it this way is
because we have no place to put partially-aggregated, non-partial
paths.  If we only needed to worry about the parallel query case, we
could just build an append of partially-aggregated paths for each
child and stick it into the grouped rel's partial pathlist, just as we
already do for regular parallel aggregation.  There's no reason why
add_paths_to_grouping_rel() needs to care about the difference between a
Partial Aggregate on top of whatever and an Append each branch of
which is a Partial Aggregate on top of whatever.  However, this won't
work for non-partial paths, because add_paths_to_grouping_rel() needs
to put paths into the grouped rel's pathlist -- and we can't mix
together partially-aggregated paths and fully-aggregated paths in the
same path list.

But, really, the way we're using grouped_rel->partial_pathlist right
now is an awful hack.  What I'm thinking we could do is introduce a
new UpperRelationKind called UPPERREL_PARTIAL_GROUP_AGG, coming just
before UPPERREL_GROUP_AGG.  Without partition-wise aggregate,
partially_grouped_rel's pathlist would always be NIL, and its partial
pathlist would be constructed using the logic in
add_partial_paths_to_grouping_rel, which would need renaming.  Then,
add_paths_to_grouping_rel would use paths from input_rel when doing
non-parallel aggregation and paths from partially_grouped_rel when
doing parallel aggregation.  This would eliminate the ugly
grouped_rel->partial_pathlist = NIL assignment at the bottom of
create_grouping_paths(), because the grouped_rel's partial_pathlist
would never have been (bogusly) populated in the first place, and
hence would not need to be reset.  All of these changes could be made
via a preparatory patch.

Then the main patch needs to worry about four cases:

1. Parallel partition-wise aggregate, grouping key doesn't contain
partition key.  This should just be a matter of adding additional
Append paths to partially_grouped_rel->partial_pathlist.  The existing
code already knows how to stick a Gather and FinalizeAggregate step on
top of that, and I don't see why that logic would need any
modification or addition.  An Append of child partial-grouping paths
should be producing the same output as a partial grouping of an
Append, except that the former case might produce more separate groups
that need to be merged; but that should be OK: we can just throw all
the paths into the same path list and let the cheapest one win.

2. Parallel partition-wise aggregate, grouping key contains partition
key.  For the most part, this is no different from case #1.  We won't
have groups spanning different partitions in this case, but we might
have groups spanning different workers, so we still need a
FinalizeAggregate step.  As an exception, Gather -> Parallel Append ->
[non-partial Aggregate path] would give us a way of doing aggregation
in parallel without a separate Finalize step.  I'm not sure if we want
to consider that to be in scope for this patch.  If we do, then we'd
add the Parallel Append path to grouped_rel->partial_pathlist.  Then,
we could stick Gather (Merge) on top if it to produce a path for
grouped_rel->pathlist using generate_gather_paths(); alternatively, it
can be used by upper planning steps -- something we currently can't
ever make work with parallel aggregation.

3. Non-parallel partition-wise aggregate, grouping key contains
partition key.  Build Append paths from the children of grouped_rel
and add them to grouped_rel->pathlist.

4. Non-parallel partition-wise aggregate, grouping key doesn't contain
partition key.  Build Append paths from the children of
partially_grouped_rel and add them to partially_grouped_rel->pathlist.
Also add code to generate paths for grouped_rel->pathlist by sticking
a FinalizeAggregate step on top of each path from
partially_grouped_rel->pathlist.

Overall, what I'm trying to argue for here is making this feature look
less like its own separate thing and more like part of the general
mechanism we've already got: partial paths would turn into regular
paths via generate_gather_paths(), and partially aggregated paths
would turn into fully aggregated paths by adding FinalizeAggregate.
The existing special case that allows us to build a non-partial, fully
aggregated path from a partial, partially-aggregated path would be
preserved.

I think this would probably eliminate some other problems I see in the
existing design as well.  For 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-29 Thread Jeevan Chalke
On Sat, Jan 27, 2018 at 1:35 AM, Robert Haas  wrote:

> On Thu, Jan 18, 2018 at 8:55 AM, Jeevan Chalke
>  wrote:
> > Attached patch with other review points fixed.
>
> Committed 0001 and 0002 together, with some cosmetic changes,
> including fixing pgindent damage.


Thanks Robert.



>   Please pgindent your patches before
> submitting.
>

Sure, will take care of this.

Attached new patch set and rebased it on latest HEAD.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Thanks

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v12.tar.gz
Description: GNU Zip compressed data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-26 Thread Robert Haas
On Thu, Jan 18, 2018 at 8:55 AM, Jeevan Chalke
 wrote:
> Attached patch with other review points fixed.

Committed 0001 and 0002 together, with some cosmetic changes,
including fixing pgindent damage.  Please pgindent your patches before
submitting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-16 Thread Robert Haas
On Tue, Jan 16, 2018 at 3:56 AM, Jeevan Chalke
 wrote:
> I will make your suggested changes that is merge create_sort_agg_path() and
> create_hash_agg_path(). Will name that function as
> create_sort_and_hash_agg_paths().

I suggest add_paths_to_grouping_rel() and
add_partial_paths_to_grouping_rel(), similar to what commit
c44c47a773bd9073012935a29b0264d95920412c did with
add_paths_to_append_rel().

> Oops. My mistake. Missed. We should loop over the input rel's pathlist.
>
> Yep. With above change, the logic is very similar except
> (1) isPartialAgg/can_sort case creates the partial paths and
> (2) finalization step is not needed at this stage.

I'm not sure what you mean by #1.

> I think it can be done by passing a flag to create_sort_agg_path() (or new
> combo function) and making appropriate adjustments. Do you think addition of
> this new flag should go in re-factoring patch or main PWA patch?
> I think re-factoring patch.

I think the refactoring patch should move the existing code into a new
function without any changes, and then the main patch should add an
additional argument to that function that allows for either behavior.

By the way, I'm also a bit concerned about this:

+       /*
+        * For full aggregation, we are done with the partial paths.  Just
+        * clear it out so that we don't try to create a parallel plan over it.
+        */
+       grouped_rel->partial_pathlist = NIL;

I think that's being done for the same reason as mentioned at the
bottom of the current code for create_grouping_paths().  They are only
partially aggregated and wouldn't produce correct final results if
some other planning step -- create_ordered_paths, or the code that
sets up final_rel -- used them as if they had been fully aggregated.
I'm worried that there might be an analogous danger for partition-wise
aggregation -- that is, that the paths being inserted into the partial
pathlists of the aggregate child rels might get reused by some later
planning step which doesn't realize that the output they produce
doesn't quite match up with the rel to which they are attached.  You
may have already taken care of that problem somehow, but we should
make sure that it's fully correct and clearly commented.  I don't
immediately see why the isPartialAgg case should be any different from
the !isPartialAgg case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-16 Thread Jeevan Chalke
On Tue, Jan 16, 2018 at 3:41 AM, Robert Haas  wrote:

> On Thu, Jan 11, 2018 at 6:00 AM, Jeevan Chalke
>  wrote:
> >  Attached new set of patches adding this. Only patch 0007 (main patch)
> and
> > 0008 (testcase patch) has changed.
> >
> > Please have a look and let me know if I missed any.
>
> I spent a little time studying 0001 and 0002 today, as well as their
> relation with 0007.  I find the way that the refactoring has been done
> slightly odd.  With 0001 and 0002 applied, we end up with three
> functions for creating aggregate paths: create_partial_agg_path, which
> handles the partial-path case for both sort and hash;
> create_sort_agg_path, which handles the sort case for non-partial
> paths only; and create_hash_agg_path, which handles the hash case for
> non-partial paths only.  This leads to the following code in 0007:
>
> +   /* Full grouping paths */
> +
> +   if (try_parallel_aggregation)
> +   {
> +       Assert(extra->agg_partial_costs && extra->agg_final_costs);
> +       create_partial_agg_path(root, input_rel, grouped_rel, target,
> +                               partial_target, extra->agg_partial_costs,
> +                               extra->agg_final_costs, gd, can_sort,
> +                               can_hash, (List *) extra->havingQual);
> +   }
> +
> +   if (can_sort)
> +       create_sort_agg_path(root, input_rel, grouped_rel, target,
> +                            partial_target, agg_costs,
> +                            extra->agg_final_costs, gd, can_hash,
> +                            dNumGroups, (List *) extra->havingQual);
> +
> +   if (can_hash)
> +       create_hash_agg_path(root, input_rel, grouped_rel, target,
> +                            partial_target, agg_costs,
> +                            extra->agg_final_costs, gd, dNumGroups,
> +                            (List *) extra->havingQual);
>
> That looks strange -- you would expect to see either "sort" and "hash"
> cases here, or maybe "partial" and "non-partial", or maybe all four
> combinations, but seeing three things here looks surprising.  I think
> the solution is just to create a single function that does both the
> work of create_sort_agg_path and the work of create_hash_agg_path
> instead of having two separate functions.
>

In the existing code (in create_grouping_paths()), I see the following pattern:
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)

And thus I created three functions to match the existing pattern.

I will make the change you suggested, i.e. merge create_sort_agg_path() and
create_hash_agg_path(), and name the combined function
create_sort_and_hash_agg_paths().


>
> A related thing that is also surprising is that 0007 manages to reuse
> create_partial_agg_path for both the isPartialAgg and non-isPartialAgg
> cases -- in fact, the calls appear to be identical, and could be
> hoisted out of the "if" statement


Yes, we can do that as well, and I think it is better too.
I was just trying to preserve the existing pattern, so for PWA I chose:
if partialAgg
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)
else fullAgg
if (try_parallel_aggregation)
if (can_sort)
if (can_hash)
if (can_sort)
if (can_hash)

But since the if (try_parallel_aggregation) case is exactly the same in both
branches, I will pull it out of the if..else.



> -- but create_sort_agg_path and
> create_hash_agg_path do not get reused.  I think you should see
> whether you can define the new combo function that can be used for
> both cases.  The logic looks very similar, and I'm wondering why it
> isn't more similar than it is; for instance, create_sort_agg_path
> loops over the input rel's pathlist, but the code for
> isPartialAgg/can_sort seems to consider only the cheapest path.  If
> this is correct, it needs a comment explaining it, but I don't see why
> it should be correct.
>

Oops, my mistake; I missed that. We should loop over the input rel's pathlist.

Yep. With the above change, the logic is very similar, except that
(1) the isPartialAgg/can_sort case creates partial paths and
(2) the finalization step is not needed at this stage.

I think it can be done by passing a flag to create_sort_agg_path() (or the new
combo function) and making the appropriate adjustments. Do you think this new
flag should go in the refactoring patch or the main PWA patch?
I think the refactoring patch.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-11 Thread Jeevan Chalke
> On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat <
> ashutosh.ba...@enterprisedb.com> wrote:
>
>>
>> +
>> +-- Partial aggregation as GROUP BY clause does not match with PARTITION
>> KEY
>> +EXPLAIN (COSTS OFF)
>> +SELECT b, sum(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1, 2, 3;
>> +   QUERY PLAN
>> +
>> + Sort
>> +   Sort Key: pagg_tab_p1.b, (sum(pagg_tab_p1.a)), (count(*))
>> +   ->  Finalize GroupAggregate
>> + Group Key: pagg_tab_p1.b
>> + ->  Sort
>> +   Sort Key: pagg_tab_p1.b
>> +   ->  Append
>> + ->  Partial HashAggregate
>> +   Group Key: pagg_tab_p1.b
>> +   ->  Seq Scan on pagg_tab_p1
>> + ->  Partial HashAggregate
>> +   Group Key: pagg_tab_p2_s1.b
>> +   ->  Append
>> + ->  Seq Scan on pagg_tab_p2_s1
>> + ->  Seq Scan on pagg_tab_p2_s2
>> + ->  Partial HashAggregate
>> +   Group Key: pagg_tab_p3_s1.b
>> +   ->  Append
>> + ->  Seq Scan on pagg_tab_p3_s1
>> + ->  Seq Scan on pagg_tab_p3_s2
>> +(20 rows)
>>
>> Why aren't we seeing partial aggregation paths for level two and below
>> partitions?
>>
>
>>
> In this version of the patch I have not recursed into next level.
> Will work on it and submit changes in the next patch-set.
>

 Attached a new set of patches adding this. Only patches 0007 (main patch) and
0008 (testcase patch) have changed.

>
Please have a look and let me know if I missed anything.

Thanks
-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v10.tar.gz
Description: GNU Zip compressed data


Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-02 Thread Jeevan Chalke
On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
> Sure no problem. Take your time. Here's set of comments for 0008. That
> ends the first read of all the patches (2nd reading for the core
> changes)
>
> +-- Also, disable parallel paths.
> +SET max_parallel_workers_per_gather TO 0;
>
> If you enable parallel aggregation for smaller data partition-wise
> aggregation
> paths won't be chosen. I think this is the reason why you are disabling
> parallel query. But we should probably explain that in a comment. Better
> if we
> could come up testcases without disabling parallel query. Since parallel
> append
> is now committed, may be it can help.
>

Removed.


>
> +
> +-- Check with multiple columns in GROUP BY, order in target-list is
> reversed
> +EXPLAIN (COSTS OFF)
> +SELECT c, a, count(*) FROM pagg_tab GROUP BY a, c;
> +   QUERY PLAN
> +-
> + Append
> +   ->  HashAggregate
> + Group Key: pagg_tab_p1.a, pagg_tab_p1.c
> + ->  Seq Scan on pagg_tab_p1
> [ ... clipped ... ]
> +(10 rows)
>
> Why do we need this testcase?
>

Rajkumar earlier reported an issue when the order in the target list is
reversed. The fix required redesigning the GROUP key matching algorithm,
so I think it is good to have this testcase.


> +
> +SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
> + c | sum
> +---+-
> +(0 rows)
>
> I think we also need a case when the child input relations are marked
> dummy and
> then the parent is marked dummy. Just use a condition with partkey = <value out
> of list bounds>.
>

I have added a testcase for that. But don't you think both are the same? When
all input children are dummy, the parent too is marked dummy, i.e. the input
relation is itself dummy.
Am I missing something here?


>
> +
> +-- Check with SORTED paths. Disable hashagg to get group aggregate
>
> Suggest: "Test GroupAggregate paths by disabling hash aggregates."
>
> +-- When GROUP BY clause matches with PARTITION KEY.
>
> I don't think we need "with", and just extend the same sentence with
> "complete
> aggregation is performed for each partition"
>
> +-- Should choose full partition-wise aggregation path
>
> suggest: "Should choose full partition-wise GroupAggregate plan", but I
> guess
> with the above suggestion, this sentence is not needed.
>
> +
> +-- When GROUP BY clause not matches with PARTITION KEY.
> +-- Should choose partial partition-wise aggregation path
>
> Similar suggestions as above.
>
> +-- No aggregates, but still able to perform partition-wise aggregates
>
> That's a funny construction. May be "Test partition-wise grouping without
> any
> aggregates".
>
> We should try some output for this query.
>
> +
> +EXPLAIN (COSTS OFF)
> +SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
> +   QUERY PLAN
> +-
> + Group
> +   Group Key: pagg_tab_p1.a
> +   ->  Merge Append
> + Sort Key: pagg_tab_p1.a
> + ->  Group
> +   Group Key: pagg_tab_p1.a
> +   ->  Sort
> + Sort Key: pagg_tab_p1.a
> + ->  Seq Scan on pagg_tab_p1
> [ ... clipped ... ]
> +(19 rows)
>
> It's strange that we do not annotate partial grouping as Partial. Does not
> look
> like a bug in your patch. Do we get similar output with parallel grouping?
>
>
It is partial aggregation, which is finalized at the top.
But since there are no aggregates involved, we create a GROUP path rather than
an AGG path, and a GROUP path has no Partial annotation.
Yes, we see a similar plan for parallel grouping too.

> +
> +-- ORDERED SET within aggregate
> +EXPLAIN (COSTS OFF)
> +SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
> +   QUERY PLAN
> +
> + Sort
> +   Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b ORDER BY pagg_tab_p1.a))
> +   ->  GroupAggregate
> + Group Key: pagg_tab_p1.a
> + ->  Sort
> +   Sort Key: pagg_tab_p1.a
> +   ->  Append
> + ->  Seq Scan on pagg_tab_p1
> + ->  Seq Scan on pagg_tab_p2
> + ->  Seq Scan on pagg_tab_p3
> +(10 rows)
>
> pagg_tab is partitioned by column c. So, not having it in GROUP BY
> itself might produce this plan if Partial parallel aggregation is
> expensive.
> When testing negative tests like this GROUP BY should always have the
> partition
> key.
>

I deliberately wanted to test the case where the GROUP BY key does not match
the partition key, so that partial aggregation is forced. But we do have a
limitation on performing the aggregation in partial mode, i.e. an ORDERED SET
aggregate cannot be computed partially; this is the test to exercise that code
path.


> In case of full aggregation, since all the rows that belong to the same
> group
> come from the same partition, having an ORDER BY doesn't make any

Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-01 Thread Jeevan Chalke
On Tue, Dec 12, 2017 at 3:43 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Here are review comments for 0009
> Only full aggregation is pushed on the remote server.
>
> I think we can live with that for a while but we need to be able to push
> down
> partial aggregates to the foreign server. I agree that it needs some
> infrastructure to serialized and deserialize the partial aggregate values,
> support unpartitioned aggregation first and then work on partitioned
> aggregates. That is definitely a separate piece of work.
>

Yep.


>
> +CREATE TABLE pagg_tab_p3 (a int, b int, c text);
>
> Like partition-wise join testcases please use LIKE so that it's easy to
> change
> the table schema if required.
>

Done.


>
> +INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30,
> 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
> +INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30,
> 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i %
> 30) >= 10;
> +INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30,
> 'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i %
> 30) >= 20;
>
> We have to do this because INSERT tuple routing to a foreign partition is
> not
> supported right now.


Yes.


> Somebody has to remember to change this to a single
> statement once that's done.
>

I don't know how we can keep track of it.


>
> +ANALYZE fpagg_tab_p1;
> +ANALYZE fpagg_tab_p2;
> +ANALYZE fpagg_tab_p3;
>
> I thought this is not needed. When you ANALYZE the partitioned table, it
> would
> analyze the partitions as well. But I see that partition-wise join is also
> ANALYZING the foreign partitions separately. When I ran ANALYZE on a
> partitioned table with foreign partitions, statistics for only the local
> tables
> (partitioned and partitions) was updated. Of course this is separate work,
> but
> probably needs to be fixed.
>

Hmm.


>
> +-- When GROUP BY clause matches with PARTITION KEY.
> +-- Plan when partition-wise-agg is disabled
>
> s/when/with/
>
> +-- Plan when partition-wise-agg is enabled
>
> s/when/with/
>

Done.


>
> +   ->  Append
>
> Just like ForeignScan node's Relations tell what kind of ForeignScan this
> is,
> may be we should annotate Append to tell whether the children are joins,
> aggregates or relation scans. That might be helpful. Of course as another
> patch.
>

OK.


>
> +SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING
> avg(b) < 25 ORDER BY 1;
> + a  | sum  | min | count
> ++--+-+---
> +  0 | 2000 |   0 |   100
> +  1 | 2100 |   1 |   100
> [ ... clipped ...]
> + 23 | 2300 |   3 |   100
> + 24 | 2400 |   4 |   100
> +(15 rows)
>
> May be we want to reduce the number of rows to a few by using a stricter
> HAVING
> clause?
>

Done.


>
> +
> +-- When GROUP BY clause not matches with PARTITION KEY.
>
> ... clause does not match ...
>

Done.


>
> +EXPLAIN (VERBOSE, COSTS OFF)
> +SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
> sum(a) < 800 ORDER BY 1;
> +
> QUERY PLAN
> +---
> 
> --
> + Sort
> +   Output: fpagg_tab_p1.b, (avg(fpagg_tab_p1.a)),
> (max(fpagg_tab_p1.a)), (count(*))
> +   ->  Partial HashAggregate
> [ ... clipped ... ]
> + Output: fpagg_tab_p3.b, PARTIAL
> avg(fpagg_tab_p3.a), PARTIAL max(fpagg_tab_p3.a), PARTIAL count(*),
> PARTIAL sum(fpagg_tab_p3.a)
> + Group Key: fpagg_tab_p3.b
> + ->  Foreign Scan on public.fpagg_tab_p3
> +   Output: fpagg_tab_p3.b, fpagg_tab_p3.a
> +   Remote SQL: SELECT a, b FROM public.pagg_tab_p3
> +(26 rows)
>
> I think we interested in overall shape of the plan and not the details of
> Remote SQL etc. So, may be turn off VERBOSE. This comment applies to an
> earlier
> plan with enable_partition_wise_agg = false;
>

OK. Removed VERBOSE from all the queries, as we are interested in the overall
shape of the plan.


>
> +
> +SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
> sum(a) < 800 ORDER BY 1;
> + b  | avg | max | count
> ++-+-+---
> +  0 | 10. |  20 |60
> +  1 | 11. |  21 |60
> [... clipped ...]
> + 42 | 12. |  22 |60
> + 43 | 13. |  23 |60
> +(20 rows)
>
> Since the aggregates were not pushed down, I doubt we should be testing the
> output. But this test is good to check partial aggregates over foreign
> partition scans, which we don't have in postgres_fdw.sql I think. So, may
> be
> add it as a separate patch?
>

Agree. Removed SELECT query. EXPLAIN is enough here.


>
> Can you please add a test where we reference a whole-row; that usually has
> troubles.
>

Added.


>
> -if (root->hasHavingQual && 

Re: [HACKERS] Partition-wise aggregation/grouping

2018-01-01 Thread Jeevan Chalke
Attached patchset with all the review comments reported so far.

On Fri, Dec 1, 2017 at 6:11 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Continuing with review of 0007.
>
> +
> +/* Copy input rels's relids to grouped rel */
> +grouped_rel->relids = input_rel->relids;
>
> Isn't this done in fetch_upper_rel()? Why do we need it here?
> There's also a similar hunk in create_grouping_paths() which doesn't look
> appropriate. I guess, you need relids in grouped_rel->relids for FDW.
> There are
> two ways to do this: 1. set grouped_rel->relids for parent upper rel as
> well,
> but then we should pass relids to fetch_upper_rel() instead of setting
> those
> later. 2. For a parent upper rel, in create_foreignscan_plan(), set relids
> to
> all_baserels, if upper_rel->relids is NULL and don't set relids for a
> parent
> upper rel. I am fine with either of those.
>

Done. I opted for the second option.


>
> +/* partial phase */
> +get_agg_clause_costs(root, (Node *) partial_target->exprs,
> + AGGSPLIT_INITIAL_SERIAL,
> + _partial_costs);
>
> IIUC, the costs for evaluating aggregates would not change per child. They
> won't be different for parent and any of the children. So, we should be
> able to
> calculate those once, save in "extra" and use repeatedly.
>

Yep. Done.


>
> +if (can_sort)
> +{
> +Path   *path = cheapest_path;
> +
> +if (!(pathkeys_contained_in(root->group_pathkeys,
> +path->pathkeys)))
> [ .. clipped patch .. ]
> +   NIL,
> +   dNumGroups));
> +}
>
> We create two kinds of paths partial paths for parallel query and partial
> aggregation paths when group keys do not have partition keys. The comments
> and
> code uses partial to mean both the things, which is rather confusing. May
> be we
> should use term "partial aggregation" explicitly wherever it means that in
> comments and in variable names.
>

Agree. Used "partial aggregation" wherever applicable. Let me know if you
see any other place needing this adjustment.


> I still feel that create_grouping_paths() and create_child_grouping_paths()
> have a lot of common code. While I can see that there are some pockets in
> create_grouping_paths() which are not required in
> create_child_grouping_paths()
> and vice-versa, may be we should create only one function and move those
> pockets under "if (child)" or "if (parent)" kind of conditions. It will be
> a
> maintenance burden to keep these two functions in sync in future. If we do
> not
> keep them in sync, that will introduce bugs.
>

I agree that keeping these two functions in sync in the future will be a
maintenance burden, but I am not yet sure how to refactor them cleanly.
I will give it one more try and include the changes in the next patchset.


> +
> +/*
> + * create_partition_agg_paths
> + *
> + * Creates append path for all the partitions and adds into the grouped
> rel.
>
> I think you want to write "Creates an append path containing paths from
> all the
> child grouped rels and adds into the given parent grouped rel".
>

Reworded as you said.


>
> + * For partial mode we need to add a finalize agg node over append path
> before
> + * adding a path to grouped rel.
> + */
> +void
> +create_partition_agg_paths(PlannerInfo *root,
> +   RelOptInfo *grouped_rel,
> +   List *subpaths,
> +   List *partial_subpaths,
>
> Why do we have these two as separate arguments? I don't see any call to
> create_partition_agg_paths() through add_append_path() passing both of them
> non-NULL simultaneously. May be you want use a single list subpaths and
> another
> boolean indicating whether it's list of partial paths or regular paths.
>
>
After the redesign around putting Gather over Append, I no longer need to
pass all the Append subpaths to this function at all; the Append is built by
add_paths_to_append_rel() itself. This function now just adds finalization
steps as needed.
So we don't have two lists anymore, and to indicate partial paths I pass a
boolean instead. Please have a look and let me know if I missed anything.


> +
> +/* For non-partial path, just create a append path and we are done. */
> This is the kind of confusion, I am talking about above. Here you have
> mentioned "non-partial path" which may mean a regular path but what you
> actually mean by that term is a path representing partial aggregates.
>

> +/*
> + * Partial partition-wise grouping paths.  Need to create final
> + * aggregation path over append relation.
> + *
> + * If there are partial subpaths, then we need to add gather path
> before we
> + * append these subpaths.
>
> More confusion here.
>

Hopefully no more confusion in this new version.


> + */
> +if 

Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-14 Thread Ashutosh Bapat
On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat
 wrote:
> +
> +-- Test when input relation for grouping is dummy
> +EXPLAIN (COSTS OFF)
> +SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
> +   QUERY PLAN
> +
> + HashAggregate
> +   Group Key: pagg_tab.c
> +   ->  Result
> + One-Time Filter: false
> +(4 rows)
>
> Not part of your patch, I am wondering if we can further optimize this plan by
> converting HashAggregate to Result (One-time Filter: false) and the aggregate
> target. Just an idea.
>
This comment is also wrong. The finalization step of aggregates needs
to be executed irrespective of whether or not the underlying scan
produces any rows. It may, for example, add a constant value to the
transition result. We may apply this optimization only when none of the
aggregates have finalization functions, but it may not be worth the
code.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-14 Thread Ashutosh Bapat
On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat
 wrote:
>
> +
> +EXPLAIN (COSTS OFF)
> +SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
> +   QUERY PLAN
> +-
> + Group
> +   Group Key: pagg_tab_p1.a
> +   ->  Merge Append
> + Sort Key: pagg_tab_p1.a
> + ->  Group
> +   Group Key: pagg_tab_p1.a
> +   ->  Sort
> + Sort Key: pagg_tab_p1.a
> + ->  Seq Scan on pagg_tab_p1
> [ ... clipped ... ]
> +(19 rows)
>
> It's strange that we do not annotate partial grouping as Partial. Does not 
> look
> like a bug in your patch. Do we get similar output with parallel grouping?

I am wrong here. It's not partial grouping; it's two-level grouping. I
think annotating Group as Partial would be misleading. Sorry.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-14 Thread Ashutosh Bapat
On Wed, Dec 13, 2017 at 6:37 PM, Jeevan Chalke
 wrote:
>
>
> On Tue, Dec 12, 2017 at 3:43 PM, Ashutosh Bapat
>  wrote:
>>
>> Here are review comments for 0009
>
>
> Thank you, Ashutosh for the detailed review so far.
>
> I am working on your reviews but since parallel Append is now committed,
> I need to re-base my changes over it and need to resolve the conflicts too.
>
> Once done, I will submit the new patch-set fixing these and earlier review
> comments.
>

Sure, no problem. Take your time. Here's a set of comments for 0008. That
ends the first read of all the patches (second reading for the core
changes).

+-- Also, disable parallel paths.
+SET max_parallel_workers_per_gather TO 0;

If you enable parallel aggregation for smaller data, partition-wise aggregation
paths won't be chosen. I think this is the reason why you are disabling
parallel query. But we should probably explain that in a comment. Better if we
could come up with testcases without disabling parallel query. Since parallel
append is now committed, may be it can help.

+
+-- Check with multiple columns in GROUP BY, order in target-list is reversed
+EXPLAIN (COSTS OFF)
+SELECT c, a, count(*) FROM pagg_tab GROUP BY a, c;
+   QUERY PLAN
+-
+ Append
+   ->  HashAggregate
+ Group Key: pagg_tab_p1.a, pagg_tab_p1.c
+ ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(10 rows)

Why do we need this testcase?

+
+-- Test when input relation for grouping is dummy
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+   QUERY PLAN
+
+ HashAggregate
+   Group Key: pagg_tab.c
+   ->  Result
+ One-Time Filter: false
+(4 rows)

This is not part of your patch, but I am wondering if we can further optimize this plan by
converting HashAggregate to Result (One-time Filter: false) and the aggregate
target. Just an idea.

+
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+ c | sum
+---+-
+(0 rows)

I think we also need a case when the child input relations are marked dummy and
then the parent is marked dummy. Just use a condition with partkey = <value out of list bounds>.

+
+-- Check with SORTED paths. Disable hashagg to get group aggregate

Suggest: "Test GroupAggregate paths by disabling hash aggregates."

+-- When GROUP BY clause matches with PARTITION KEY.

I don't think we need "with", and just extend the same sentence with "complete
aggregation is performed for each partition"

+-- Should choose full partition-wise aggregation path

suggest: "Should choose full partition-wise GroupAggregate plan", but I guess
with the above suggestion, this sentence is not needed.

+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path

Similar suggestions as above.

+-- No aggregates, but still able to perform partition-wise aggregates

That's a funny construction. May be "Test partition-wise grouping without any
aggregates".

We should try some output for this query.

+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
+   QUERY PLAN
+-
+ Group
+   Group Key: pagg_tab_p1.a
+   ->  Merge Append
+ Sort Key: pagg_tab_p1.a
+ ->  Group
+   Group Key: pagg_tab_p1.a
+   ->  Sort
+ Sort Key: pagg_tab_p1.a
+ ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(19 rows)

It's strange that we do not annotate partial grouping as Partial. Does not look
like a bug in your patch. Do we get similar output with parallel grouping?

+
+-- ORDERED SET within aggregate
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
+   QUERY PLAN
+
+ Sort
+   Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b ORDER BY pagg_tab_p1.a))
+   ->  GroupAggregate
+ Group Key: pagg_tab_p1.a
+ ->  Sort
+   Sort Key: pagg_tab_p1.a
+   ->  Append
+ ->  Seq Scan on pagg_tab_p1
+ ->  Seq Scan on pagg_tab_p2
+ ->  Seq Scan on pagg_tab_p3
+(10 rows)

pagg_tab is partitioned by column c. So, not having it in GROUP BY
itself might produce this plan if Partial parallel aggregation is expensive.
When testing negative cases like this, the GROUP BY should always include the
partition key.

In case of full aggregation, since all the rows that belong to the same group
come from the same partition, having an ORDER BY doesn't make any difference.
We should support such a case.

+INSERT INTO pagg_tab1 SELECT i%30, i%20 FROM generate_series(0, 299, 2) i;
+INSERT INTO pagg_tab2 SELECT i%20, i%30 FROM generate_series(0, 299, 3) i;

spaces around % operator?

+-- When GROUP BY clause matches with PARTITION KEY.
+-- 

Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-12 Thread Ashutosh Bapat
Here are review comments for 0009
Only full aggregation is pushed on the remote server.

I think we can live with that for a while but we need to be able to push down
partial aggregates to the foreign server. I agree that it needs some
infrastructure to serialize and deserialize the partial aggregate values,
support unpartitioned aggregation first and then work on partitioned
aggregates. That is definitely a separate piece of work.

+-- ===
+-- test partition-wise-aggregates
+-- ===
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (a int, b int, c text);
+CREATE TABLE pagg_tab_p2 (a int, b int, c text);
+CREATE TABLE pagg_tab_p3 (a int, b int, c text);

Like partition-wise join testcases please use LIKE so that it's easy to change
the table schema if required.

+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30,
'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30,
'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i %
30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30,
'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i %
30) >= 20;

We have to do this because INSERT tuple routing to a foreign partition is not
supported right now. Somebody has to remember to change this to a single
statement once that's done.

+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;

I thought this is not needed. When you ANALYZE the partitioned table, it would
analyze the partitions as well. But I see that partition-wise join is also
ANALYZING the foreign partitions separately. When I ran ANALYZE on a
partitioned table with foreign partitions, statistics for only the local tables
(partitioned and partitions) was updated. Of course this is separate work, but
probably needs to be fixed.

+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan when partition-wise-agg is disabled

s/when/with/

+-- Plan when partition-wise-agg is enabled

s/when/with/

+   ->  Append

Just like ForeignScan node's Relations tell what kind of ForeignScan this is,
may be we should annotate Append to tell whether the children are joins,
aggregates or relation scans. That might be helpful. Of course as another
patch.

+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING
avg(b) < 25 ORDER BY 1;
+ a  | sum  | min | count
++--+-+---
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
[ ... clipped ...]
+ 23 | 2300 |   3 |   100
+ 24 | 2400 |   4 |   100
+(15 rows)

May be we want to reduce the number of rows to a few by using a stricter HAVING
clause?

+
+-- When GROUP BY clause not matches with PARTITION KEY.

... clause does not match ...

+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
sum(a) < 800 ORDER BY 1;
+
QUERY PLAN
+-
+ Sort
+   Output: fpagg_tab_p1.b, (avg(fpagg_tab_p1.a)),
(max(fpagg_tab_p1.a)), (count(*))
+   ->  Partial HashAggregate
[ ... clipped ... ]
+ Output: fpagg_tab_p3.b, PARTIAL
avg(fpagg_tab_p3.a), PARTIAL max(fpagg_tab_p3.a), PARTIAL count(*),
PARTIAL sum(fpagg_tab_p3.a)
+ Group Key: fpagg_tab_p3.b
+ ->  Foreign Scan on public.fpagg_tab_p3
+   Output: fpagg_tab_p3.b, fpagg_tab_p3.a
+   Remote SQL: SELECT a, b FROM public.pagg_tab_p3
+(26 rows)

I think we are interested in the overall shape of the plan and not the details of
Remote SQL etc. So, may be turn off VERBOSE. This comment applies to an earlier
plan with enable_partition_wise_agg = false;

+
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
sum(a) < 800 ORDER BY 1;
+ b  | avg | max | count
++-+-+---
+  0 | 10. |  20 |60
+  1 | 11. |  21 |60
[... clipped ...]
+ 42 | 12. |  22 |60
+ 43 | 13. |  23 |60
+(20 rows)

Since the aggregates were not pushed down, I doubt we should be testing the
output. But this test is good to check partial aggregates over foreign
partition scans, which we don't have in postgres_fdw.sql I think. So, may be
add it as a separate patch?

Can you please add a test where we reference a whole-row; that usually has
troubles.

-if (root->hasHavingQual && query->havingQual)
+if (root->hasHavingQual && fpinfo->havingQual)

This is not exactly a problem with your patch, but why do we need to check both
the boolean and the actual clauses? If the boolean is true, query->havingQual
should be non-NULL and NULL otherwise.

 /* Grouping 

Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-03 Thread Ashutosh Bapat
On Sat, Dec 2, 2017 at 4:08 AM, Robert Haas  wrote:
> On Fri, Dec 1, 2017 at 7:41 AM, Ashutosh Bapat
>  wrote:
>> This code creates plans where there are multiple Gather nodes under an Append
>> node.
>
> We should avoid that.  Starting and stopping workers is inefficient,
> and precludes things like turning the Append into a Parallel Append.

Ah, I didn't think about it. Thanks for bringing it up.

>
>> AFAIU, the workers assigned to one gather node can be reused until that
>> Gather node finishes. Having multiple Gather nodes under an Append mean that
>> every worker will be idle from the time that worker finishes the work till 
>> the
>> last worker finishes the work.
>
> No, workers will exit as soon as they finish.  They don't hang around idle.

Sorry, I think I used the wrong word, "idle". I meant that if a worker
finishes and exits, the query can't use that worker slot until the
next Gather node starts. But as you pointed out, starting and stopping
a worker is costlier than not using the slot. So we should
avoid such plans.

>
>> index b422050..1941468 100644
>> --- a/src/tools/pgindent/typedefs.list
>> +++ b/src/tools/pgindent/typedefs.list
>> @@ -2345,6 +2345,7 @@ UnlistenStmt
>>  UnresolvedTup
>>  UnresolvedTupData
>>  UpdateStmt
>> +UpperPathExtraData
>>  UpperRelationKind
>>  UpperUniquePath
>>  UserAuth
>>
>> Do we commit this file as part of the feature?
>
> Andres and I regularly commit such changes; Tom rejects them.
>

We will leave it to the committer to decide what to do with this hunk.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-01 Thread Robert Haas
On Fri, Dec 1, 2017 at 7:41 AM, Ashutosh Bapat
 wrote:
> This code creates plans where there are multiple Gather nodes under an Append
> node.

We should avoid that.  Starting and stopping workers is inefficient,
and precludes things like turning the Append into a Parallel Append.

> AFAIU, the workers assigned to one gather node can be reused until that
> Gather node finishes. Having multiple Gather nodes under an Append mean that
> every worker will be idle from the time that worker finishes the work till the
> last worker finishes the work.

No, workers will exit as soon as they finish.  They don't hang around idle.

> index b422050..1941468 100644
> --- a/src/tools/pgindent/typedefs.list
> +++ b/src/tools/pgindent/typedefs.list
> @@ -2345,6 +2345,7 @@ UnlistenStmt
>  UnresolvedTup
>  UnresolvedTupData
>  UpdateStmt
> +UpperPathExtraData
>  UpperRelationKind
>  UpperUniquePath
>  UserAuth
>
> Do we commit this file as part of the feature?

Andres and I regularly commit such changes; Tom rejects them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-01 Thread Ashutosh Bapat
Continuing with review of 0007.

+
+/* Copy input rels's relids to grouped rel */
+grouped_rel->relids = input_rel->relids;

Isn't this done in fetch_upper_rel()? Why do we need it here?
There's also a similar hunk in create_grouping_paths() which doesn't look
appropriate. I guess you need the relids in grouped_rel->relids for the FDW. There
are two ways to do this: 1. set grouped_rel->relids for the parent upper rel as
well, but then we should pass the relids to fetch_upper_rel() instead of setting
them later. 2. For a parent upper rel, in create_foreignscan_plan(), set relids to
all_baserels if upper_rel->relids is NULL, and don't set relids for a parent
upper rel. I am fine with either of those.

+/* partial phase */
+get_agg_clause_costs(root, (Node *) partial_target->exprs,
+ AGGSPLIT_INITIAL_SERIAL,
+ &agg_partial_costs);

IIUC, the costs for evaluating aggregates would not change per child. They
won't differ between the parent and any of the children. So we should be able to
calculate them once, save them in "extra", and use them repeatedly.

+if (can_sort)
+{
+Path   *path = cheapest_path;
+
+if (!(pathkeys_contained_in(root->group_pathkeys,
+path->pathkeys)))
[ .. clipped patch .. ]
+   NIL,
+   dNumGroups));
+}

We create two kinds of "partial" paths: partial paths for parallel query, and
partial aggregation paths when the group keys do not contain the partition keys.
The comments and code use "partial" to mean both things, which is rather
confusing. Maybe we should use the term "partial aggregation" explicitly, in
comments and in variable names, wherever that is what is meant.

I still feel that create_grouping_paths() and create_child_grouping_paths()
have a lot of common code. While I can see that there are some pockets in
create_grouping_paths() which are not required in create_child_grouping_paths()
and vice versa, maybe we should create only one function and move those
pockets under "if (child)" or "if (parent)" kind of conditions. It will be a
maintenance burden to keep these two functions in sync in the future. If we do
not keep them in sync, that will introduce bugs.

+
+/*
+ * create_partition_agg_paths
+ *
+ * Creates append path for all the partitions and adds into the grouped rel.

I think you want to write "Creates an append path containing paths from all the
child grouped rels and adds into the given parent grouped rel".

+ * For partial mode we need to add a finalize agg node over append path before
+ * adding a path to grouped rel.
+ */
+void
+create_partition_agg_paths(PlannerInfo *root,
+   RelOptInfo *grouped_rel,
+   List *subpaths,
+   List *partial_subpaths,

Why do we have these two as separate arguments? I don't see any call to
create_partition_agg_paths() through add_append_path() passing both of them
non-NULL simultaneously. Maybe you want to use a single subpaths list and a
boolean indicating whether it is a list of partial paths or regular paths.

+
+/* For non-partial path, just create a append path and we are done. */
This is the kind of confusion I am talking about above. Here you have
written "non-partial path", which may mean a regular path, but what you
actually mean by that term is a path representing partial aggregates.

+/*
+ * Partial partition-wise grouping paths.  Need to create final
+ * aggregation path over append relation.
+ *
+ * If there are partial subpaths, then we need to add gather path before we
+ * append these subpaths.

More confusion here.

+ */
+if (partial_subpaths)
+{
+ListCell   *lc;
+
+Assert(subpaths == NIL);
+
+foreach(lc, partial_subpaths)
+{
+Path   *path = lfirst(lc);
+double  total_groups = path->rows * path->parallel_workers;
+
+/* Create gather paths for partial subpaths */
+Path *gpath = (Path *) create_gather_path(root, grouped_rel, path,
+  path->pathtarget, NULL,
+  &total_groups);
+
+subpaths = lappend(subpaths, gpath);

Using the argument variable is confusing, and that's why you need two different
List variables. Instead, you could probably have another variable, local to this
function, to hold the gather subpaths.

AFAIU, the Gather paths that this code creates have their parent set to the
parent grouped rel. That's not correct. These partial paths come from children
of the grouped rel, and each Gather produces rows corresponding to one child of
the grouped rel. So each Gather path's parent should be set to the
corresponding child, not the parent grouped rel.

This code creates plans where there are multiple Gather nodes under an Append node.

Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-29 Thread Michael Paquier
On Tue, Nov 28, 2017 at 5:50 PM, Jeevan Chalke
 wrote:
> [snip]

This is still a hot topic so I am moving it to next CF.
-- 
Michael



Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-28 Thread Jeevan Chalke
On Tue, Nov 28, 2017 at 12:37 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> On Thu, Nov 23, 2017 at 6:38 PM, Jeevan Chalke
>  wrote:
> > Let me know if I missed any comment to be fixed.
>
> Hi,
>
> I have applied v8 patches on commit id 8735978e7aebfbc499843630131c18
> d1f7346c79,
> and getting below observation, please take a look.
>
> Observation:
> "when joining a foreign partitioned table with a local partitioned table,
> we get wrong output
> with partition_wise_join enabled; the same works fine on PG head
> without the aggregates patch."
>

I have observed the same behavior on the master branch too, when a
partition-wise join path is selected, irrespective of this patch-set.

This is happening because the data on the foreign table does not comply with
the partitioning constraints.


> Test-case:
> CREATE EXTENSION postgres_fdw;
> CREATE SERVER pwj_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS
> (dbname 'postgres',port '5432',use_remote_estimate 'true');
> CREATE USER MAPPING FOR PUBLIC SERVER pwj_server;
>
> CREATE TABLE fplt1 (a int,  c text) PARTITION BY LIST(c);
> CREATE TABLE fplt1_p1 (a int,  c text);
> CREATE TABLE fplt1_p2 (a int,  c text);
> CREATE FOREIGN TABLE ftplt1_p1 PARTITION OF fplt1 FOR VALUES IN
> ('0000', '0001', '0002', '0003') SERVER pwj_server OPTIONS (TABLE_NAME
> 'fplt1_p1');
> CREATE FOREIGN TABLE ftplt1_p2 PARTITION OF fplt1 FOR VALUES IN
> ('0004', '0005', '0006', '0007') SERVER pwj_server OPTIONS (TABLE_NAME
> 'fplt1_p2');
> INSERT INTO fplt1_p1 SELECT i, to_char(i%8, 'FM0000') FROM
> generate_series(0, 199, 2) i;
> INSERT INTO fplt1_p2 SELECT i, to_char(i%8, 'FM0000') FROM
> generate_series(200, 398, 2) i;
>
> CREATE TABLE lplt2 (a int,  c text) PARTITION BY LIST(c);
> CREATE TABLE lplt2_p1 PARTITION OF lplt2 FOR VALUES IN ('0000',
> '0001', '0002', '0003');
> CREATE TABLE lplt2_p2 PARTITION OF lplt2 FOR VALUES IN ('0004',
> '0005', '0006', '0007');
> INSERT INTO lplt2 SELECT i, to_char(i%8, 'FM0000') FROM
> generate_series(0, 398, 3) i;
>
> SELECT t1.c, t2.c,count(*) FROM fplt1 t1 JOIN lplt2 t2 ON (t1.c = t2.c
> and t1.a = t2.a)  WHERE t1.a % 25 = 0 GROUP BY 1,2 ORDER BY t1.c,
> t2.c;
>   c   |  c   | count
> ------+------+-------
>  0000 | 0000 |     1
>  0004 | 0004 |     1
>  0006 | 0006 |     1
> (3 rows)
>
> SET enable_partition_wise_join = on;
> SELECT t1.c, t2.c,count(*) FROM fplt1 t1 JOIN lplt2 t2 ON (t1.c = t2.c
> and t1.a = t2.a)  WHERE t1.a % 25 = 0 GROUP BY 1,2 ORDER BY t1.c,
> t2.c;
>   c   |  c   | count
> ------+------+-------
>  0000 | 0000 |     1
>  0004 | 0004 |     1
> (2 rows)
>
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-27 Thread Rajkumar Raghuwanshi
On Thu, Nov 23, 2017 at 6:38 PM, Jeevan Chalke
 wrote:
> Let me know if I missed any comment to be fixed.

Hi,

I have applied v8 patches on commit id 8735978e7aebfbc499843630131c18d1f7346c79,
and getting below observation, please take a look.

Observation:
"when joining a foreign partitioned table with a local partitioned table,
we get wrong output
with partition_wise_join enabled; the same works fine on PG head
without the aggregates patch."

Test-case:
CREATE EXTENSION postgres_fdw;
CREATE SERVER pwj_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'postgres',port '5432',use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER pwj_server;

CREATE TABLE fplt1 (a int,  c text) PARTITION BY LIST(c);
CREATE TABLE fplt1_p1 (a int,  c text);
CREATE TABLE fplt1_p2 (a int,  c text);
CREATE FOREIGN TABLE ftplt1_p1 PARTITION OF fplt1 FOR VALUES IN
('0000', '0001', '0002', '0003') SERVER pwj_server OPTIONS (TABLE_NAME
'fplt1_p1');
CREATE FOREIGN TABLE ftplt1_p2 PARTITION OF fplt1 FOR VALUES IN
('0004', '0005', '0006', '0007') SERVER pwj_server OPTIONS (TABLE_NAME
'fplt1_p2');
INSERT INTO fplt1_p1 SELECT i, to_char(i%8, 'FM0000') FROM
generate_series(0, 199, 2) i;
INSERT INTO fplt1_p2 SELECT i, to_char(i%8, 'FM0000') FROM
generate_series(200, 398, 2) i;

CREATE TABLE lplt2 (a int,  c text) PARTITION BY LIST(c);
CREATE TABLE lplt2_p1 PARTITION OF lplt2 FOR VALUES IN ('0000',
'0001', '0002', '0003');
CREATE TABLE lplt2_p2 PARTITION OF lplt2 FOR VALUES IN ('0004',
'0005', '0006', '0007');
INSERT INTO lplt2 SELECT i, to_char(i%8, 'FM0000') FROM
generate_series(0, 398, 3) i;

SELECT t1.c, t2.c,count(*) FROM fplt1 t1 JOIN lplt2 t2 ON (t1.c = t2.c
and t1.a = t2.a)  WHERE t1.a % 25 = 0 GROUP BY 1,2 ORDER BY t1.c,
t2.c;
  c   |  c   | count
------+------+-------
 0000 | 0000 |     1
 0004 | 0004 |     1
 0006 | 0006 |     1
(3 rows)

SET enable_partition_wise_join = on;
SELECT t1.c, t2.c,count(*) FROM fplt1 t1 JOIN lplt2 t2 ON (t1.c = t2.c
and t1.a = t2.a)  WHERE t1.a % 25 = 0 GROUP BY 1,2 ORDER BY t1.c,
t2.c;
  c   |  c   | count
------+------+-------
 0000 | 0000 |     1
 0004 | 0004 |     1
(2 rows)


Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation



Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-23 Thread Ashutosh Bapat
On Thu, Nov 23, 2017 at 6:38 PM, Jeevan Chalke
 wrote:
>
>
> Agree. However, there was an existing comment in create_append_path() saying
> "We don't bother with inventing a cost_append(), but just do it here", which
> implies that at some time in the future we may need it; so why not now, when we are
> explicitly costing an Append node. Having a function is good so that, if
> required in the future, we need to update only this function.
> Let me know if you think otherwise; I will make those changes in the next patchset.
>

I don't read that comment as something we will do in the future. I
don't think the amount of change that this patch introduces just for
adding one more line of code is justified. There's anyway only one
place where we are costing an Append, so it's not that the new function
avoids code duplication. Although I am happy to defer this to the
committer, if you think that we need a separate function.

>
> As suggested by Robert, I have renamed it to APPEND_CPU_COST_MULTIPLIER in
> v7 patchset.
> Also, retained the #define for just multiplier as suggested by Robert.

Ok.


>
>>
>>
>> Anyway, it doesn't look like a good idea to pass an argument (gd) only to
>> return from that function in case of its presence. Maybe we should handle
>> it
>> outside this function.
>
>
> Well, I would like to have it inside the function itself. Let the function
> itself do all the necessary checking rather than doing some of it outside.

We will leave this to the committer. I don't like that style, but it's
also good to expect a function to do all related work.


>>
>> +if (!create_child_grouping_paths(root, input_child_rel,
>> agg_costs, gd,
>> + ))
>> +{
>> +/* Could not create path for childrel, return */
>> +pfree(appinfos);
>> +return;
>> +}
>>
>> Can we detect this condition and bail out even before planning any of the
>> children? It looks wasteful to try to plan children only to bail out in
>> this
>> case.
>
>
> I don't think so. It is essentially unreachable, added just as a safety net in
> case we are unable to create a child path. The bail-out conditions cannot be
> evaluated at the beginning. Do you think an Assert() will be good here? Am I
> missing something?

An Assert would help. If it's something that should not happen, we
should try catching it rather than silently ignoring it.

>
>>
>> +/* Nothing to do if we have no live children */
>> +if (live_children == NIL)
>> +return;
>>
>> A parent relation with all dummy children will also be dummy. May be we
>> should
>> mark the parent dummy case using mark_dummy_rel() similar to
>> generate_partition_wise_join_paths().
>
>
> If the parent is dummy, then we are not doing PWA at all. So no need to mark
> the parent grouped_rel as dummy, I guess.
> However, if some of the children are dummy, I am marking the corresponding
> upper rel as dummy too.
> Actually, this condition is never going to be true, as you correctly said
> that "A parent relation with all dummy children will also be dummy". Should
> we have an Assert() instead?

Yes.


>
>
> I have a testcase for a multi-level partitioned table.
> However, I did not understand what you mean by "children with different
> order of partition key columns". I had a look over the tests in
> partition_join.sql and it seems that I have covered all those scenarios.
> Please have a look over the testcases added for PWA and let me know the
> missing scenarios; I will add them then.

By children with different order of partition key columns, I meant
something like this

parent(a int, b int, c int) partition by (a), child1(b int, c int, a
int) partition by b, child1_1 (c int, a int, b int);

where the attribute numbers of the partition keys in different
children are different.

>> This looks like a useful piece of general functionality
>> list_has_intersection(), which would return a boolean instead of the whole
>> intersection. I am not sure whether we should add that function to list.c
>> and
>> use it here.
>
>
> Sounds good.
> But for now, I am keeping it as part of this feature itself.

ok

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-21 Thread Robert Haas
On Fri, Nov 17, 2017 at 7:24 AM, Ashutosh Bapat
 wrote:
> + * this value should be multiplied with cpu_tuple_cost wherever applicable.
> + */
> +#define DEFAULT_APPEND_COST_FACTOR 0.5
>
> I am wondering whether we should just define
> #define APPEND_TUPLE_COST (cpu_tuple_cost * 0.5)
> and use this macro everywhere. What other use would DEFAULT_APPEND_COST_FACTOR
> have than being multiplied with cpu_tuple_cost?

-1.  If you wrap it in a macro like that, future readers of the code
will have to go look up what the macro does.  If you just multiply by
DEFAULT_APPEND_COST_FACTOR, it will be clear that what's being used is
a multiple of cpu_tuple_cost.

> I am not sure whether we should be discussing why this technique performs
> better or when it performs better. We don't have similar discussion for
> partition-wise join. That paragraph just describes the technique and may be we
> want to do the same here.

+1.

> + * might be optimal because of presence of suitable paths with pathkeys or
> + * because the hash tables for most of the partitions fit into the memory.
> + * However, if partition keys are not part of the group by clauses, then we
> + * still able to compute the partial aggregation for each partition and then
> + * finalize them over append. This can either win or lose. It may win if the
> + * PartialAggregate stage greatly reduces the number of groups and lose if we
> + * have lots of small groups.
>
> I have not seen the prologue of a function implementing a query optimization
> technique explain why that technique improves performance. So I am not sure
> whether the comment should include this explanation. One of the reasons being
> that the reasons why a technique works might change over the period of time
> with the introduction of other techniques, thus obsoleting the comment.  But
> may be it's good to have it here.

+1 for keeping it.

> +extra.inputRows = 0;/* Not needed at child paths creation */
>
> Why? Comment should be on its own line.

Comments on same line are fine if they are short enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-17 Thread Ashutosh Bapat
On Wed, Nov 15, 2017 at 5:31 PM, Jeevan Chalke
 wrote:
>
> OK. Done in the attached patch set.
>
> I have rebased all my patches on latest HEAD which is at
> 7518049980be1d90264addab003476ae105f70d4
>
> Thanks

These are review comments for the last set and I think most of them
apply to the new set as well.

Patches 0001 - 0005 refactoring existing code. I haven't
reviewed them in detail, checking whether we have missed anything in moving the
code, but they mostly look fine.

Comments on 0006
 /*
+ * cost_append
+ *  Determines and returns the cost of an Append node.
+ *
... clipped portion
+
+/* Add Append node overhead. */
+run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+

I am wondering whether it's really worth creating a new function for a single
line addition to create_append_path(). I think all we need to change in
create_append_path() is add (cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR *
tuples) to path->total_cost.


+/* Add MergeAppend node overhead like we do it for the Append node */
+run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+

With this change, the following comment is no longer true. Please remove it.
 * extracted tuple.  We don't charge cpu_tuple_cost because a MergeAppend
 * node doesn't do qual-checking or projection, so it has less overhead
 * than most plan nodes.
 */

+/*
+ * Arbitrarily use 50% of the cpu_tuple_cost to cost append node. Note that

Maybe reword it as " ... to cost per-tuple processing by an Append node ..."

+ * this value should be multiplied with cpu_tuple_cost wherever applicable.
+ */
+#define DEFAULT_APPEND_COST_FACTOR 0.5

I am wondering whether we should just define
#define APPEND_TUPLE_COST (cpu_tuple_cost * 0.5)
and use this macro everywhere. What other use would DEFAULT_APPEND_COST_FACTOR
have than being multiplied with cpu_tuple_cost?

 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM
plt1 t1, plt2 t2, plt1_e t3 WHERE t1.c = t2.c AND ltrim(t3.c, 'A') =
t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-  QUERY PLAN
---
+   QUERY PLAN
+
  Sort
Sort Key: t1.c, t3.c
->  HashAggregate
  Group Key: t1.c, t2.c, t3.c
- ->  Result
+ ->  Hash Join
+   Hash Cond: (t1.c = t2.c)
->  Append
- ->  Hash Join
-   Hash Cond: (t1.c = t2.c)

That's sad. Interestingly, this query has an aggregate, so the plan will use
partition-wise join again when the partition-wise aggregation patch is
applied. So it may be fine.

- Append  (cost=0.00..0.04 rows=2 width=32)
+ Append  (cost=0.00..0.05 rows=2 width=32)

- Append  (cost=0.00..0.04 rows=2 width=4)
+ Append  (cost=0.00..0.05 rows=2 width=4)

We do have some testcases which print costs. Interesting :). I don't have
any objection to this change.

Comments on 0007

+   
+Enables or disables the query planner's use of partition-wise grouping
+or aggregation, which allows  If partition-wise aggregation
does not result in the
+cheapest path, it will still spend time in creating these paths and
+consume memory which increase linearly with the number of partitions.
+The default is off.
+   
+  
+ 
+
May be we should word this in the same manner as partition-wise join like

Enables or disables the query planner's use of partition-wise grouping
or aggregation, which allows aggregation or grouping on partitioned
tables to be spread across the partitions. If the GROUP
BY clause includes the partition keys, the rows are aggregated at
each partition. Otherwise, partial aggregates computed for each
partition are required to be combined. Because partition-wise
aggregation/grouping can use significantly more CPU time and memory
during planning, the default is off.

+
+Partition-wise aggregates/grouping
+--

... clipped patch

+In above plan, aggregation is performed after append node which means that the
+whole table is an input for the aggregation node. However, with partition-wise
+aggregation, same query will have plane like:

s/plane/plan/

+ Append

... clipped patch

+PartialAggregate stage greatly reduces the number of groups and lose if we have
+lots of small groups.

To keep the discussion brief, I suggest we rewrite this paragraph as


If GROUP BY clause has all partition keys, all the rows that belong to a given
group come from a single partition and thus aggregates can be finalized
separately for each partition. When the number of groups is far lesser than the

Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-16 Thread Ashutosh Bapat
On Thu, Nov 16, 2017 at 6:02 AM, David Rowley
 wrote:
> On 16 November 2017 at 05:57, Konstantin Knizhnik
>  wrote:
>> The main problem IMHO is that there are a lot of different threads and
>> patches related to this topic :(
>> And it is very difficult to combine all of them together to achieve the
>> final goal: efficient execution of OLAP queries on sharded tables.
>> It would be nice if somebody who is making the most contributions in this
>> direction could somehow maintain it...
>> I just faced a particular problem with our pg_shardman extension, and now
>> (thanks to your patch) I have a working solution for it.
>> But certainly I would prefer to have this support in the mainstream version
>> of Postgres.
>
> I don't think it's fair to be asking about additional features on this
> thread. It seems to me you're asking about two completely separate
> features, with the aim of trying to solve your own problems.
>
> It also looks to me that Jeevan has been clear on what his goals are
> for this patch. Perhaps what you're asking for is a logical direction
> to travel once this patch is committed, so I think, probably, the best
> way to conduct what you're after here is to either:
>
> a) Wait until this is committed and spin up your own thread about
> your proposed changes to allow the PARTIAL aggregate to be pushed
> into the foreign server, or;
> b) Spin up your own thread now, with reference to this patch as a
> prerequisite to your own patch.
>
> I agree that what you're talking about is quite exciting stuff, but
> please, let's not talk about it here.
>

+1 for all that.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-15 Thread David Rowley
On 16 November 2017 at 05:57, Konstantin Knizhnik
 wrote:
> The main problem IMHO is that there are a lot of different threads and
> patches related to this topic :(
> And it is very difficult to combine all of them together to achieve the
> final goal: efficient execution of OLAP queries on sharded tables.
> It would be nice if somebody who is making the most contributions in this
> direction could somehow maintain it...
> I just faced a particular problem with our pg_shardman extension, and now
> (thanks to your patch) I have a working solution for it.
> But certainly I would prefer to have this support in the mainstream version
> of Postgres.

I don't think it's fair to be asking about additional features on this
thread. It seems to me you're asking about two completely separate
features, with the aim of trying to solve your own problems.

It also looks to me that Jeevan has been clear on what his goals are
for this patch. Perhaps what you're asking for is a logical direction
to travel once this patch is committed, so I think, probably, the best
way to conduct what you're after here is to either:

a) Wait until this is committed and spin up your own thread about
your proposed changes to allow the PARTIAL aggregate to be pushed
into the foreign server, or;
b) Spin up your own thread now, with reference to this patch as a
prerequisite to your own patch.

I agree that what you're talking about is quite exciting stuff, but
please, let's not talk about it here.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-15 Thread Konstantin Knizhnik



On 15.11.2017 13:35, Jeevan Chalke wrote:


As explained by Ashutosh Bapat in reply
https://www.postgresql.org/message-id/CAFjFpRdpeMTd8kYbM_x0769V-aEKst5Nkg3+coG=8ki7s8z...@mail.gmail.com
we cannot rely on just aggtype==aggtranstype.


Obviously this check (aggtype==aggtranstype) is not a correct criterion for
all user-defined aggregates.

I just did it as a temporary workaround for standard aggregates.



However, I have tried pushing partial aggregation over remote server 
and also

submitted a PoC patch here:
https://www.postgresql.org/message-id/CAM2+6=uakp9+tsjuh2fbhhwnjc7oyfl1_gvu7mt2fxtvt6g...@mail.gmail.com

I have later removed these patches from the Partition-wise Aggregation
patch set
as it is altogether a different issue from this mail thread. We might
need to
discuss it separately.

...
Interesting idea of "asynchronous append". However, IMHO it deserves 
its own

email-chain.


The main problem IMHO is that there are a lot of different threads and
patches related to this topic :(
And it is very difficult to combine all of them together to achieve the
final goal: efficient execution of OLAP queries on sharded tables.
It would be nice if somebody who is making the most contributions in this
direction could somehow maintain it...
I just faced a particular problem with our pg_shardman extension, and
now (thanks to your patch) I have a working solution for it.
But certainly I would prefer to have this support in the mainstream
version of Postgres.


There are two open questions which I want to discuss (sorry, maybe once
again this is not the right thread for them):


1. Parallel append and FDW/postgres_fdw: should the FDW support parallel
scan, and do we really need it in order to support concurrent execution of
a query on local and remote partitions?
"Asynchronous append" partly solves this problem, but only for remote
partitions. I do not completely understand all the complexity of the
alternative approaches.


2. Right now partition-wise aggregation/grouping works only for tables
partitioned using the new PG 10 partitioning mechanism, but it doesn't
work for inherited tables, although there seems to be not much difference
between these two cases. Do you think that it will someday also be
supported for the standard inheritance mechanism, or is there no sense
in it?



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-15 Thread Jeevan Chalke
On Thu, Nov 2, 2017 at 7:36 AM, Robert Haas  wrote:

> On Wed, Nov 1, 2017 at 6:20 PM, Jeevan Chalke
>  wrote:
> > Yep.
> > But as David reported earlier, if we remove the first part i.e. adding
> > cpu_operator_cost per tuple, Merge Append will be preferred over an
> Append
> > node unlike before. And thus, I thought of better having both, but no so
> > sure. Should we remove that part altogether, or add both in a single
> > statement with updated comments?
>
> I was only suggesting that you update the comments.
>

OK. Done in the attached patch set.

I have rebased all my patches on latest HEAD which is at
7518049980be1d90264addab003476ae105f70d4

Thanks


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v7.tar.gz
Description: GNU Zip compressed data


Re: [HACKERS] Partition-wise aggregation/grouping

2017-11-15 Thread Jeevan Chalke
On Sun, Nov 12, 2017 at 1:59 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> On 10/27/2017 02:01 PM, Jeevan Chalke wrote:
>
>> Hi,
>>
>> Attached new patch-set here. Changes include:
>>
>> 1. Added separate patch for costing Append node as discussed up-front in
>> the
>> patch-set.
>> 2. Since we now cost Append node, we don't need
>> partition_wise_agg_cost_factor
>> GUC. So removed that. The remaining patch hence merged into main
>> implementation
>> patch.
>> 3. Updated rows in test-cases so that we will get partition-wise plans.
>>
>> Thanks
>>
>
> I applied partition-wise-agg-v6.tar.gz patch to  the master and use
> shard.sh example from https://www.postgresql.org/mes
> sage-id/14577.1509723225%40localhost
> Plan for count(*) is the following:
>
> shard=# explain select count(*) from orders;
>   QUERY PLAN
> 
> ---
>  Finalize Aggregate  (cost=100415.29..100415.30 rows=1 width=8)
>->  Append  (cost=50207.63..100415.29 rows=2 width=8)
>  ->  Partial Aggregate  (cost=50207.63..50207.64 rows=1 width=8)
>->  Foreign Scan on orders_0 (cost=101.00..50195.13
> rows=5000 width=0)
>  ->  Partial Aggregate  (cost=50207.63..50207.64 rows=1 width=8)
>->  Foreign Scan on orders_1 (cost=101.00..50195.13
> rows=5000 width=0)
>
>
> We really calculate a partial aggregate for each partition, but to do so we
> still have to fetch all the data from the remote host.
> So for foreign partitions such plans are absolutely inefficient.
> Maybe it should be combined with some other patch?
> For example, with  agg_pushdown_v4.tgz patch
> https://www.postgresql.org/message-id/14577.1509723225%40localhost ?
> But it is not applied after partition-wise-agg-v6.tar.gz patch.
> Also postgres_fdw in 11dev is able to push down aggregates without
> agg_pushdown_v4.tgz patch.
>
> In 0009-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch
> there is the following check:
>
>  /* Partial aggregates are not supported. */
> +   if (extra->isPartial)
> +   return;
>
> If we just comment this line then produced plan will be the following:
>
> shard=# explain select sum(product_id) from orders;
>QUERY PLAN
> 
>  Finalize Aggregate  (cost=308.41..308.42 rows=1 width=8)
>->  Append  (cost=144.18..308.41 rows=2 width=8)
>  ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
>Relations: Aggregate on (public.orders_0 orders)
>  ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
>Relations: Aggregate on (public.orders_1 orders)
> (6 rows)
>
> And it is actually the desired plan!
> Obviously such an approach will not always work. The FDW really doesn't
> support partial aggregates now.
> But for the most frequently used aggregates (sum, min, max, count),
> aggtype == aggtranstype and there is no difference
> between partial and normal aggregate calculation.
> So instead of the (extra->isPartial) condition we can add a more complex check
> which will traverse the pathtarget expressions and
> check whether the aggregate can be evaluated in this way. Or... extend the
> FDW API to support partial aggregation.
>

As explained by Ashutosh Bapat in reply
https://www.postgresql.org/message-id/CAFjFpRdpeMTd8kYbM_x0769V-aEKst5Nkg3+coG=8ki7s8z...@mail.gmail.com
we cannot rely on just aggtype==aggtranstype.

However, I have tried pushing partial aggregation over remote server and
also
submitted a PoC patch here:
https://www.postgresql.org/message-id/CAM2+6=uakp9+tsjuh2fbhhwnjc7oyfl1_gvu7mt2fxtvt6g...@mail.gmail.com

I have later removed these patches from the Partition-wise Aggregation patch set
as it is altogether a different issue from this mail thread. We might need to
discuss it separately.


>
> But even the last plan is not ideal: it will calculate predicates at each
> remote node sequentially.
> There is parallel append patch:
> https://www.postgresql.org/message-id/CAJ3gD9ctEcrVUmpY6fq_J
> UB6WDKGXAGd70EY68jVFA4kxMbKeQ%40mail.gmail.com
> but ... FDW doesn't support parallel scan, so parallel append cannot be
> applied in this case.
> And we actually do not need parallel append with all its dynamic workers
> here.
> We just need to start the commands at all remote servers and only afterwards
> fetch the results (which can be done sequentially).
>
> I am investigating the problem of efficient execution of OLAP queries on
> sharded tables (tables with remote partitions).
> After reading all these threads and the corresponding patches, it seems to me
> that we already have most of the parts of the puzzle; what we need is to put
> them in the right places and maybe add the missing ones.
> I wonder whether somebody is busy with this, and whether I can somehow help here?
>
> Also I am not quite sure about the best approach to parallel execution
> of a distributed query at all nodes.
> Should we make postgres_fdw 
