Re: [HACKERS] WIP: Aggregation push-down
Antonin Houska wrote: > This is another version of the patch. > shard.tgz demonstrates the typical postgres_fdw use case. One query shows base > scans of base relation's partitions being pushed to shard nodes, the other > pushes down a join and performs aggregation of the join result on the remote > node. Of course, the query can only references one particular partition, until > the "partition-wise join" [1] patch gets committed and merged with this my > patch. Since [1] is already there, the new version of shard.tgz shows what I consider the typical use case. (I'm aware of the postgres_fdw regression test failures, I'll try to fix them all in the next version.) Besides that: * A concept of "path unique keys" has been introduced. It helps to find out if the final relation appears to generate a distinct set of grouping keys. If that happens, the final aggregation is replaced by mere call of aggfinalfn() function on each transient state value. * FDW can sort rows by aggregate. * enable_agg_pushdown GUC was added. The default value is false. * I fixed errors reported during the previous CF. * Added a few more regression tests. I'm not about to add any other features now. Implementation of the missing parts (see the TODO comments in the code) is the next step. But what I'd appreciate most is a feedback on the design. Thanks. > [1] https://commitfest.postgresql.org/14/994/ -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at agg_pushdown_v4.tgz Description: GNU Zip compressed data shard.tgz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Aggregation push-down
On Fri, Sep 8, 2017 at 7:04 PM, Antonin Houska wrote: > Ashutosh Bapat wrote: > >> On Thu, Aug 17, 2017 at 8:52 PM, Antonin Houska wrote: >> > Antonin Houska wrote: >> > >> >> Antonin Houska wrote: >> >> >> >> > This is a new version of the patch I presented in [1]. >> >> >> >> Rebased. >> >> >> >> cat .git/refs/heads/master >> >> b9a3ef55b253d885081c2d0e9dc45802cab71c7b >> > >> > This is another version of the patch. >> > >> > Besides other changes, it enables the aggregation push-down for >> > postgres_fdw, >> > although only for aggregates whose transient aggregation state is equal to >> > the >> > output type. For other aggregates (like avg()) the remote nodes will have >> > to >> > return the transient state value in an appropriate form (maybe bytea type), >> > which does not depend on PG version. >> >> Having transient aggregation state type same as output type doesn't >> mean that we can feed the output of remote aggregation as is to the >> finalization stage locally or finalization is not needed at all. I am >> not sure why is that test being used to decide whether or not to push >> an aggregate (I assume they are partial aggregates) to the foreign >> server. > > I agree. This seems to be the same problem as reported in [2]. > >> postgres_fdw doesn't have a feature to fetch partial aggregate >> results from the foreign server. Till we do that, I don't think this >> patch can push any partial aggregates down to the foreign server. > > Even if the query contains only aggregates like sum(int4), i.e. aggfinalfn > does not exist and the transient type can be transfered from the remote server > in textual form? (Of course there's a question is if such behaviour is > consistent from user's perspective.) Yes, those functions will work. > >> > >> > One thing I'm not sure about is whether the patch should remove >> > GetForeignUpperPaths function from FdwRoutine, which it essentially makes >> > obsolete. Or should it only be deprecated so far? I understand that >> > deprecation is important for "ordinary SQL users", but FdwRoutine is an >> > interface for extension developers, so the policy might be different. >> >> I doubt if that's correct. We can do that only when we know that all >> kinds aggregates/grouping can be pushed down the join tree, which >> looks impossible to me. Apart from that that FdwRoutine handles all >> kinds of upper relations like windowing, distinct, ordered which are >> not pushed down by this set of patches. > > Good point. > I think this is where Jeevan Chalke's partition-wise aggregation helps. It deals with the cases where your patch can not push the aggregates down to children of join. >> The patch maintains an extra rel target, two pathlists, partial pathlist and >> non-partial pathlist for grouping in RelOptInfo. These two extra >> pathlists require "grouped" argument to be passed to a number of functions. >> Instead probably it makes sense to create a RelOptInfo for >> aggregation/grouping >> and set pathlist and partial_pathlist in that RelOptInfo. That will also >> make a >> place for keeping grouped estimates. > > If grouped paths require a separate RelOptInfo, why the existing partial paths > do not? partial paths produce the same targetlist and the same relation that non-partial paths do. A RelOptInfo in planner represents a set of rows and all paths added to that RelOptInfo produce same whole result (parameterized paths produces the same result collectively). Grouping paths however are producing a different result and different targetlist, so may be it's better to have a separate RelOptInfo. > >> For placeholders we have two function add_placeholders_to_base_rels() and >> add_placeholders_to_joinrel(). We have add_grouping_info_to_base_rels(), but >> do >> not have corresponding function for joinrels. How do we push aggregates >> involving two or more relations to the corresponding joinrels? > > add_grouping_info_to_base_rels() calls prepare_rel_for_grouping() which > actually adds the "grouping info". For join, prepare_rel_for_grouping() is > called from build_join_rel(). Ok. > > While PlaceHolderVars need to be finalized before planner starts to create > joins, the GroupedPathInfo has to fit particular pair of joined relations. > > Perhaps create_grouping_info_... would be better, to indicate that the thing > we're adding does not exist yet. I'll think about it. > >> Some minor assorted comments ... > > o.k., will fix. > >> Some quick observations using two tables t1(a int, b int) and t2(a int, b >> int), >> populated as >> insert into t1 select i, i % 5 from generate_series(1, 100) i where i % 2 = >> 0; >> insert into t2 select i, i % 5 from generate_series(1, 100) i where i % 3 = >> 0; >> >> 1. The patch pushes aggregates down join in the following query >> explain verbose select avg(t2.a) from t1 inner join t2 on t1.b = t2.b group >> by >> t2.b; >> but does not push the aggregates down join in the following query >> explain verbose select avg
Re: [HACKERS] WIP: Aggregation push-down
Ashutosh Bapat wrote: > On Thu, Aug 17, 2017 at 8:52 PM, Antonin Houska wrote: > > Antonin Houska wrote: > > > >> Antonin Houska wrote: > >> > >> > This is a new version of the patch I presented in [1]. > >> > >> Rebased. > >> > >> cat .git/refs/heads/master > >> b9a3ef55b253d885081c2d0e9dc45802cab71c7b > > > > This is another version of the patch. > > > > Besides other changes, it enables the aggregation push-down for > > postgres_fdw, > > although only for aggregates whose transient aggregation state is equal to > > the > > output type. For other aggregates (like avg()) the remote nodes will have to > > return the transient state value in an appropriate form (maybe bytea type), > > which does not depend on PG version. > > Having transient aggregation state type same as output type doesn't > mean that we can feed the output of remote aggregation as is to the > finalization stage locally or finalization is not needed at all. I am > not sure why is that test being used to decide whether or not to push > an aggregate (I assume they are partial aggregates) to the foreign > server. I agree. This seems to be the same problem as reported in [2]. > postgres_fdw doesn't have a feature to fetch partial aggregate > results from the foreign server. Till we do that, I don't think this > patch can push any partial aggregates down to the foreign server. Even if the query contains only aggregates like sum(int4), i.e. aggfinalfn does not exist and the transient type can be transfered from the remote server in textual form? (Of course there's a question is if such behaviour is consistent from user's perspective.) > > > > One thing I'm not sure about is whether the patch should remove > > GetForeignUpperPaths function from FdwRoutine, which it essentially makes > > obsolete. Or should it only be deprecated so far? I understand that > > deprecation is important for "ordinary SQL users", but FdwRoutine is an > > interface for extension developers, so the policy might be different. > > I doubt if that's correct. We can do that only when we know that all > kinds aggregates/grouping can be pushed down the join tree, which > looks impossible to me. Apart from that that FdwRoutine handles all > kinds of upper relations like windowing, distinct, ordered which are > not pushed down by this set of patches. Good point. > Here's review of the patchset > The patches compile cleanly when applied together. > > Testcase "limit" fails in make check. If I remember correctly, this test generates different plan because the aggregate push-down gets involved. I tend to ignore this error until the patch has cost estimates refined. Then let's see if the test will pass or if the expected output should be adjusted. > This is a pretty large patchset and the patches are divided by stages in the > planner rather than functionality. IOW, no single patch provides a full > functionality. Reviewing and probably committing such patches is not easy and > may take quite long. Instead, if the patches are divided by functionality, > i.e. > each patch provides some functionality completely, it will be easy to review > and commit such patches with each commit giving something useful. I had > suggested breaking patches on that line at [1]. The patches also refactor > existing code. It's better to move such refactoring in a patch/es by itself. I definitely saw commits whose purpose is preparation for something else. But you're right that some splitting of the actual funcionality wouldn't harm. I'll at least separate partial aggregation of base relations and that of joins. And maybe some more preparation steps. > The patches need more comments, explaining various decisions o.k. > The patch maintains an extra rel target, two pathlists, partial pathlist and > non-partial pathlist for grouping in RelOptInfo. These two extra > pathlists require "grouped" argument to be passed to a number of functions. > Instead probably it makes sense to create a RelOptInfo for > aggregation/grouping > and set pathlist and partial_pathlist in that RelOptInfo. That will also make > a > place for keeping grouped estimates. If grouped paths require a separate RelOptInfo, why the existing partial paths do not? > For placeholders we have two function add_placeholders_to_base_rels() and > add_placeholders_to_joinrel(). We have add_grouping_info_to_base_rels(), but > do > not have corresponding function for joinrels. How do we push aggregates > involving two or more relations to the corresponding joinrels? add_grouping_info_to_base_rels() calls prepare_rel_for_grouping() which actually adds the "grouping info". For join, prepare_rel_for_grouping() is called from build_join_rel(). While PlaceHolderVars need to be finalized before planner starts to create joins, the GroupedPathInfo has to fit particular pair of joined relations. Perhaps create_grouping_info_... would be better, to indicate that the thing we're adding does not exist yet. I'll think about it. > Some
Re: [HACKERS] WIP: Aggregation push-down
Jeevan Chalke wrote: > 1. > + if (aggref->aggvariadic || > + aggref->aggdirectargs || aggref->aggorder || > + aggref->aggdistinct || aggref->aggfilter) > > I did not understand, why you are not pushing aggregate in above cases? > Can you please explain? Currently I'm trying to implement infrastructure to propagate result of partial aggregation from base relation or join to the root of the join tree and to use these as input for the final aggregation. Some special cases are disabled because I haven't thought about them enough so far. Some of these restrictions may be easy to relax, others may be hard. I'll get back to them at some point. > 2. "make check" in contrib/postgres_fdw crashes. > > SELECT COUNT(*) FROM ft1 t1; > ! server closed the connection unexpectedly > ! This probably means the server terminated abnormally > ! before or while processing the request. > ! connection to server was lost > > From your given setup, if I wrote a query like: > EXPLAIN VERBOSE SELECT count(*) FROM orders_0; > it crashes. > > Seems like missing few checks. Please consider this a temporary limitation. > 3. In case of pushing partial aggregation on the remote side, you use schema > named "partial", I did not get that change. If I have AVG() aggregate, > then I end up getting an error saying > "ERROR: schema "partial" does not exist". Attached to his email is an extension that I hacked quickly at some point, for experimental purposes only. It's pretty raw but may be useful if you just want to play with it. > 4. I am not sure about the code changes related to pushing partial > aggregate on the remote side. Basically, you are pushing a whole aggregate > on the remote side and result of that is treated as partial on the > basis of aggtype = transtype. But I am not sure that is correct unless > I miss something here. The type may be same but the result may not. You're right. I mostly used sum() and count() in my examples but these are special cases. It should also be checked whether aggfinalfn exists or not. I'll fix it, thanks. > 5. There are lot many TODO comments in the patch-set, are you working > on those? Yes. I wasn't too eager to complete all the TODOs soon because reviews of the partch may result in a major rework. And if large parts of the code will have to be wasted, some / many TODOs will be gone as well. > Thanks > > On Thu, Aug 17, 2017 at 8:52 PM, Antonin Houska wrote: > > Antonin Houska wrote: > > > Antonin Houska wrote: > > > > > This is a new version of the patch I presented in [1]. > > > > Rebased. > > > > cat .git/refs/heads/master > > b9a3ef55b253d885081c2d0e9dc45802cab71c7b > > This is another version of the patch. > > Besides other changes, it enables the aggregation push-down for postgres_fdw, > although only for aggregates whose transient aggregation state is equal to > the > output type. For other aggregates (like avg()) the remote nodes will have to > return the transient state value in an appropriate form (maybe bytea type), > which does not depend on PG version. > > shard.tgz demonstrates the typical postgres_fdw use case. One query shows > base > scans of base relation's partitions being pushed to shard nodes, the other > pushes down a join and performs aggregation of the join result on the remote > node. Of course, the query can only references one particular partition, > until > the "partition-wise join" [1] patch gets committed and merged with this my > patch. > > One thing I'm not sure about is whether the patch should remove > GetForeignUpperPaths function from FdwRoutine, which it essentially makes > obsolete. Or should it only be deprecated so far? I understand that > deprecation is important for "ordinary SQL users", but FdwRoutine is an > interface for extension developers, so the policy might be different. > > [1] https://commitfest.postgresql.org/14/994/ > > Any feedback is appreciated. > > -- > Antonin Houska > Cybertec Schönig & Schönig GmbH > Gröhrmühlgasse 26 > A-2700 Wiener Neustadt > Web: http://www.postgresql-support.de, http://www.cybertec.at > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > -- > Jeevan Chalke > Principal Software Engineer, Product Development > EnterpriseDB Corporation > The Enterprise PostgreSQL Company > > -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at partial.tgz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Aggregation push-down
On Thu, Aug 17, 2017 at 10:22 AM, Antonin Houska wrote: > Antonin Houska wrote: > output type. For other aggregates (like avg()) the remote nodes will have to > return the transient state value in an appropriate form (maybe bytea type), > which does not depend on PG version. Hm, that seems like an awful lot of work (new version agnostic serialization format) for very little benefit (version independent type serialization for remote aggregate pushdown). How about forcing the version to be the same for the feature to be used? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Aggregation push-down
On Thu, Aug 17, 2017 at 8:52 PM, Antonin Houska wrote: > Antonin Houska wrote: > >> Antonin Houska wrote: >> >> > This is a new version of the patch I presented in [1]. >> >> Rebased. >> >> cat .git/refs/heads/master >> b9a3ef55b253d885081c2d0e9dc45802cab71c7b > > This is another version of the patch. > > Besides other changes, it enables the aggregation push-down for postgres_fdw, > although only for aggregates whose transient aggregation state is equal to the > output type. For other aggregates (like avg()) the remote nodes will have to > return the transient state value in an appropriate form (maybe bytea type), > which does not depend on PG version. Having transient aggregation state type same as output type doesn't mean that we can feed the output of remote aggregation as is to the finalization stage locally or finalization is not needed at all. I am not sure why is that test being used to decide whether or not to push an aggregate (I assume they are partial aggregates) to the foreign server. postgres_fdw doesn't have a feature to fetch partial aggregate results from the foreign server. Till we do that, I don't think this patch can push any partial aggregates down to the foreign server. > > shard.tgz demonstrates the typical postgres_fdw use case. One query shows base > scans of base relation's partitions being pushed to shard nodes, the other > pushes down a join and performs aggregation of the join result on the remote > node. Of course, the query can only references one particular partition, until > the "partition-wise join" [1] patch gets committed and merged with this my > patch. Right. Until then joins will not have children. > > One thing I'm not sure about is whether the patch should remove > GetForeignUpperPaths function from FdwRoutine, which it essentially makes > obsolete. Or should it only be deprecated so far? I understand that > deprecation is important for "ordinary SQL users", but FdwRoutine is an > interface for extension developers, so the policy might be different. I doubt if that's correct. We can do that only when we know that all kinds aggregates/grouping can be pushed down the join tree, which looks impossible to me. Apart from that that FdwRoutine handles all kinds of upper relations like windowing, distinct, ordered which are not pushed down by this set of patches. Here's review of the patchset The patches compile cleanly when applied together. Testcase "limit" fails in make check. This is a pretty large patchset and the patches are divided by stages in the planner rather than functionality. IOW, no single patch provides a full functionality. Reviewing and probably committing such patches is not easy and may take quite long. Instead, if the patches are divided by functionality, i.e. each patch provides some functionality completely, it will be easy to review and commit such patches with each commit giving something useful. I had suggested breaking patches on that line at [1]. The patches also refactor existing code. It's better to move such refactoring in a patch/es by itself. The patches need more comments, explaining various decisions e.g. if (joinrel) { /* Create GatherPaths for any useful partial paths for rel */ -generate_gather_paths(root, joinrel); +generate_gather_paths(root, joinrel, false); /* Find and save the cheapest paths for this joinrel */ set_cheapest(joinrel); For base relations, the patch calls generate_gather_paths() with grouped as true and false, but for join relations it doesn't do that. There is no comment explaining why. OR in the following code +/* + * Do partial aggregation at base relation level if the relation is + * eligible for it. + */ +if (rel->gpi != NULL) +create_grouped_path(root, rel, path, false, true, AGG_HASHED); Why not to consider AGG_SORTED paths? The patch maintains an extra rel target, two pathlists, partial pathlist and non-partial pathlist for grouping in RelOptInfo. These two extra pathlists require "grouped" argument to be passed to a number of functions. Instead probably it makes sense to create a RelOptInfo for aggregation/grouping and set pathlist and partial_pathlist in that RelOptInfo. That will also make a place for keeping grouped estimates. For placeholders we have two function add_placeholders_to_base_rels() and add_placeholders_to_joinrel(). We have add_grouping_info_to_base_rels(), but do not have corresponding function for joinrels. How do we push aggregates involving two or more relations to the corresponding joinrels? Some minor assorted comments Avoid useless hunk. @@ -279,8 +301,8 @@ add_paths_to_joinrel(PlannerInfo *root, * joins, because there may be no other alternative. */ if (enable_hashjoin || jointype == JOIN_FULL) -hash_inner_and_outer(root, joinrel, outerrel, innerrel, - jointype, &extra); +ha
Re: [HACKERS] WIP: Aggregation push-down
Hi Antonin, To understand the feature you have proposed, I have tried understanding your patch. Here are few comments so far on it: 1. + if (aggref->aggvariadic || + aggref->aggdirectargs || aggref->aggorder || + aggref->aggdistinct || aggref->aggfilter) I did not understand, why you are not pushing aggregate in above cases? Can you please explain? 2. "make check" in contrib/postgres_fdw crashes. SELECT COUNT(*) FROM ft1 t1; ! server closed the connection unexpectedly ! This probably means the server terminated abnormally ! before or while processing the request. ! connection to server was lost >From your given setup, if I wrote a query like: EXPLAIN VERBOSE SELECT count(*) FROM orders_0; it crashes. Seems like missing few checks. 3. In case of pushing partial aggregation on the remote side, you use schema named "partial", I did not get that change. If I have AVG() aggregate, then I end up getting an error saying "ERROR: schema "partial" does not exist". 4. I am not sure about the code changes related to pushing partial aggregate on the remote side. Basically, you are pushing a whole aggregate on the remote side and result of that is treated as partial on the basis of aggtype = transtype. But I am not sure that is correct unless I miss something here. The type may be same but the result may not. 5. There are lot many TODO comments in the patch-set, are you working on those? Thanks On Thu, Aug 17, 2017 at 8:52 PM, Antonin Houska wrote: > Antonin Houska wrote: > > > Antonin Houska wrote: > > > > > This is a new version of the patch I presented in [1]. > > > > Rebased. > > > > cat .git/refs/heads/master > > b9a3ef55b253d885081c2d0e9dc45802cab71c7b > > This is another version of the patch. > > Besides other changes, it enables the aggregation push-down for > postgres_fdw, > although only for aggregates whose transient aggregation state is equal to > the > output type. For other aggregates (like avg()) the remote nodes will have > to > return the transient state value in an appropriate form (maybe bytea type), > which does not depend on PG version. > > shard.tgz demonstrates the typical postgres_fdw use case. One query shows > base > scans of base relation's partitions being pushed to shard nodes, the other > pushes down a join and performs aggregation of the join result on the > remote > node. Of course, the query can only references one particular partition, > until > the "partition-wise join" [1] patch gets committed and merged with this my > patch. > > One thing I'm not sure about is whether the patch should remove > GetForeignUpperPaths function from FdwRoutine, which it essentially makes > obsolete. Or should it only be deprecated so far? I understand that > deprecation is important for "ordinary SQL users", but FdwRoutine is an > interface for extension developers, so the policy might be different. > > [1] https://commitfest.postgresql.org/14/994/ > > Any feedback is appreciated. > > -- > Antonin Houska > Cybertec Schönig & Schönig GmbH > Gröhrmühlgasse 26 > A-2700 Wiener Neustadt > Web: http://www.postgresql-support.de, http://www.cybertec.at > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > -- Jeevan Chalke Principal Software Engineer, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: Aggregation push-down
Antonin Houska wrote: > Antonin Houska wrote: > > > This is a new version of the patch I presented in [1]. > > Rebased. > > cat .git/refs/heads/master > b9a3ef55b253d885081c2d0e9dc45802cab71c7b This is another version of the patch. Besides other changes, it enables the aggregation push-down for postgres_fdw, although only for aggregates whose transient aggregation state is equal to the output type. For other aggregates (like avg()) the remote nodes will have to return the transient state value in an appropriate form (maybe bytea type), which does not depend on PG version. shard.tgz demonstrates the typical postgres_fdw use case. One query shows base scans of base relation's partitions being pushed to shard nodes, the other pushes down a join and performs aggregation of the join result on the remote node. Of course, the query can only references one particular partition, until the "partition-wise join" [1] patch gets committed and merged with this my patch. One thing I'm not sure about is whether the patch should remove GetForeignUpperPaths function from FdwRoutine, which it essentially makes obsolete. Or should it only be deprecated so far? I understand that deprecation is important for "ordinary SQL users", but FdwRoutine is an interface for extension developers, so the policy might be different. [1] https://commitfest.postgresql.org/14/994/ Any feedback is appreciated. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at agg_pushdown_v3.tgz Description: GNU Zip compressed data shard.tgz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers