Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-06-06 15:31: Thanks for the explanation. I understand that the method of comparing two function name strings is incorrect. Instead, I added the parameter isaggpartialfunc indicating whether the aggregate function and its aggpartialfunc are the same or different. Hi. This seems to be more robust, but the interface became more strange. I'm not sure what to do with it. Some ideas I had to avoid introducing this parameter. Not sure I like any of them. 1) You can use QualifiedNameGetCreationNamespace() for aggpartialfnName and still compare namespace and function name for it and aggName, aggNamespace. Seems to be not ideal, but avoids introducing new parameters. 2) You can lookup for partial aggregate function after ProcedureCreate() in AggregateCreate(), if it wasn't found at earlier stages. If it is the aggregate itself - check it. If it's still not found, error out. Also seems to be a bit ugly - you leave uncommitted garbage for vacuum in catalogue. Another issue - the patch misses recording dependency between aggpartialfn and aggregate procedure. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-06-06 06:08: Hi Mr.Pyhalov. Thank you for your always thoughtful review. From: Alexander Pyhalov Sent: Monday, June 5, 2023 6:00 PM Have found one issue - src/backend/catalog/pg_aggregate.c 585 if(strcmp(strVal(linitial(aggpartialfnName)), aggName) == 0){ 586 if(((aggTransType != INTERNALOID) && (finalfn != InvalidOid)) 587 || ((aggTransType == INTERNALOID) && (finalfn != serialfn))) 588 elog(ERROR, "%s is not its own aggpartialfunc", aggName); 589 } else { Here string comparison of aggName and aggpartialfnName looks very suspicios - it seems you should compare oids, not names (in this case, likely oids of transition function and partial aggregation function). The fact that aggregate name matches partial aggregation function name is not a enough to make any decisions. I see no problem with this string comparison. Here is the reason. The intent of this code is, to determine whether the user specifies the new aggregate function whose aggpartialfunc is itself. For two aggregate functions, If the argument list and function name match, then the two aggregate functions must match. By definition of aggpartialfunc, every aggregate function and its aggpartialfn must have the same argument list. Thus, if aggpartialfnName and aggName are equal as strings, I think it is correct to determine that the user is specifying the new aggregate function whose aggpartialfunc is itself. However, since the document does not state these intentions I think your suspicions are valid. Therefore, I have added a specification to the document reflecting the above intentions. Hi. Let me explain. Look at this example, taken from test. CREATE AGGREGATE udf_avg_p_int4(int4) ( sfunc = int4_avg_accum, stype = _int8, combinefunc = int4_avg_combine, initcond = '{0,0}' ); CREATE AGGREGATE udf_sum(int4) ( sfunc = int4_avg_accum, stype = _int8, finalfunc = int8_avg, combinefunc = int4_avg_combine, initcond = '{0,0}', aggpartialfunc = udf_avg_p_int4 ); Now, let's create another aggregate. # create schema test ; create aggregate test.udf_avg_p_int4(int4) ( sfunc = int4_avg_accum, stype = _int8, finalfunc = int8_avg, combinefunc = int4_avg_combine, initcond = '{0,0}', aggpartialfunc = udf_avg_p_int4 ); ERROR: udf_avg_p_int4 is not its own aggpartialfunc What's the difference between test.udf_avg_p_int4(int4) aggregate and udf_sum(int4)? They are essentially the same, but second one can't be defined. Also note difference: # CREATE AGGREGATE udf_sum(int4) ( sfunc = int4_avg_accum, stype = _int8, finalfunc = int8_avg, combinefunc = pg_catalog.int4_avg_combine, initcond = '{0,0}', aggpartialfunc = udf_avg_p_int4 ); CREATE AGGREGATE # CREATE AGGREGATE udf_sum(int4) ( sfunc = int4_avg_accum, stype = _int8, finalfunc = int8_avg, combinefunc = pg_catalog.int4_avg_combine, initcond = '{0,0}', aggpartialfunc = public.udf_avg_p_int4 ); ERROR: aggpartialfnName is invalid It seems that assumption about aggpartialfnName - that it's a non-qualified name is incorrect. And if we use qualified names, we can't compare just names, likely we should compare oids. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
On Fri, Jun 2, 2023 at 03:54:06AM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > Hi Mr.Bruce, hackers. > > I updated the patch. > The following is a list of comments received on the previous version of the > patch > and my update to them in this version of the patch. This thread started in October 2021 so I would like to explain what this feature adds. Basically for partitions made up of postgres_fdw tables, there are four possible optimizations: 1. Pruning, 3 stages, see slide 30 here: https://momjian.us/main/writings/pgsql/partitioning.pdf#page=30 2. Parallelism across partitions, see slide 38 here: https://momjian.us/main/writings/pgsql/beyond.pdf#page=38 3. Pushdown of partition-wise joins and aggregates, see slide 43 here: https://momjian.us/main/writings/pgsql/partitioning.pdf#page=43 4. Pushdown of aggregates that aren't partition-wise As far as I know, over the years we have accomplished all of these items, except for #4. #3 involves aggregates where the GROUP BY or JOINed tables match the partition keys. Number 4 involves things like a SUM our COUNT that does not match the partition key, or has no groupings at all. #3 is easier than #4 since we just need to pass _rows_ back from the foreign servers. #4 is more complex because _partial_ count/sum, or even average values must be passed from the foreign servers to the requesting server. The good news is that we already have partial aggregate support as part of our parallel aggregate feature, see: https://momjian.us/main/writings/pgsql/beyond.pdf#page=38 What the patch does is to expand the existing partial aggregate code to allow partial aggregate results to pass from the foreign servers to the requesting server. This feature will be very useful for data warehouse queries that need to compute aggregate across partitions. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Partial aggregates pushdown
Bruce Momjian писал 2023-06-05 19:26: On Mon, Jun 5, 2023 at 12:00:27PM +0300, Alexander Pyhalov wrote: Note that after these changes "select sum()" will fail for certain cases, when remote server version is not the latest. In other cases we tried to preserve compatibility. Should we have a switch for a foreign server to turn this optimization off? Or do we just state that users should use different workarounds if remote server version doesn't match local one? We covered this in April in this and previous emails: https://www.postgresql.org/message-id/ZDGTza4rovCa%2BN3d%40momjian.us We don't check the version number for _any_ builtin functions so why would we need to check for aggregate pushdown? Yes, these will be new functions in PG 17, we have added functions regularly in major releases and have never heard reports of problems about that. Hi. I've seen this message. But introduction of new built-in function will break requests to old servers only if this new function is used in the request (this means that query changes). However, this patch changes the behavior of old queries, which worked prior to update. This seems to be different to me. Also I see that in connection.c (configure_remote_session()), we care about old PostgreSQL versions. And now we make querying them more tricky. Is it consistent? Do you think that enable_partitionwise_aggregate is a good enough protection in this cases? In documentation I see "F.38.7. Cross-Version Compatibility postgres_fdw can be used with remote servers dating back to PostgreSQL 8.3. Read-only capability is available back to 8.1. A limitation however is that postgres_fdw generally assumes that immutable built-in functions and operators are safe to send to the remote server for execution, if they appear in a WHERE clause for a foreign table. Thus, a built-in function that was added since the remote server's release might be sent to it for execution, resulting in “function does not exist” or a similar error. This type of failure can be worked around by rewriting the query, for example by embedding the foreign table reference in a sub-SELECT with OFFSET 0 as an optimization fence, and placing the problematic function or operator outside the sub-SELECT." Likely, this paragraph should be expanded to state that partition-wise aggregation for many functions can fail to work with old foreign servers. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
On Mon, Jun 5, 2023 at 12:00:27PM +0300, Alexander Pyhalov wrote: > Note that after these changes "select sum()" will fail for certain cases, > when remote server version is not the latest. In other cases we tried > to preserve compatibility. Should we have a switch for a foreign server to > turn this optimization off? Or do we just state that users > should use different workarounds if remote server version doesn't match > local one? We covered this in April in this and previous emails: https://www.postgresql.org/message-id/ZDGTza4rovCa%2BN3d%40momjian.us We don't check the version number for _any_ builtin functions so why would we need to check for aggregate pushdown? Yes, these will be new functions in PG 17, we have added functions regularly in major releases and have never heard reports of problems about that. This patch will filter pushdown based on the FDW extension whitelist: https://www.postgresql.org/message-id/20230408041614.wfasmdm46bupbif4%40awork3.anarazel.de -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2023-06-02 06:54: Hi Mr.Bruce, hackers. I updated the patch. The following is a list of comments received on the previous version of the patch and my update to them in this version of the patch. Hi. I've looked through the last version of the patch. Have found one issue - src/backend/catalog/pg_aggregate.c 585 if(strcmp(strVal(linitial(aggpartialfnName)), aggName) == 0){ 586 if(((aggTransType != INTERNALOID) && (finalfn != InvalidOid)) 587 || ((aggTransType == INTERNALOID) && (finalfn != serialfn))) 588 elog(ERROR, "%s is not its own aggpartialfunc", aggName); 589 } else { Here string comparison of aggName and aggpartialfnName looks very suspicios - it seems you should compare oids, not names (in this case, likely oids of transition function and partial aggregation function). The fact that aggregate name matches partial aggregation function name is not a enough to make any decisions. In documentation doc/src/sgml/postgres-fdw.sgml: 930postgres_fdw attempts to optimize remote queries to reduce 931the amount of data transferred from foreign servers. This is done by 932sending query WHERE clauses and ggregate expressions 933to the remote server for execution, and by not retrieving table columns that 934are not needed for the current query. 935To reduce the risk of misexecution of queries, 936WHERE clauses and ggregate expressions are not sent to 937the remote server unless they use only data types, operators, and functions 938that are built-in or belong to an extension that's listed in the foreign 939server's extensions option. 940Operators and functions in such clauses must 941be IMMUTABLE as well. there are misprints in lines 932 and 936 - missing "a" in "aggregate" expressions. Note that after these changes "select sum()" will fail for certain cases, when remote server version is not the latest. In other cases we tried to preserve compatibility. Should we have a switch for a foreign server to turn this optimization off? Or do we just state that users should use different workarounds if remote server version doesn't match local one? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
On Thu, Apr 13, 2023 at 10:56:26AM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > > Yes, good. Did we never push down aggregates before? I thought we > > pushed down partitionwise aggregates already, and such a check should > > already be done. If the check isn't there, it should be. > Yes. The last version of this patch(and original postgres_fdw) checks if > user-defined aggregate depends some extension which is contained in > 'extensions'. > But, in the last version of this patch, there is no such check for > aggpartialfn of user-defined aggregate. So, I will add such check to this > patch. > I think that this modification is easy to do . Good, so our existing code is correct and the patch just needs adjustment. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Partial aggregates pushdown
On Wed, Nov 30, 2022 at 3:12 AM Alexander Pyhalov wrote: > 1) In previous version of the patch aggregates, which had partialaggfn, > were ok to push down. And it was a definite sign that aggregate can be > pushed down. Now we allow pushing down an aggregate, which prorettype is > not internal and aggfinalfn is not defined. Is it safe for all > user-defined (or builtin) aggregates, even if they are generally > shippable? I think that this is exactly the correct test. Here's how to think about it: to perform an aggregate, you merge all the values into the transition state, and then you apply the final function once at the end. So the process looks like this: TRANSITION_STATE_0 + VALUE_1 = TRANSITION_STATE_1 TRANSITION_STATE_1 + VALUE_2 = TRANSITION_STATE_2 ... TRANSITION_STATE_N => RESULT Here, + represents applying the transition function and => represents applying the final function. In the case of parallel query, we want every worker to be able to incorporate values into its own transition states and then merge all the transition states at the end. That's a problem, because the transition function expects a transition state and a value, not two transition states. So we invented the idea of a "combine" function to solve this problem. A combine function takes two transition states and produces a new transition state. That allows each worker to create an initially empty transition state, merge a bunch of values into it, and then pass the result back to the leader, which can combine all the transition states using the combine function, and then apply the final function at the end. The same kind of idea works here. If we want to push down an entire aggregate, there's no problem, provided the remote side supports it: just push down the whole operation and get the result. But if we want to push down part of the aggregate, then what we want to get back is a transition value that we can then combine with other values (using the transition function) or other transition states (using the combine function) locally. That's tricky, because there's no SQL syntax to ask the remote side to give us the transition value rather than the final value. I think we would need to add that to solve this problem in its full generality. However, in the special case where there's no final function, the problem goes away, because then a transition value and a result are identical. If we ask for a result, we can treat it as a transition value, and there's no problem. Internal values are a problem. Generally, you don't see internal as the return type for an aggregate, because then the aggregate couldn't be called by the user. An internal value can't be returned. However, it's pretty common to see an aggregate that has an internal value as a transition type, and something else as the result type. In such cases, even if we had some syntax telling the remote side to send the transition value rather than the final value, it would not be sufficient, because the internal value still couldn't be transmitted. This problem also arises for parallel query, where we want to move transition values between processes within a single database cluster. We solved that problem using aggserialfn and aggdeserialfn. aggserialfn converts an internal transition value (which can't be moved between processes) into a bytea, and aggdeserialfn does the reverse. Maybe we would adopt the same solution here: our syntax that tells the remote side to give us the transition value rather than the final value could also tell the remote side to serialize it to bytea if it's an internal type. However, if we did this, we'd have to be sure that our deserialization functions were pretty well hardened against unexpected or even malicious input, because who knows whether that remote server is really going to send us a bytea in the format that we're expecting to get? Anyway, for the present patch, I think that testing whether there's a final function is the right thing, and testing whether the return type is internal doesn't hurt. If we want to extend this to other cases in the future, then I think we need syntax to ask the remote side for the unfinalized aggregate, like SELECT UNFINALIZED MAX(a) FROM t1, or whatever. I'm not sure what the best concrete SQL syntax is - probably not that. -- Robert Haas EDB: http://www.enterprisedb.com
RE: Partial aggregates pushdown
Hi Mr.Momjian. > > There is one more thing I would like your opinion on. > > As the major version of PostgreSQL increase, it is possible that the > > new builtin aggregate functions are added to the newer PostgreSQL. > > This patch assume that aggpartialfns definitions exist in BKI files. > > Due to this assumption, someone should add aggpartialfns definitions of > new builtin aggregate functions to BKI files. > > There are two possible ways to address this issue. Would the way1 be > sufficient? > > Or would way2 be preferable? > > way1) Adding documentaion for how to add these definitions to BKI files > > way2) Improving this patch to automatically add these definitions to BKI > files by some tool such as initdb. > > I think documentation is sufficient. You already showed that someone can do > this with CREATE AGGREGATE for non-builtin aggregates. Thank you for your opinion. I will modify this patch according to the way1. > > > So, let's remove the PARTIALAGG_MINVERSION option from the patch and > > > just make it automatic --- we push down builtin partial aggregates > > > if the remote server is the same or newer _major_ version than the > > > sending server. For extensions, if people have older extensions on > > > the same or newer foreign servers, they can adjust 'extensions' above. > > Okay, I understand. I will remove PARTIALAGG_MINVERSION option from > > the patch and I will add check whether aggpartialfn depends on some > > extension which is containd in extensions list of the postgres_fdw's foreign > server. > > Yes, good. Did we never push down aggregates before? I thought we > pushed down partitionwise aggregates already, and such a check should > already be done. If the check isn't there, it should be. Yes. The last version of this patch(and original postgres_fdw) checks if user-defined aggregate depends some extension which is contained in 'extensions'. But, in the last version of this patch, there is no such check for aggpartialfn of user-defined aggregate. So, I will add such check to this patch. I think that this modification is easy to do . > In summary, we don't do any version check for built-in function pushdown, so > we don't need it for aggregates either. We check extension functions against > the extension pushdown list, so we should be checking this for partial > aggregate pushdown, and for partition-wise aggregate pushdown. If we don't > do that last check already, it is a bug. I understood. Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation
Re: Partial aggregates pushdown
On Thu, Apr 13, 2023 at 02:12:44AM -0400, Bruce Momjian wrote: > > In the next version of this patch, > > we can pushdown partial aggregate for an user-defined aggregate function > > only > > when the function pass through this check. > > Understood. In summary, we don't do any version check for built-in function pushdown, so we don't need it for aggregates either. We check extension functions against the extension pushdown list, so we should be checking this for partial aggregate pushdown, and for partition-wise aggregate pushdown. If we don't do that last check already, it is a bug. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Partial aggregates pushdown
On Mon, Apr 10, 2023 at 01:18:37AM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > > Uh, we actually want the patch to implement partial aggregate pushdown for > > all > > builtin data types that can support it. Is that done? I think it is only > > extension > > aggregates, which we do not control, that need this documentation. > The last version of this patch can't pushdown partial aggregate for all > builtin aggregate functions that can support it. > I will improve this patch to pushdown partial aggregate for all builtin > aggregate functions > that can support it. > > There is one more thing I would like your opinion on. > As the major version of PostgreSQL increase, it is possible that > the new builtin aggregate functions are added to the newer PostgreSQL. > This patch assume that aggpartialfns definitions exist in BKI files. > Due to this assumption, someone should add aggpartialfns definitions of new > builtin aggregate functions to BKI files. > There are two possible ways to address this issue. Would the way1 be > sufficient? > Or would way2 be preferable? > way1) Adding documentaion for how to add these definitions to BKI files > way2) Improving this patch to automatically add these definitions to BKI > files by some tool such as initdb. I think documentation is sufficient. You already showed that someone can do this with CREATE AGGREGATE for non-builtin aggregates. > > So, let's remove the PARTIALAGG_MINVERSION option from the patch and just > > make it automatic --- we push down builtin partial aggregates if the remote > > server is the same or newer _major_ version than the sending server. For > > extensions, if people have older extensions on the same or newer foreign > > servers, they can adjust 'extensions' above. > Okay, I understand. I will remove PARTIALAGG_MINVERSION option from the patch > and I will add check whether aggpartialfn depends on some extension which > is containd in extensions list of the postgres_fdw's foreign server. Yes, good. Did we never push down aggregates before? I thought we pushed down partitionwise aggregates already, and such a check should already be done. If the check isn't there, it should be. > In the next version of this patch, > we can pushdown partial aggregate for an user-defined aggregate function only > when the function pass through this check. Understood. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
RE: Partial aggregates pushdown
Hi Mr.Momjian, Mr.Lane, Mr.Freund. Thank you for advices. > From: Bruce Momjian > > > > 2. Automation of creating definition of partialaggfuncs In > > > > development of v17, I manually create definition of > > > > partialaggfuncs for avg, min, max, sum, > > > count. > > > > I am concerned that this may be undesirable. > > > > So I am thinking that v17 should be modified to automate creating > > > > definition of partialaggfuncs for all built-in aggregate functions. > > > > > > Are there any other builtin functions that need this? I think we > > > can just provide documention for extensions on how to do this. > > For practical purposes, it is sufficient if partial aggregate for the > > above functions can be pushed down. > > I think you are right, it would be sufficient to document how to > > achieve partial aggregate pushdown for other built-in functions. > > Uh, we actually want the patch to implement partial aggregate pushdown for all > builtin data types that can support it. Is that done? I think it is only > extension > aggregates, which we do not control, that need this documentation. The last version of this patch can't pushdown partial aggregate for all builtin aggregate functions that can support it. I will improve this patch to pushdown partial aggregate for all builtin aggregate functions that can support it. There is one more thing I would like your opinion on. As the major version of PostgreSQL increase, it is possible that the new builtin aggregate functions are added to the newer PostgreSQL. This patch assume that aggpartialfns definitions exist in BKI files. Due to this assumption, someone should add aggpartialfns definitions of new builtin aggregate functions to BKI files. There are two possible ways to address this issue. Would the way1 be sufficient? Or would way2 be preferable? way1) Adding documentaion for how to add these definitions to BKI files way2) Improving this patch to automatically add these definitions to BKI files by some tool such as initdb. > From: Bruce Momjian > On Fri, Apr 7, 2023 at 09:16:14PM -0700, Andres Freund wrote: > > On 2023-04-07 22:53:53 -0400, Bruce Momjian wrote: > > > > postgres_fdw has no business pushing down calls to non-builtin > > > > functions unless the user has explicitly authorized that via the > > > > existing whitelisting mechanism. I think you're reinventing the > > > > wheel, and not very well. > > > > > > The patch has you assign an option at CREATE AGGREGATE time if it > > > can do push down, and postgres_fdw checks that. What whitelisting > > > mechanism are you talking about? async_capable? > > > > extensions (string) > > > > This option is a comma-separated list of names of PostgreSQL > extensions that are installed, in compatible versions, on both the local and > remote servers. Functions and operators that are immutable and belong to a > listed extension will be considered shippable to the remote server. This > option > can only be specified for foreign servers, not per-table. > > > > When using the extensions option, it is the user's responsibility that > > the > listed extensions exist and behave identically on both the local and remote > servers. Otherwise, remote queries may fail or behave unexpectedly. > > Okay, this is very helpful --- it is exactly the issue we are dealing with > --- how > can we know if partial aggregate functions exists on the remote server. (I > knew I was going to need API help on this.) > > So, let's remove the PARTIALAGG_MINVERSION option from the patch and just > make it automatic --- we push down builtin partial aggregates if the remote > server is the same or newer _major_ version than the sending server. For > extensions, if people have older extensions on the same or newer foreign > servers, they can adjust 'extensions' above. Okay, I understand. I will remove PARTIALAGG_MINVERSION option from the patch and I will add check whether aggpartialfn depends on some extension which is containd in extensions list of the postgres_fdw's foreign server. In the next version of this patch, we can pushdown partial aggregate for an user-defined aggregate function only when the function pass through this check. Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation
Re: Partial aggregates pushdown
On Sat, Apr 8, 2023 at 10:15:40AM -0400, Bruce Momjian wrote: > On Fri, Apr 7, 2023 at 09:16:14PM -0700, Andres Freund wrote: > > extensions (string) > > > > This option is a comma-separated list of names of PostgreSQL extensions > > that are installed, in compatible versions, on both the local and remote > > servers. Functions and operators that are immutable and belong to a listed > > extension will be considered shippable to the remote server. This option > > can only be specified for foreign servers, not per-table. > > > > When using the extensions option, it is the user's responsibility that > > the listed extensions exist and behave identically on both the local and > > remote servers. Otherwise, remote queries may fail or behave unexpectedly. > > Okay, this is very helpful --- it is exactly the issue we are dealing > with --- how can we know if partial aggregate functions exists on the > remote server. (I knew I was going to need API help on this.) > > So, let's remove the PARTIALAGG_MINVERSION option from the patch and > just make it automatic --- we push down builtin partial aggregates if > the remote server is the same or newer _major_ version than the sending > server. For extensions, if people have older extensions on the same or > newer foreign servers, they can adjust 'extensions' above. Looking further, I don't see any cases where we check if a builtin function added in a major release also exists on the foreign server, so maybe we don't need any checks but just need a mention in the release notes. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Partial aggregates pushdown
On Fri, Apr 7, 2023 at 09:16:14PM -0700, Andres Freund wrote: > On 2023-04-07 22:53:53 -0400, Bruce Momjian wrote: > > > postgres_fdw has no business pushing down calls to non-builtin functions > > > unless the user has explicitly authorized that via the existing > > > whitelisting mechanism. I think you're reinventing the wheel, > > > and not very well. > > > > The patch has you assign an option at CREATE AGGREGATE time if it can do > > push down, and postgres_fdw checks that. What whitelisting mechanism > > are you talking about? async_capable? > > extensions (string) > > This option is a comma-separated list of names of PostgreSQL extensions > that are installed, in compatible versions, on both the local and remote > servers. Functions and operators that are immutable and belong to a listed > extension will be considered shippable to the remote server. This option can > only be specified for foreign servers, not per-table. > > When using the extensions option, it is the user's responsibility that > the listed extensions exist and behave identically on both the local and > remote servers. Otherwise, remote queries may fail or behave unexpectedly. Okay, this is very helpful --- it is exactly the issue we are dealing with --- how can we know if partial aggregate functions exists on the remote server. (I knew I was going to need API help on this.) So, let's remove the PARTIALAGG_MINVERSION option from the patch and just make it automatic --- we push down builtin partial aggregates if the remote server is the same or newer _major_ version than the sending server. For extensions, if people have older extensions on the same or newer foreign servers, they can adjust 'extensions' above. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Partial aggregates pushdown
On 2023-04-07 22:53:53 -0400, Bruce Momjian wrote: > On Fri, Apr 7, 2023 at 10:44:09PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > On Fri, Apr 7, 2023 at 09:55:00PM -0400, Tom Lane wrote: > > >> Uh, what? Why would we not be able to tell from the remote server's > > >> version number whether it has this ability? > > > > > The issue is not a mismatch of postgres_fdw versions but the extension > > > versions and whether the partial aggregate functions exist on the remote > > > side, e.g., something like a PostGIS upgrade. > > > > postgres_fdw has no business pushing down calls to non-builtin functions > > unless the user has explicitly authorized that via the existing > > whitelisting mechanism. I think you're reinventing the wheel, > > and not very well. > > The patch has you assign an option at CREATE AGGREGATE time if it can do > push down, and postgres_fdw checks that. What whitelisting mechanism > are you talking about? async_capable? extensions (string) This option is a comma-separated list of names of PostgreSQL extensions that are installed, in compatible versions, on both the local and remote servers. Functions and operators that are immutable and belong to a listed extension will be considered shippable to the remote server. This option can only be specified for foreign servers, not per-table. When using the extensions option, it is the user's responsibility that the listed extensions exist and behave identically on both the local and remote servers. Otherwise, remote queries may fail or behave unexpectedly.
Re: Partial aggregates pushdown
On Fri, Apr 7, 2023 at 10:53:53PM -0400, Bruce Momjian wrote: > On Fri, Apr 7, 2023 at 10:44:09PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > On Fri, Apr 7, 2023 at 09:55:00PM -0400, Tom Lane wrote: > > >> Uh, what? Why would we not be able to tell from the remote server's > > >> version number whether it has this ability? > > > > > The issue is not a mismatch of postgres_fdw versions but the extension > > > versions and whether the partial aggregate functions exist on the remote > > > side, e.g., something like a PostGIS upgrade. > > > > postgres_fdw has no business pushing down calls to non-builtin functions > > unless the user has explicitly authorized that via the existing > > whitelisting mechanism. I think you're reinventing the wheel, > > and not very well. > > The patch has you assign an option at CREATE AGGREGATE time if it can do > push down, and postgres_fdw checks that. What whitelisting mechanism > are you talking about? async_capable? FYI, in the patch the CREATE AGGREGATE option is called PARTIALAGGFUNC. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Partial aggregates pushdown
On Fri, Apr 7, 2023 at 10:44:09PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Fri, Apr 7, 2023 at 09:55:00PM -0400, Tom Lane wrote: > >> Uh, what? Why would we not be able to tell from the remote server's > >> version number whether it has this ability? > > > The issue is not a mismatch of postgres_fdw versions but the extension > > versions and whether the partial aggregate functions exist on the remote > > side, e.g., something like a PostGIS upgrade. > > postgres_fdw has no business pushing down calls to non-builtin functions > unless the user has explicitly authorized that via the existing > whitelisting mechanism. I think you're reinventing the wheel, > and not very well. The patch has you assign an option at CREATE AGGREGATE time if it can do push down, and postgres_fdw checks that. What whitelisting mechanism are you talking about? async_capable? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Partial aggregates pushdown
Bruce Momjian writes: > On Fri, Apr 7, 2023 at 09:55:00PM -0400, Tom Lane wrote: >> Uh, what? Why would we not be able to tell from the remote server's >> version number whether it has this ability? > The issue is not a mismatch of postgres_fdw versions but the extension > versions and whether the partial aggregate functions exist on the remote > side, e.g., something like a PostGIS upgrade. postgres_fdw has no business pushing down calls to non-builtin functions unless the user has explicitly authorized that via the existing whitelisting mechanism. I think you're reinventing the wheel, and not very well. regards, tom lane
Re: Partial aggregates pushdown
On Fri, Apr 7, 2023 at 09:55:00PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > What I don't want is an error-prone setup where administrators have to > > remember what the per-server settings are. Based on your suggestion, > > let's allow CREATE SERVER to have an option 'enable_async_aggregate' (is > > that the right name?), which defaults to 'true' for _all_ servers, even > > those that don't support async aggregates. > > Uh, what? Why would we not be able to tell from the remote server's > version number whether it has this ability? That was covered here: https://www.postgresql.org/message-id/ZC95C0%2BPVhVP3iax%40momjian.us I think we have three possible cases for aggregate pushdown to FDWs: 1) Postgres built-in aggregate functions 2) Postgres user-defined & extension aggregate functions 3) aggregate functions calls to non-PG FDWs Your patch handles #1 by checking that the FDW Postgres version is the --> same as the calling Postgres version. However, it doesn't check for --> extension versions, and frankly, I don't see how we could implement that --> cleanly without significant complexity. The issue is not a mismatch of postgres_fdw versions but the extension versions and whether the partial aggregate functions exist on the remote side, e.g., something like a PostGIS upgrade. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Partial aggregates pushdown
Bruce Momjian writes: > What I don't want is an error-prone setup where administrators have to > remember what the per-server settings are. Based on your suggestion, > let's allow CREATE SERVER to have an option 'enable_async_aggregate' (is > that the right name?), which defaults to 'true' for _all_ servers, even > those that don't support async aggregates. Uh, what? Why would we not be able to tell from the remote server's version number whether it has this ability? regards, tom lane
Re: Partial aggregates pushdown
On Fri, Apr 7, 2023 at 09:25:52AM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > Hi Mr.Momjian > > > First, my apologies for not addressing this sooner. I was so focused on my > > own tasks that I didn't realize this very important patch was not getting > > attention. I will try my best to get it into PG 17. > Thank you very much for your comments. > I will improve this patch for PG17. > I believe that this patch will help us use PostgreSQL's built-in sharding for > OLAP. Agreed! Again, my apologies for not helping with this _much_ sooner. You have done amazing work here. > > What amazes me is that you didn't need to create _any_ actual aggregate > > functions. Rather, you just needed to hook existing functions into the > > aggregate tables for partial FDW execution. > Yes. This patch enables partial aggregate pushdown using > only existing functions which belong to existing aggregate function > and are needed by parallel query(such as state transition function and > serialization function). > This patch does not need new types of function belonging to aggregate > functions > and does not need new functions belonging to existing aggregate functions. Very nice. > > I suggest we remove the version check requirement --- instead just document > > that the FDW Postgres version should be the same or newer than the calling > > Postgres server --- that way, we can assume that whatever is in the system > > catalogs of the caller is in the receiving side. > Thanks for the comment. I will modify this patch according to your comment. > > > We should add a GUC to turn off > > this optimization for cases where the FDW Postgres version is older than the > > caller. This handles case 1-2. > Thanks for the advice here too. > I thought it would be more appropriate to add a foregin server option of > postgres_fdw rather than adding GUC. > Would you mind if I ask you what you think about it? I like the GUC idea because it gives administrators a single place to check if the feature is enabled. However, I can imagine cases where you might have multiple remote FDW servers and some might be older than the sending server. What I don't want is an error-prone setup where administrators have to remember what the per-server settings are. Based on your suggestion, let's allow CREATE SERVER to have an option 'enable_async_aggregate' (is that the right name?), which defaults to 'true' for _all_ servers, even those that don't support async aggregates. With that, all FDW servers are enabled by default, and if the FDW extension supports async aggregates, they will automatically be pushed down and will report an error only if the remote FDW is too old to support it. > > > 2. Automation of creating definition of partialaggfuncs In development > > > of v17, I manually create definition of partialaggfuncs for avg, min, > > > max, sum, > > count. > > > I am concerned that this may be undesirable. > > > So I am thinking that v17 should be modified to automate creating > > > definition of partialaggfuncs for all built-in aggregate functions. > > > > Are there any other builtin functions that need this? I think we can just > > provide documention for extensions on how to do this. > For practical purposes, it is sufficient > if partial aggregate for the above functions can be pushed down. > I think you are right, it would be sufficient to document how to achieve > partial aggregate pushdown for other built-in functions. Uh, we actually want the patch to implement partial aggregate pushdown for all builtin data types that can support it. Is that done? I think it is only extension aggregates, which we do not control, that need this documentation. > > > 3. Documentation > > > I need add explanation of partialaggfunc to documents on postgres_fdw and > > other places. > > > > I can help with that once we decide on the above. > Thank you. In the next verion of this patch, I will add documents on > postgres_fdw > and other places. Good. > > I think 'partialaggfn' should be named 'aggpartialfn' to match other > > columns in > > pg_aggregate. > Thanks for the comment. I will modify this patch according to your comment. > > > For case 3, I don't even know how much pushdown those do of _any_ > > aggregates to non-PG servers, let along parallel FDW ones. Does anyone > > know the details? > To allow partial aggregate pushdown for non-PG FDWs, > I think we need to add pushdown logic to their FDWs for each function. > For example, we need to add logic avg() -> sum()/count() to their FDWs for > avg. > To allow parallel partial aggregate by non-PG FDWs, > I think we need to add FDW Routines for Asynchronous Execution to their > FDWs[1]. Okay, I think we can just implement this for 1-2 and let extensions worry about 3. > > I am confused by these changes to pg_aggegate: > > > > +{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum', > > + aggfinalfn => 'int8_avg_serialize', aggcombinefn => > > +'int8_avg
RE: Partial aggregates pushdown
Hi Mr.Momjian > First, my apologies for not addressing this sooner. I was so focused on my > own tasks that I didn't realize this very important patch was not getting > attention. I will try my best to get it into PG 17. Thank you very much for your comments. I will improve this patch for PG17. I believe that this patch will help us use PostgreSQL's built-in sharding for OLAP. > What amazes me is that you didn't need to create _any_ actual aggregate > functions. Rather, you just needed to hook existing functions into the > aggregate tables for partial FDW execution. Yes. This patch enables partial aggregate pushdown using only existing functions which belong to existing aggregate function and are needed by parallel query(such as state transition function and serialization function). This patch does not need new types of function belonging to aggregate functions and does not need new functions belonging to existing aggregate functions. > I suggest we remove the version check requirement --- instead just document > that the FDW Postgres version should be the same or newer than the calling > Postgres server --- that way, we can assume that whatever is in the system > catalogs of the caller is in the receiving side. Thanks for the comment. I will modify this patch according to your comment. > We should add a GUC to turn off > this optimization for cases where the FDW Postgres version is older than the > caller. This handles case 1-2. Thanks for the advice here too. I thought it would be more appropriate to add a foregin server option of postgres_fdw rather than adding GUC. Would you mind if I ask you what you think about it? > > 2. Automation of creating definition of partialaggfuncs In development > > of v17, I manually create definition of partialaggfuncs for avg, min, max, > > sum, > count. > > I am concerned that this may be undesirable. > > So I am thinking that v17 should be modified to automate creating > > definition of partialaggfuncs for all built-in aggregate functions. > > Are there any other builtin functions that need this? I think we can just > provide documention for extensions on how to do this. For practical purposes, it is sufficient if partial aggregate for the above functions can be pushed down. I think you are right, it would be sufficient to document how to achieve partial aggregate pushdown for other built-in functions. > > 3. Documentation > > I need add explanation of partialaggfunc to documents on postgres_fdw and > other places. > > I can help with that once we decide on the above. Thank you. In the next verion of this patch, I will add documents on postgres_fdw and other places. > I think 'partialaggfn' should be named 'aggpartialfn' to match other columns > in > pg_aggregate. Thanks for the comment. I will modify this patch according to your comment. > For case 3, I don't even know how much pushdown those do of _any_ > aggregates to non-PG servers, let along parallel FDW ones. Does anyone > know the details? To allow partial aggregate pushdown for non-PG FDWs, I think we need to add pushdown logic to their FDWs for each function. For example, we need to add logic avg() -> sum()/count() to their FDWs for avg. To allow parallel partial aggregate by non-PG FDWs, I think we need to add FDW Routines for Asynchronous Execution to their FDWs[1]. > I am confused by these changes to pg_aggegate: > > +{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum', > + aggfinalfn => 'int8_avg_serialize', aggcombinefn => > +'int8_avg_combine', > + aggserialfn => 'int8_avg_serialize', aggdeserialfn => > +'int8_avg_deserialize', > + aggtranstype => 'internal', aggtransspace => '48' }, > > ... > > +{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum', > + aggfinalfn => 'numeric_avg_serialize', aggcombinefn => > +'numeric_avg_combine', > + aggserialfn => 'numeric_avg_serialize', > + aggdeserialfn => 'numeric_avg_deserialize', > + aggtranstype => 'internal', aggtransspace => '128' }, > > Why are these marked as 'sum' but use 'avg' functions? This reason is that sum(int8)/sum(numeric) shares some functions with avg(int8)/avg(numeric), and sum_p_int8 is aggpartialfn of sum(int8) and sum_p_numeric is aggpartialfn of sum(numeric). --Part of avg(int8) in BKI file in PostgreSQL15.0[2]. { aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum', aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine', aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize', aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv', aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal', aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' }, -- --Part of sum(int8) in BKI file in PostgreSQL15.0[2]. { aggfnoid => 'sum(int8)', aggtransfn => 'int8_avg_accum', aggfinalfn => 'numeric_poly_sum', aggcombinefn => 'int8_avg_combine', aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'i
Re: Partial aggregates pushdown
On Fri, Mar 31, 2023 at 05:49:21AM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > Hi Mr.Momjian > > > First, am I correct? > Yes, you are correct. This patch uses new special aggregate functions for > partial aggregate > (then we call this partialaggfunc). First, my apologies for not addressing this sooner. I was so focused on my own tasks that I didn't realize this very important patch was not getting attention. I will try my best to get it into PG 17. What amazes me is that you didn't need to create _any_ actual aggregate functions. Rather, you just needed to hook existing functions into the aggregate tables for partial FDW execution. > > Second, how far away is this from being committable > > and/or what work needs to be done to get it committable, either for PG 16 > > or 17? > I believe there are three: 1. and 2. are not clear if they are necessary or > not; 3. are clearly necessary. > I would like to hear the opinions of the development community on whether or > not 1. and 2. need to be addressed. > > 1. Making partialaggfunc user-defined function > In v17, I make partialaggfuncs as built-in functions. > Because of this approach, v17 changes specification of BKI file and > pg_aggregate. > For now, partialaggfuncs are needed by only postgres_fdw which is just an > extension of PostgreSQL. > In the future, when revising the specifications for BKI files and > pg_aggregate when modifying existing PostgreSQL functions, > It is necessary to align them with this patch's changes. > I am concerned that this may be undesirable. > So I am thinking that v17 should be modified to making partialaggfunc as user > defined function. I think we have three possible cases for aggregates pushdown to FDWs: 1) Postgres built-in aggregate functions 2) Postgres user-defined & extension aggregate functions 3) aggregate functions calls to non-PG FDWs Your patch handles #1 by checking that the FDW Postgres version is the same as the calling Postgres version. However, it doesn't check for extension versions, and frankly, I don't see how we could implement that cleanly without significant complexity. I suggest we remove the version check requirement --- instead just document that the FDW Postgres version should be the same or newer than the calling Postgres server --- that way, we can assume that whatever is in the system catalogs of the caller is in the receiving side. We should add a GUC to turn off this optimization for cases where the FDW Postgres version is older than the caller. This handles case 1-2. For case 3, I don't even know how much pushdown those do of _any_ aggregates to non-PG servers, let along parallel FDW ones. Does anyone know the details? > 2. Automation of creating definition of partialaggfuncs > In development of v17, I manually create definition of partialaggfuncs for > avg, min, max, sum, count. > I am concerned that this may be undesirable. > So I am thinking that v17 should be modified to automate creating definition > of partialaggfuncs > for all built-in aggregate functions. Are there any other builtin functions that need this? I think we can just provide documention for extensions on how to do this. > 3. Documentation > I need add explanation of partialaggfunc to documents on postgres_fdw and > other places. I can help with that once we decide on the above. I think 'partialaggfn' should be named 'aggpartialfn' to match other columns in pg_aggregate. I am confused by these changes to pg_aggegate: +{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum', + aggfinalfn => 'int8_avg_serialize', aggcombinefn => 'int8_avg_combine', + aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize', + aggtranstype => 'internal', aggtransspace => '48' }, ... +{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum', + aggfinalfn => 'numeric_avg_serialize', aggcombinefn => 'numeric_avg_combine', + aggserialfn => 'numeric_avg_serialize', + aggdeserialfn => 'numeric_avg_deserialize', + aggtranstype => 'internal', aggtransspace => '128' }, Why are these marked as 'sum' but use 'avg' functions? It would be good to explain exactly how this is diffent from background worker parallelism. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be. diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 09d6dd60dd..fe219b8d2f 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -202,7 +202,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); - +static bool partial_agg_ok(Aggref* agg, PgFdwRelationInfo* fpinfo); /* * Examine each qual clause in input_conds, and classify
RE: Partial aggregates pushdown
Hi Mr.Momjian > First, am I correct? Yes, you are correct. This patch uses new special aggregate functions for partial aggregate (then we call this partialaggfunc). > Second, how far away is this from being committable > and/or what work needs to be done to get it committable, either for PG 16 or > 17? I believe there are three: 1. and 2. are not clear if they are necessary or not; 3. are clearly necessary. I would like to hear the opinions of the development community on whether or not 1. and 2. need to be addressed. 1. Making partialaggfunc user-defined function In v17, I make partialaggfuncs as built-in functions. Because of this approach, v17 changes specification of BKI file and pg_aggregate. For now, partialaggfuncs are needed by only postgres_fdw which is just an extension of PostgreSQL. In the future, when revising the specifications for BKI files and pg_aggregate when modifying existing PostgreSQL functions, It is necessary to align them with this patch's changes. I am concerned that this may be undesirable. So I am thinking that v17 should be modified to making partialaggfunc as user defined function. 2. Automation of creating definition of partialaggfuncs In development of v17, I manually create definition of partialaggfuncs for avg, min, max, sum, count. I am concerned that this may be undesirable. So I am thinking that v17 should be modified to automate creating definition of partialaggfuncs for all built-in aggregate functions. 3. Documentation I need add explanation of partialaggfunc to documents on postgres_fdw and other places. Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation
Re: Partial aggregates pushdown
On Thu, Dec 15, 2022 at 10:23:05PM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > Hi Mr.Freund. > > > cfbot complains about some compiler warnings when building with clang: > > https://cirrus-ci.com/task/6606268580757504 > > > > deparse.c:3459:22: error: equality comparison with extraneous parentheses > > [-Werror,-Wparentheses-equality] > > if ((node->aggsplit == AGGSPLIT_SIMPLE)) { > > ~~~^~ > > deparse.c:3459:22: note: remove extraneous parentheses around the > > comparison to silence this warning > > if ((node->aggsplit == AGGSPLIT_SIMPLE)) { > > ~ ^ ~ > > deparse.c:3459:22: note: use '=' to turn this equality comparison into an > > assignment > > if ((node->aggsplit == AGGSPLIT_SIMPLE)) { > > ^~ > > = > I fixed this error. Considering we only have a week left before feature freeze, I wanted to review the patch from this commitfest item: https://commitfest.postgresql.org/42/4019/ The most recent patch attached. This feature has been in development since 2021, and it is something that will allow new workloads for Postgres, specifically data warehouse sharding workloads. We currently allow parallel aggregates when the table is on the same machine, and we allow partitonwise aggregates on FDWs only with GROUP BY keys matching partition keys. The first is possible since we can share data structures between background workers, and the second is possible because if the GROUP BY includes the partition key, we are really just appending aggregate rows, not combining aggregate computations. What we can't do without this patch is to push aggregates that require partial aggregate computations (no partition key GROUP BY) to FDW partitions because we don't have a clean way to pass such information from the remote FDW server to the finalize backend. I think that is what this patch does. First, am I correct? Second, how far away is this from being committable and/or what work needs to be done to get it committable, either for PG 16 or 17? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be. >From 71993e37093b3cec325de5989a03afb3073775aa Mon Sep 17 00:00:00 2001 From: Yuki Fujii Date: Fri, 16 Dec 2022 09:33:30 +0900 Subject: [PATCH] Partial aggregates push down v17 --- contrib/postgres_fdw/deparse.c| 97 +- .../postgres_fdw/expected/postgres_fdw.out| 296 +- contrib/postgres_fdw/option.c | 24 ++ contrib/postgres_fdw/postgres_fdw.c | 22 +- contrib/postgres_fdw/postgres_fdw.h | 3 + contrib/postgres_fdw/sql/postgres_fdw.sql | 85 - src/backend/catalog/pg_aggregate.c| 32 ++ src/backend/commands/aggregatecmds.c | 8 + src/bin/pg_dump/pg_dump.c | 25 +- src/include/catalog/pg_aggregate.dat | 62 +++- src/include/catalog/pg_aggregate.h| 11 + src/include/catalog/pg_proc.dat | 35 +++ .../regress/expected/create_aggregate.out | 24 ++ src/test/regress/expected/oidjoins.out| 1 + src/test/regress/sql/create_aggregate.sql | 24 ++ 15 files changed, 715 insertions(+), 34 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 9524765650..8e5a45ceaa 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -202,7 +202,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); - +static bool partial_agg_ok(Aggref* agg, PgFdwRelationInfo* fpinfo); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -907,8 +907,9 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg, fpinfo)) return false; /* As usual, it must be shippable. */ @@ -3448,14 +3449,37 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) StringInfo buf = context->buf; bool use_variadic; - /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + + Assert((node->aggsplit == AGGSPLIT_SIMPLE) || + (node->aggsplit == AGGSPLIT_INITIAL_SERIAL)); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; - /* Find aggregate name from aggfnoid which is a pg_proc
RE: Partial aggregates pushdown
Hi Mr.Freund. > cfbot complains about some compiler warnings when building with clang: > https://cirrus-ci.com/task/6606268580757504 > > deparse.c:3459:22: error: equality comparison with extraneous parentheses > [-Werror,-Wparentheses-equality] > if ((node->aggsplit == AGGSPLIT_SIMPLE)) { > ~~~^~ > deparse.c:3459:22: note: remove extraneous parentheses around the > comparison to silence this warning > if ((node->aggsplit == AGGSPLIT_SIMPLE)) { > ~ ^ ~ > deparse.c:3459:22: note: use '=' to turn this equality comparison into an > assignment > if ((node->aggsplit == AGGSPLIT_SIMPLE)) { > ^~ > = I fixed this error. Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation 0001-Partial-aggregates-push-down-v17.patch Description: 0001-Partial-aggregates-push-down-v17.patch
Re: Partial aggregates pushdown
Hi, On 2022-12-05 02:03:49 +, fujii.y...@df.mitsubishielectric.co.jp wrote: > > Attaching minor fixes. I haven't proof-read all comments (but perhaps, they > > need attention from some native speaker). > Thank you. I fixed according to your patch. > And I fixed have proof-read all comments and messages. cfbot complains about some compiler warnings when building with clang: https://cirrus-ci.com/task/6606268580757504 deparse.c:3459:22: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality] if ((node->aggsplit == AGGSPLIT_SIMPLE)) { ~~~^~ deparse.c:3459:22: note: remove extraneous parentheses around the comparison to silence this warning if ((node->aggsplit == AGGSPLIT_SIMPLE)) { ~ ^ ~ deparse.c:3459:22: note: use '=' to turn this equality comparison into an assignment if ((node->aggsplit == AGGSPLIT_SIMPLE)) { ^~ = Greetings, Andres Freund
RE: Partial aggregates pushdown
Hi Mr.Pyhalov. > Attaching minor fixes. I haven't proof-read all comments (but perhaps, they > need attention from some native speaker). Thank you. I fixed according to your patch. And I fixed have proof-read all comments and messages. > Tested it with queries from > https://github.com/swarm64/s64da-benchmark-toolkit, works as expected. Thank you for additional tests. Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation 0001-Partial-aggregates-push-down-v16.patch Description: 0001-Partial-aggregates-push-down-v16.patch
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2022-12-01 05:23: Hi Mr.Pyhalov. Hi. Attaching minor fixes. I haven't proof-read all comments (but perhaps, they need attention from some native speaker). Tested it with queries from https://github.com/swarm64/s64da-benchmark-toolkit, works as expected. -- Best regards, Alexander Pyhalov, Postgres Professionaldiff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 35f2d102374..bd8a4acc112 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -3472,9 +3472,9 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) if ((aggform->aggtranstype != INTERNALOID) && (aggform->aggfinalfn == InvalidOid)) { appendFunctionName(node->aggfnoid, context); } else if(aggform->partialaggfn) { - appendFunctionName((Oid)(aggform->partialaggfn), context); + appendFunctionName(aggform->partialaggfn, context); } else { - elog(ERROR, "there in no partialaggfn %u", node->aggfnoid); + elog(ERROR, "there is no partialaggfn %u", node->aggfnoid); } ReleaseSysCache(aggtup); } @@ -3986,7 +3986,8 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, } /* - * Check that partial aggregate function of aggform exsits in remote + * Check that partial aggregate function, described by aggform, + * exists on remote server, described by fpinfo. */ static bool partial_agg_compatible(Form_pg_aggregate aggform, PgFdwRelationInfo *fpinfo)
RE: Partial aggregates pushdown
Hi Mr.Pyhalov. > One more issue I started to think about - now we don't check > partialagg_minversion for "simple" aggregates at all. Is it correct? It seems > that , > for example, we could try to pushdown bit_or(int8) to old servers, but it > didn't > exist, for example, in 8.4. I think it's a broader issue (it would be also > the case > already if we push down > aggregates) and shouldn't be fixed here. But there is an issue - > is_shippable() is too optimistic. I think it is correct for now. F.38.7 of [1] says "A limitation however is that postgres_fdw generally assumes that immutable built-in functions and operators are safe to send to the remote server for execution, if they appear in a WHERE clause for a foreign table." and says that we can avoid this limitation by rewriting query. It looks that postgres_fdw follows this policy in case of UPPERREL_GROUP_AGG aggregate pushdown. If a aggreagate has not internal aggtranstype and has not aggfinalfn , partialaggfn of it is equal to itself. So I think that it is adequate to follow this policy in case of partial aggregate pushdown for such aggregates. > >> 2) Do we really have to look at pg_proc in partial_agg_ok() and > >> deparseAggref()? Perhaps, looking at aggtranstype is enough? > > You are right. I fixed according to your comment. > > > > partial_agg_ok() still looks at pg_proc. Sorry for taking up your time. I fixed it. [1] https://www.postgresql.org/docs/current/postgres-fdw.html Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation 0001-Partial-aggregates-push-down-v15.patch Description: 0001-Partial-aggregates-push-down-v15.patch
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2022-11-30 13:01: 2) Do we really have to look at pg_proc in partial_agg_ok() and deparseAggref()? Perhaps, looking at aggtranstype is enough? You are right. I fixed according to your comment. partial_agg_ok() still looks at pg_proc. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2022-11-30 13:01: Hi Mr.Pyhalov. 1) In previous version of the patch aggregates, which had partialaggfn, were ok to push down. And it was a definite sign that aggregate can be pushed down. Now we allow pushing down an aggregate, which prorettype is not internal and aggfinalfn is not defined. Is it safe for all user-defined (or builtin) aggregates, even if they are generally shippable? Aggcombinefn is executed locally and we check that aggregate function itself is shippable. Is it enough? Perhaps, we could use partialagg_minversion (like aggregates with partialagg_minversion == -1 should not be pushed down) or introduce separate explicit flag? In what case partial aggregate pushdown is unsafe for aggregate which has not internal aggtranstype and has no aggfinalfn? By reading [1], I believe that if aggcombinefn of such aggregate recieves return values of original aggregate functions in each remote then it must produce same value that would have resulted from scanning all the input in a single operation. One more issue I started to think about - now we don't check partialagg_minversion for "simple" aggregates at all. Is it correct? It seems that , for example, we could try to pushdown bit_or(int8) to old servers, but it didn't exist, for example, in 8.4. I think it's a broader issue (it would be also the case already if we push down aggregates) and shouldn't be fixed here. But there is an issue - is_shippable() is too optimistic. -- Best regards, Alexander Pyhalov, Postgres Professional
RE: Partial aggregates pushdown
Hi Mr.Pyhalov. > 1) In previous version of the patch aggregates, which had partialaggfn, were > ok > to push down. And it was a definite sign that aggregate can be pushed down. > Now we allow pushing down an aggregate, which prorettype is not internal and > aggfinalfn is not defined. Is it safe for all user-defined (or builtin) > aggregates, > even if they are generally shippable? Aggcombinefn is executed locally and we > check that aggregate function itself is shippable. Is it enough? Perhaps, we > could use partialagg_minversion (like aggregates with partialagg_minversion > == -1 should not be pushed down) or introduce separate explicit flag? In what case partial aggregate pushdown is unsafe for aggregate which has not internal aggtranstype and has no aggfinalfn? By reading [1], I believe that if aggcombinefn of such aggregate recieves return values of original aggregate functions in each remote then it must produce same value that would have resulted from scanning all the input in a single operation. > 2) Do we really have to look at pg_proc in partial_agg_ok() and > deparseAggref()? Perhaps, looking at aggtranstype is enough? You are right. I fixed according to your comment. > 3) I'm not sure if CREATE AGGREGATE tests with invalid > PARTIALAGGFUNC/PARTIALAGG_MINVERSION should be in postgres_fdw > tests or better should be moved to src/test/regress/sql/create_aggregate.sql, > as they are not specific to postgres_fdw Thank you. I moved these tests to src/test/regress/sql/create_aggregate.sql. [1] https://www.postgresql.org/docs/15/xaggr.html#XAGGR-PARTIAL-AGGREGATES Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation 0001-Partial-aggregates-push-down-v14.patch Description: 0001-Partial-aggregates-push-down-v14.patch
Re: Partial aggregates pushdown
Hi, Yuki. 1) In previous version of the patch aggregates, which had partialaggfn, were ok to push down. And it was a definite sign that aggregate can be pushed down. Now we allow pushing down an aggregate, which prorettype is not internal and aggfinalfn is not defined. Is it safe for all user-defined (or builtin) aggregates, even if they are generally shippable? Aggcombinefn is executed locally and we check that aggregate function itself is shippable. Is it enough? Perhaps, we could use partialagg_minversion (like aggregates with partialagg_minversion == -1 should not be pushed down) or introduce separate explicit flag? 2) Do we really have to look at pg_proc in partial_agg_ok() and deparseAggref()? Perhaps, looking at aggtranstype is enough? 3) I'm not sure if CREATE AGGREGATE tests with invalid PARTIALAGGFUNC/PARTIALAGG_MINVERSION should be in postgres_fdw tests or better should be moved to src/test/regress/sql/create_aggregate.sql, as they are not specific to postgres_fdw -- Best regards, Alexander Pyhalov, Postgres Professional
RE: Partial aggregates pushdown
Hi Mr.Pyhalov. Thank you for comments. > I've looked through the patch. Overall I like this approach, but have > the following comments. > > 1) Why should we require partialaggfn for min()/max()/count()? We could > just use original functions for a lot of aggregates, and so it would be > possible to push down some partial aggregates to older servers. I'm not > sure that it's a strict requirement, but a nice thing to think about. > Can we use the function itself as partialaggfn, for example, for > sum(int4)? > For functions with internal aggtranstype (like sum(int8) it > would be more difficult). Thank you. I realized that partial aggregate pushdown is fine without partialaggfn if original function has no aggfinalfn and aggtranstype of it is not internal. So I have improved v12 by this realization. However, v13 requires partialaggfn for aggregate if it has aggfinalfn or aggtranstype of it is internal such as sum(int8). > 2) fpinfo->server_version is not aggregated, for example, when we form > fpinfo in foreign_join_ok(), it seems we should spread it in more places > in postgres_fdw.c. I have responded to your comment by adding copy of server_version in merge_fdw_options. > 3) In add_foreign_grouping_paths() it seems there's no need for > additional argument, we can look at extra->patype. Also Assert() in > add_foreign_grouping_paths() will fire in --enable-cassert build. I have fixed according to your comment. > 4) Why do you modify lookup_agg_function() signature? I don't see tests, > showing that it's neccessary. Perhaps, more precise function naming > should be used instead? I realized that there is no need of modification lookup_agg_function(). Instead, I use LookupFuncName(). > 5) In tests: > - Why version_num does have "name" type in > f_alter_server_version() function? > - You modify server_version option of 'loopback' server, but > don't reset it after test. This could affect further tests. > - "It's unsafe to push down partial aggregates with distinct" > in postgres_fdw.sql:3002 seems to be misleading. > 3001 > 3002 -- It's unsafe to push down partial aggregates with distinct > 3003 SELECT f_alter_server_version('loopback', 'set', -1); I have fixed according to your comment. > 6) While looking at it, could cause a crash with something like I have fixed this problem by using LookupFuncName() instead of lookup_agg_function. The following is readme of v13. --readme of Partial aggregates push down v13 1. interface 1) pg_aggregate There are the following additional columns. a) partialaggfn data type: regproc. default value: zero(means invalid). description : This field refers to the special aggregate function(then we call this partialaggfunc) corresponding to aggregation function(then we call src) which has aggfnoid. partialaggfunc is used for partial aggregation pushdown by postgres_fdw. The followings are differences between the src and the special aggregate function. difference1) result type The result type is same as the src's transtype if the src's transtype is not internal. Otherwise the result type is bytea. difference2) final func The final func does not exist if the src's transtype is not internal. Otherwize the final func returns serialized value. For example, there is a partialaggfunc avg_p_int4 which corresponds to avg(int4) whose aggtranstype is _int4. The result value of avg_p_int4 is a float8 array which consists of count and summation. avg_p_int4 does not have finalfunc. For another example, there is a partialaggfunc avg_p_int8 which corresponds to avg(int8) whose aggtranstype is internal. The result value of avg_p_int8 is a bytea serialized array which consists of count and summation. avg_p_int8 has finalfunc int8_avg_serialize which is serialize function of avg(int8). This field is zero if there is no partialaggfunc. b) partialagg_minversion data type: int4. default value: zero(means current version). description : This field is the minimum PostgreSQL server version which has partialaggfunc. This field is used for checking compatibility of partialaggfunc. The above fields are valid in tuples for builtin avg, sum, min, max, count. There are additional records which correspond to partialaggfunc for avg, sum, min, max, count. 2) pg_proc There are additional records which correspond to partialaggfunc for avg, sum, min, max, count. 3) postgres_fdw postgres_fdw has an additional foreign server option server_version. server_version is integer value which means remote server version number. Default value of server_version is zero. server_version is used for checking compatibility of partialaggfunc. 2. feature Partial aggregation pushdown is fine when either of the following conditions is true. condition1) aggregate function has not internal aggtranstype and has no aggfinalfn. condition2) the
Re: Partial aggregates pushdown
fujii.y...@df.mitsubishielectric.co.jp писал 2022-11-22 04:01: Hi Mr.Vondra, Mr.Pyhalov, Everyone. I discussed with Mr.Pyhalov about the above draft by directly sending mail to him(outside of pgsql-hackers). Mr.Pyhalov allowed me to update his patch along with the above draft. So I update Mr.Pyhalov's patch v10. Hi, Yuki. Thank you for your work on this. I've looked through the patch. Overall I like this approach, but have the following comments. 1) Why should we require partialaggfn for min()/max()/count()? We could just use original functions for a lot of aggregates, and so it would be possible to push down some partial aggregates to older servers. I'm not sure that it's a strict requirement, but a nice thing to think about. Can we use the function itself as partialaggfn, for example, for sum(int4)? For functions with internal aggtranstype (like sum(int8) it would be more difficult). 2) fpinfo->server_version is not aggregated, for example, when we form fpinfo in foreign_join_ok(), it seems we should spread it in more places in postgres_fdw.c. 3) In add_foreign_grouping_paths() it seems there's no need for additional argument, we can look at extra->patype. Also Assert() in add_foreign_grouping_paths() will fire in --enable-cassert build. 4) Why do you modify lookup_agg_function() signature? I don't see tests, showing that it's neccessary. Perhaps, more precise function naming should be used instead? 5) In tests: - Why version_num does have "name" type in f_alter_server_version() function? - You modify server_version option of 'loopback' server, but don't reset it after test. This could affect further tests. - "It's unsafe to push down partial aggregates with distinct" in postgres_fdw.sql:3002 seems to be misleading. 3001 3002 -- It's unsafe to push down partial aggregates with distinct 3003 SELECT f_alter_server_version('loopback', 'set', -1); 3004 EXPLAIN (VERBOSE, COSTS OFF) 3005 SELECT avg(d) FROM pagg_tab; 3006 SELECT avg(d) FROM pagg_tab; 3007 select * from pg_foreign_server; 6) While looking at it, could cause a crash with something like CREATE TYPE COMPLEX AS (re FLOAT, im FLOAT); CREATE OR REPLACE FUNCTION sum_complex (sum complex, el complex) RETURNS complex AS $$ DECLARE s complex; BEGIN if el is not null and sum is not null then sum.re:=coalesce(sum.re,0)+el.re; sum.im:=coalesce(sum.im,0)+el.im; end if; RETURN sum; END; $$ LANGUAGE plpgSQL; CREATE AGGREGATE SUM(COMPLEX) ( SFUNC=sum_complex, STYPE=complex, partialaggfunc=, partialagg_minversion=1400 ); where - something nonexisting enforce_generic_type_consistency (actual_arg_types=0x56269873d200, declared_arg_types=0x0, nargs=1, rettype=0, allow_poly=true) at parse_coerce.c:2132 2132Oid decl_type = declared_arg_types[j]; (gdb) bt #0 enforce_generic_type_consistency (actual_arg_types=0x56269873d200, declared_arg_types=0x0, nargs=1, rettype=0, allow_poly=true) at parse_coerce.c:2132 #1 0x5626960072de in lookup_agg_function (fnName=0x5626986715a0, nargs=1, input_types=0x56269873d200, variadicArgType=0, rettype=0x7ffd1a4045d8, only_normal=false) at pg_aggregate.c:916 #2 0x5626960064ba in AggregateCreate (aggName=0x562698671000 "sum", aggNamespace=2200, replace=false, aggKind=110 'n', numArgs=1, numDirectArgs=0, parameterTypes=0x56269873d1e8, allParameterTypes=0, parameterModes=0, parameterNames=0, parameterDefaults=0x0, variadicArgType=0, aggtransfnName=0x5626986712c0, aggfinalfnName=0x0, aggcombinefnName=0x0, aggserialfnName=0x0, aggdeserialfnName=0x0, aggmtransfnName=0x0, aggminvtransfnName=0x0, aggmfinalfnName=0x0, partialaggfnName=0x5626986715a0, finalfnExtraArgs=false, mfinalfnExtraArgs=false, finalfnModify=114 'r', mfinalfnModify=114 'r', aggsortopName=0x0, aggTransType=16390, aggTransSpace=0, aggmTransType=0, aggmTransSpace=0, partialaggMinversion=1400, agginitval=0x0, aggminitval=0x0, proparallel=117 'u') at pg_aggregate.c:582 #3 0x5626960a1e1c in DefineAggregate (pstate=0x56269869ab48, name=0x562698671038, args=0x5626986711b0, oldstyle=false, parameters=0x5626986713b0, replace=false) at aggregatecmds.c:450 #4 0x56269643061f in ProcessUtilitySlow (pstate=0x56269869ab48, pstmt=0x562698671a68, queryString=0x5626986705d8 "CREATE AGGREGATE SUM(COMPLEX) (\nSFUNC=sum_complex,\nSTYPE=COMPLEX,\npartialaggfunc=scomplex,\npartialagg_minversion=1400\n);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x562698671b48, qc=0x7ffd1a4053c0) at utility.c:1407 #5 0x56269642fbb4 in standard_ProcessUtility (pstmt=0x562698671a68, queryString=0x5626986705d8 "CREATE AGGREGATE SUM(COMPLEX) (\nSFUNC=sum_complex,\nSTYPE=COMPLEX,\npartialaggfunc=scomplex,\npartialagg_minversion=1400\n);", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x562698671b48, qc=0x7ffd1a4053c0) at utility.c:1074 Later will look at it again
Re: [CAUTION!! freemail] Re: Partial aggregates pushdown
On Tue, Nov 22, 2022 at 1:11 AM fujii.y...@df.mitsubishielectric.co.jp < fujii.y...@df.mitsubishielectric.co.jp> wrote: > Hi Mr.Yu. > > Thank you for comments. > > > + * Check that partial aggregate agg has compatibility > > > > If the `agg` refers to func parameter, the parameter name is aggform > I fixed the above typo and made the above comment easy to understand > New comment is "Check that partial aggregate function of aggform exsits in > remote" > > > + int32 partialagg_minversion = PG_VERSION_NUM; > > + if (aggform->partialagg_minversion == > > PARTIALAGG_MINVERSION_DEFAULT) { > > + partialagg_minversion = PG_VERSION_NUM; > > > > > > I am curious why the same variable is assigned the same value twice. It > seems > > the if block is redundant. > > > > + if ((fpinfo->server_version >= partialagg_minversion)) { > > + compatible = true; > > > > > > The above can be simplified as: return fpinfo->server_version >= > > partialagg_minversion; > I fixed according to your comment. > > Sincerely yours, > Yuuki Fujii > > > Hi, Thanks for the quick response.
Re: Partial aggregates pushdown
On Mon, Nov 21, 2022 at 5:02 PM fujii.y...@df.mitsubishielectric.co.jp < fujii.y...@df.mitsubishielectric.co.jp> wrote: > Hi Mr.Vondra, Mr.Pyhalov, Everyone. > > I discussed with Mr.Pyhalov about the above draft by directly sending mail > to > him(outside of pgsql-hackers). Mr.Pyhalov allowed me to update his patch > along with the above draft. So I update Mr.Pyhalov's patch v10. > > I wrote my patch for discussion. > My patch passes regression tests which contains additional basic > postgres_fdw tests > for my patch's feature. But my patch doesn't contain sufficient documents > and tests. > If reviewers accept my approach, I will add documents and tests to my > patch. > > The following is a my patch's readme. > # I simplified the above draft. > > --readme of my patch > 1. interface > 1) pg_aggregate > There are the following additional columns. > a) partialaggfn > data type: regproc. > default value: zero(means invalid). > description : This field refers to the special aggregate function(then > we call > this partialaggfunc) > corresponding to aggregation function(then we call src) which has > aggfnoid. > partialaggfunc is used for partial aggregation pushdown by > postgres_fdw. > The followings are differences between the src and the special > aggregate function. > difference1) result type > The result type is same as the src's transtype if the src's > transtype > is not internal. > Otherwise the result type is bytea. > difference2) final func > The final func does not exist if the src's transtype is not > internal. > Otherwize the final func returns serialized value. > For example, there is a partialaggfunc avg_p_int4 which corresponds to > avg(int4) > whose aggtranstype is _int4. > The result value of avg_p_int4 is a float8 array which consists of > count and > summation. avg_p_int4 does not have finalfunc. > For another example, there is a partialaggfunc avg_p_int8 which > corresponds to > avg(int8) whose aggtranstype is internal. > The result value of avg_p_int8 is a bytea serialized array which > consists of count > and summation. avg_p_int8 has finalfunc int8_avg_serialize which is > serialize function > of avg(int8). This field is zero if there is no partialaggfunc. > > b) partialagg_minversion > data type: int4. > default value: zero(means current version). > description : This field is the minimum PostgreSQL server version which > has > partialaggfunc. This field is used for checking compatibility of > partialaggfunc. > > The above fields are valid in tuples for builtin avg, sum, min, max, count. > There are additional records which correspond to partialaggfunc for avg, > sum, min, max, > count. > > 2) pg_proc > There are additional records which correspond to partialaggfunc for avg, > sum, min, max, > count. > > 3) postgres_fdw > postgres_fdw has an additional foreign server option server_version. > server_version is > integer value which means remote server version number. Default value of > server_version > is zero. server_version is used for checking compatibility of > partialaggfunc. > > 2. feature > postgres_fdw can pushdown partial aggregation of avg, sum, min, max, count. > Partial aggregation pushdown is fine when the following two conditions are > both true. > condition1) partialaggfn is valid. > condition2) server_version is not less than partialagg_minversion > postgres_fdw executes pushdown the patialaggfunc instead of a src. > For example, we issue "select avg_p_int4(c) from t" instead of "select > avg(c) from t" > in the above example. > > postgres_fdw can pushdown every aggregate function which supports partial > aggregation > if you add a partialaggfunc corresponding to the aggregate function by > create aggregate > command. > > 3. difference between my patch and Mr.Pyhalov's v10 patch. > 1) In my patch postgres_fdw can pushdown partial aggregation of avg > 2) In my patch postgres_fdw can pushdown every aggregate function which > supports partial > aggregation if you add a partialaggfunc corresponding to the aggregate > function. > > 4. sample commands in psql > \c postgres > drop database tmp; > create database tmp; > \c tmp > create extension postgres_fdw; > create server server_01 foreign data wrapper postgres_fdw options(host > 'localhost', dbname 'tmp', server_version '16', async_capable 'true'); > create user mapping for postgres server server_01 options(user 'postgres', > password 'postgres'); > create server server_02 foreign data wrapper postgres_fdw options(host > 'localhost', dbname 'tmp', server_version '16', async_capable 'true'); > create user mapping for postgres server server_02 options(user 'postgres', > password 'postgres'); > > create table t(dt timestamp, id int4, name text, total int4, val float4, > type int4, span interval) partition by list (type); > > create table t1(dt timestamp, id int4, name text, total int4
RE: Partial aggregates pushdown
Hi Mr.Vondra, Mr.Pyhalov. I'm interesied in Mr.Pyhalov's patch due to the following background. --Background I develop postgresql's extension such as fdw in my work. I'm interested in using postgresql for OLAP. I think the function of a previous patch "Push aggregation down to base relations and joins"[1] is desiable. I rebased the previous patch and register the rebased patch on the next commitfest[2]. And I think it would be more useful if the previous patch works on a foreign table of postgres_fdw. I realized the function of partial aggregation pushdown is necessary to make the previous patch work on a foreign table of postgres_fdw. -- So I reviewed Mr.Pyhalov's patch and discussions on this thread. I made a draft of approach to respond to Mr.Vondra's comments. Would you check whether my draft is right or not? --My draft > 1) It's not clear to me how could this get extended to aggregates with > more complex aggregate states, to support e.g. avg() and similar > fairly common aggregates. We add a special aggregate function every aggregate function (hereafter we call this src) which supports partial aggregation. The followings are differences between the src and the special aggregate function. difference1) result type The result type is same with the src's transtype if the src's transtype is not internal. Otherwise the result type is bytea. difference2) final func The final func does not exist if the src's transtype is not internal. Otherwize the final func returns serialized value. For example, let me call the special aggregate function of avg(float8) avg_p(float8). The result value of avg_p is a float8 array which consists of count and summation. avg_p does not have finalfunc. We pushdown the special aggregate function instead of a src. For example, we issue "select avg_p(c) from t" instead of "select avg(c) from t" in the above example. We add a new column partialaggfn to pg_aggregate to get the oid of the special aggregate function from the the src's oid. This column is the oid of the special aggregate function which corresponds to the src. If an aggregate function does not have any special aggregate function, then we does not pushdown any partial aggregation of the aggregate function. > 2) I'm not sure relying on aggpartialpushdownsafe without any version > checks etc. is sufficient. I mean, how would we know the remote node > has the same idea of representing the aggregate state. I wonder how > this aligns with assumptions we do e.g. for functions etc. We add compatible server versions infomation to pg_aggregate and the set of options of postgres_fdw's foreign server. We check compatibility of an aggregate function using this infomation. An additional column of pg_aggregate is compatibleversonrange. This column is a range of postgresql server versions which has compatible aggregate function. An additional options of postgres_fdw's foreign server are serverversion and bwcompatibleverson. serverversion is remote postgresql server version. bwcompatibleverson is the maximum version in which any aggregate function is compatible with local noed's one. Our version check passes if and only if at least one of the following conditions is true. condition1) the option value of serverversion is in compatibleversonrange. condition2) the local postgresql server version is between bwcompatibleverson and the option value of serverversion. We can get the local postgresql server version from PG_VERSION_NUM macro. We use condition1 if the local postgresql server version is not more than the remote one. and use condition2 if the local postgresql server version is greater than the remote one. -- Sincerely yours, Yuuki Fujii [1] https://commitfest.postgresql.org/32/1247/ [2] https://commitfest.postgresql.org/39/3764/ -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation
Re: Partial aggregates pushdown
Tomas Vondra писал 2022-03-22 15:28: On 3/22/22 01:49, Andres Freund wrote: On 2022-01-17 15:27:53 +0300, Alexander Pyhalov wrote: Alexander Pyhalov писал 2022-01-17 15:26: Updated patch. Sorry, missed attachment. Needs another update: http://cfbot.cputube.org/patch_37_3369.log Marked as waiting on author. TBH I'm still not convinced this is the right approach. I've voiced this opinion before, but to reiterate the main arguments: 1) It's not clear to me how could this get extended to aggregates with more complex aggregate states, to support e.g. avg() and similar fairly common aggregates. Hi. Yes, I'm also not sure how to proceed with aggregates with complex state. Likely it needs separate function to export their state, but then we should somehow ensure that this function exists and our 'importer' can handle its result. Note that for now we have no mechanics in postgres_fdw to find out remote server version on planning stage. 2) I'm not sure relying on aggpartialpushdownsafe without any version checks etc. is sufficient. I mean, how would we know the remote node has the same idea of representing the aggregate state. I wonder how this aligns with assumptions we do e.g. for functions etc. It seems to be not a problem for me, as for now we don't care about remote node internal aggregate state representation. We currently get just aggregate result from remote node. For aggregates with 'internal' stype we call converter locally, and it converts external result from aggregate return type to local node internal representation. Aside from that, there's a couple review comments: 1) should not remove the comment in foreign_expr_walker Fixed. 2) comment in deparseAggref is obsolete/inaccurate Fixed. 3) comment for partial_agg_ok should probably explain when we consider aggregate OK to be pushed down Expanded comment. 4) I'm not sure why get_rcvd_attinmeta comment talks about "return type bytea" and "real input type". Expanded comment. Tupdesc can be retrieved from node->ss.ss_ScanTupleSlot, and so we expect to see bytea (as should be produced by partial aggregation). But when we scan data, we get aggregate output type (which matches converter input type), so attinmeta should be fixed. If we deal with aggregate which doesn't have converter, partial_agg_ok() ensures that agg->aggfnoid return type matches agg->aggtranstype. 5) Talking about "partial" aggregates is a bit confusing, because that suggests this is related to actual "partial aggregates". But it's not. How should we call them? It's about pushing "Partial count()" or "Partial sum()" to the remote server, why it's not related to partial aggregates? Do you mean that it's not about parallel aggregate processing? 6) Can add_foreign_grouping_paths do without the new 'partial' parameter? Clearly, it can be deduced from extra->patype, no? Fixed this. 7) There's no docs for PARTIALCONVERTERFUNC / PARTIAL_PUSHDOWN_SAFE in CREATE AGGREGATE sgml docs. Added documentation. I'd appreciate advice on how it should be extended. 8) I don't think "serialize" in the converter functions is the right term, considering those functions are not "serializing" anything. If anything, it's the remote node that is serializing the agg state and the local not is deserializing it. Or maybe I just misunderstand where are the converter functions executed? Converter function transforms aggregate result to serialized internal representation, which is expected from partial aggregate. I mean, it converts aggregate result type to internal representation and then efficiently executes serialization code (i.e. converter(x) == serialize(to_internal(x))). -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 7ad4eacf017a4fd3793f0949ef43ccc8292bf3f6 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 63 +- .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 187 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- doc/src/sgml/ref/create_aggregate.sgml| 27 +++ src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 ++- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 110 ++- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 702 insertions(+), 81 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..272e2391d7f 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ s
Re: Partial aggregates pushdown
On 3/22/22 01:49, Andres Freund wrote: > On 2022-01-17 15:27:53 +0300, Alexander Pyhalov wrote: >> Alexander Pyhalov писал 2022-01-17 15:26: >>> Updated patch. >> >> Sorry, missed attachment. > > Needs another update: http://cfbot.cputube.org/patch_37_3369.log > > Marked as waiting on author. > TBH I'm still not convinced this is the right approach. I've voiced this opinion before, but to reiterate the main arguments: 1) It's not clear to me how could this get extended to aggregates with more complex aggregate states, to support e.g. avg() and similar fairly common aggregates. 2) I'm not sure relying on aggpartialpushdownsafe without any version checks etc. is sufficient. I mean, how would we know the remote node has the same idea of representing the aggregate state. I wonder how this aligns with assumptions we do e.g. for functions etc. Aside from that, there's a couple review comments: 1) should not remove the comment in foreign_expr_walker 2) comment in deparseAggref is obsolete/inaccurate 3) comment for partial_agg_ok should probably explain when we consider aggregate OK to be pushed down 4) I'm not sure why get_rcvd_attinmeta comment talks about "return type bytea" and "real input type". 5) Talking about "partial" aggregates is a bit confusing, because that suggests this is related to actual "partial aggregates". But it's not. 6) Can add_foreign_grouping_paths do without the new 'partial' parameter? Clearly, it can be deduced from extra->patype, no? 7) There's no docs for PARTIALCONVERTERFUNC / PARTIAL_PUSHDOWN_SAFE in CREATE AGGREGATE sgml docs. 8) I don't think "serialize" in the converter functions is the right term, considering those functions are not "serializing" anything. If anything, it's the remote node that is serializing the agg state and the local not is deserializing it. Or maybe I just misunderstand where are the converter functions executed? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Partial aggregates pushdown
On 2022-01-17 15:27:53 +0300, Alexander Pyhalov wrote: > Alexander Pyhalov писал 2022-01-17 15:26: > > Updated patch. > > Sorry, missed attachment. Needs another update: http://cfbot.cputube.org/patch_37_3369.log Marked as waiting on author. - Andres
Re: Partial aggregates pushdown
Alexander Pyhalov писал 2022-01-17 15:26: Zhihong Yu писал 2022-01-17 11:43: Hi, + FdwScanPrivateConvertors + * Generate attinmeta if there are some converters: I think it would be better if converter is spelled the same way across the patch. For build_conv_list(): + if (IS_UPPER_REL(foreignrel)) You can return NIL for !IS_UPPER_REL(foreignrel) - this would save indentation for the body of the func. Hi. Updated patch. Sorry, missed attachment. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 4408f98a67872efd3c09a3bf89e7cbf88db2a8b2 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..6b12b7bf76b 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7d6f7d9e3df..549bb9bae61 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9336,13 +9336,13 @@ RESET enable_partitionwise_join; -- === -- test partitionwise aggregates -- === -CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a); +CREATE TABLE pagg_tab (a int, b int, c text, d numeric) PARTITION BY RANGE(a); CREATE TABLE pagg_tab_p1 (LIKE pagg_tab); CREATE
Re: Partial aggregates pushdown
Zhihong Yu писал 2022-01-17 11:43: Hi, + FdwScanPrivateConvertors + * Generate attinmeta if there are some converters: I think it would be better if converter is spelled the same way across the patch. For build_conv_list(): + if (IS_UPPER_REL(foreignrel)) You can return NIL for !IS_UPPER_REL(foreignrel) - this would save indentation for the body of the func. Hi. Updated patch. -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
On Sun, Jan 16, 2022 at 11:47 PM Alexander Pyhalov wrote: > Julien Rouhaud писал 2022-01-14 15:16: > > Hi, > > > > On Mon, Nov 15, 2021 at 04:01:51PM +0300, Alexander Pyhalov wrote: > >> > >> I've updated patch - removed catversion dump. > > > > This version of the patchset doesn't apply anymore: > > > > http://cfbot.cputube.org/patch_36_3369.log > > === Applying patches on top of PostgreSQL commit ID > > 025b920a3d45fed441a0a58fdcdf05b321b1eead === > > === applying patch ./0001-Partial-aggregates-push-down-v07.patch > > patching file src/bin/pg_dump/pg_dump.c > > Hunk #1 succeeded at 13111 (offset -965 lines). > > Hunk #2 FAILED at 14167. > > Hunk #3 succeeded at 13228 (offset -961 lines). > > Hunk #4 succeeded at 13319 (offset -966 lines). > > 1 out of 4 hunks FAILED -- saving rejects to file > > src/bin/pg_dump/pg_dump.c.rej > > > > Could you send a rebased version? In the meantime I will switch the cf > > entry > > to Waiting on Author. > > Hi. Attaching rebased patch. > -- > Best regards, > Alexander Pyhalov, > Postgres Professional Hi, + FdwScanPrivateConvertors + * Generate attinmeta if there are some converters: I think it would be better if converter is spelled the same way across the patch. For build_conv_list(): + if (IS_UPPER_REL(foreignrel)) You can return NIL for !IS_UPPER_REL(foreignrel) - this would save indentation for the body of the func. Cheers
Re: Partial aggregates pushdown
Julien Rouhaud писал 2022-01-14 15:16: Hi, On Mon, Nov 15, 2021 at 04:01:51PM +0300, Alexander Pyhalov wrote: I've updated patch - removed catversion dump. This version of the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_3369.log === Applying patches on top of PostgreSQL commit ID 025b920a3d45fed441a0a58fdcdf05b321b1eead === === applying patch ./0001-Partial-aggregates-push-down-v07.patch patching file src/bin/pg_dump/pg_dump.c Hunk #1 succeeded at 13111 (offset -965 lines). Hunk #2 FAILED at 14167. Hunk #3 succeeded at 13228 (offset -961 lines). Hunk #4 succeeded at 13319 (offset -966 lines). 1 out of 4 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej Could you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author. Hi. Attaching rebased patch. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom d9227853b4ebeaa84ce4fd9623b2128cd6b2e494 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 20 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..6b12b7bf76b 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 7d6f7d9e3df..549bb9bae61 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9336,13 +9336,13 @@ RESET enable_partitionwise_jo
Re: Partial aggregates pushdown
Hi, On Mon, Nov 15, 2021 at 04:01:51PM +0300, Alexander Pyhalov wrote: > > I've updated patch - removed catversion dump. This version of the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_3369.log === Applying patches on top of PostgreSQL commit ID 025b920a3d45fed441a0a58fdcdf05b321b1eead === === applying patch ./0001-Partial-aggregates-push-down-v07.patch patching file src/bin/pg_dump/pg_dump.c Hunk #1 succeeded at 13111 (offset -965 lines). Hunk #2 FAILED at 14167. Hunk #3 succeeded at 13228 (offset -961 lines). Hunk #4 succeeded at 13319 (offset -966 lines). 1 out of 4 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej Could you send a rebased version? In the meantime I will switch the cf entry to Waiting on Author.
Re: Partial aggregates pushdown
Daniel Gustafsson писал 2021-11-15 13:16: On 3 Nov 2021, at 15:50, Alexander Pyhalov wrote: Daniel Gustafsson писал 2021-11-03 16:45: On 2 Nov 2021, at 10:12, Alexander Pyhalov wrote: Updated and rebased patch. + state = (Int128AggState *) palloc0(sizeof(Int128AggState)); + state->calcSumX2 = false; + + if (!PG_ARGISNULL(0)) + { +#ifdef HAVE_INT128 + do_int128_accum(state, (int128) PG_GETARG_INT64(0)); +#else + do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0))); +#endif This fails on non-INT128 platforms as state cannot be cast to Int128AggState outside of HAVE_INT128; it's not defined there. This needs to be a PolyNumAggState no? Hi. Thank you for noticing this. It's indeed fails with pgac_cv__128bit_int=no. Updated patch. The updated patch also fails to apply now, but on the catversion.h bump. To avoid having to rebase for that I recommend to skip that part in the patch and just mention the need in the thread, any committer picking this up for commit will know to bump the catversion so there is no use in risking unneccesary conflicts. I've updated patch - removed catversion dump. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 2af16e66276938b861cf7a8db2fef967f54b800f Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 12 files changed, 673 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index b27689d0864..a515c5662bb 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -197,6 +197,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -832,8 +833,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3349,7 +3352,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3819,3 +3822,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + ReleaseSysCache(aggtup); + return false; + } + + Releas
Re: Partial aggregates pushdown
> On 3 Nov 2021, at 15:50, Alexander Pyhalov wrote: > > Daniel Gustafsson писал 2021-11-03 16:45: >>> On 2 Nov 2021, at 10:12, Alexander Pyhalov wrote: >>> Updated and rebased patch. >> +state = (Int128AggState *) palloc0(sizeof(Int128AggState)); >> +state->calcSumX2 = false; >> + >> +if (!PG_ARGISNULL(0)) >> +{ >> +#ifdef HAVE_INT128 >> +do_int128_accum(state, (int128) PG_GETARG_INT64(0)); >> +#else >> +do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0))); >> +#endif >> This fails on non-INT128 platforms as state cannot be cast to Int128AggState >> outside of HAVE_INT128; it's not defined there. This needs to be a >> PolyNumAggState no? > > Hi. > Thank you for noticing this. It's indeed fails with pgac_cv__128bit_int=no. > Updated patch. The updated patch also fails to apply now, but on the catversion.h bump. To avoid having to rebase for that I recommend to skip that part in the patch and just mention the need in the thread, any committer picking this up for commit will know to bump the catversion so there is no use in risking unneccesary conflicts. -- Daniel Gustafsson https://vmware.com/
Re: Partial aggregates pushdown
Daniel Gustafsson писал 2021-11-03 16:45: On 2 Nov 2021, at 10:12, Alexander Pyhalov wrote: Updated and rebased patch. + state = (Int128AggState *) palloc0(sizeof(Int128AggState)); + state->calcSumX2 = false; + + if (!PG_ARGISNULL(0)) + { +#ifdef HAVE_INT128 + do_int128_accum(state, (int128) PG_GETARG_INT64(0)); +#else + do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0))); +#endif This fails on non-INT128 platforms as state cannot be cast to Int128AggState outside of HAVE_INT128; it's not defined there. This needs to be a PolyNumAggState no? Hi. Thank you for noticing this. It's indeed fails with pgac_cv__128bit_int=no. Updated patch. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom f72a3d52a2b85ad9ea5f61f8ff5c46cb50ae3ec8 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 674 insertions(+), 84 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..8cee12c1b2a 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + { + ReleaseSysCache(aggtup); + return false; + } + + /* + * If an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (agg->aggtranstype == INTERNALOID && aggform->aggpartialconverterfn == InvalidOid) + { + ReleaseSysCache(aggtup); + return false; + } + + /* In this case we currently don't use converter */ + if (agg->aggtranstype != INTERNALOID && get_func_rettype(agg->aggfnoid) != agg->aggtranstype) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index fd141a0fa5c..80c507783e6 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9279,13 +9279,13 @@ RESET enable_partitionwise_join; -- =
Re: Partial aggregates pushdown
> On 2 Nov 2021, at 10:12, Alexander Pyhalov wrote: > Updated and rebased patch. + state = (Int128AggState *) palloc0(sizeof(Int128AggState)); + state->calcSumX2 = false; + + if (!PG_ARGISNULL(0)) + { +#ifdef HAVE_INT128 + do_int128_accum(state, (int128) PG_GETARG_INT64(0)); +#else + do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0))); +#endif This fails on non-INT128 platforms as state cannot be cast to Int128AggState outside of HAVE_INT128; it's not defined there. This needs to be a PolyNumAggState no? -- Daniel Gustafsson https://vmware.com/
Re: Partial aggregates pushdown
Hi. Updated and rebased patch. Ilya Gladyshev писал 2021-11-02 00:31: Hi, On 21.10.2021 13:55, Alexander Pyhalov wrote: Hi. Updated patch. Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. I don't quite understand why this is restricted only to aggregates that have 'internal' state, I feel like that should be possible for any aggregate that has a function to convert its final result back to aggregate state to be pushed down. While I couldn't come up with a useful example for this, except maybe for an aggregate whose aggfinalfn is used purely for cosmetic purposes (e.g. format the result into a string), I still feel that it is an unnecessary restriction. I don't feel comfortable with it for the following reasons. - Now partial converters translate aggregate result to serialized internal representation. In case when aggregate type is different from internal state, we'd have to translate it to non-serialized internal representation, so converters should skip serialization step. This seems like introducing two kind of converters. - I don't see any system aggregates which would benefit from this. However, it doesn't seem to be complex, and if it seems to be desirable, it can be done. For now introduced check that transtype matches aggregate type (or is internal) in partial_agg_ok(). A few minor review notes to the patch: +static List *build_conv_list(RelOptInfo *foreignrel); this should probably be up top among other declarations. Moved it upper. @@ -1433,6 +1453,48 @@ postgresGetForeignPlan(PlannerInfo *root, outer_plan); } +/* + * Generate attinmeta if there are some converters: + * they are expecxted to return BYTEA, but real input type is likely different. + */ typo in word "expecxted". Fixed. @@ -139,10 +147,13 @@ typedef struct PgFdwScanState * for a foreign join scan. */ TupleDesctupdesc;/* tuple descriptor of scan */ AttInMetadata *attinmeta;/* attribute datatype conversion metadata */ +AttInMetadata *rcvd_attinmeta;/* metadata for received tuples, NULL if + * there's no converters */ Looks like rcvd_attinmeta is redundant and you could use attinmeta for conversion metadata. Seems so, removed it. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 71de85154f9ac78e99f4ce8bf9aa341fee748609 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 57 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 196 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 674 insertions(+), 84 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..8cee12c1b2a 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,51 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get her
Re: Partial aggregates pushdown
On 11/1/21 22:53, Ilya Gladyshev wrote: On 01.11.2021 13:30, Alexander Pyhalov wrote: Peter Eisentraut писал 2021-11-01 12:47: On 21.10.21 12:55, Alexander Pyhalov wrote: Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. Converters are called locally, they transform aggregate result to serialized internal representation. As converters don't have access to internal aggregate state, partial aggregates like avg() are still not pushable. It seems to me that the system should be able to determine from the existing aggregate catalog entry whether an aggregate can be pushed down. For example, it could check aggtranstype != internal and similar. A separate boolean flag should not be necessary. Hi. I think we can't infer this property from existing flags. For example, if I have avg() with bigint[] argtranstype, it doesn't mean we can push down it. We couldn't also decide if partial aggregete is safe to push down based on aggfinalfn presence (for example, it is defined for sum(numeric), but we can push it down. I think one potential way to do it would be to allow pushing down aggregates that EITHER have state of the same type as their return type, OR have a conversion function that converts their return value to the type of their state. IMO just checking (aggtranstype == result type) entirely ignores the issue of portability - we've never required the aggregate state to be portable in any meaningful way (between architectures, minor/major versions, ...) and it seems foolish to just start relying on it here. Imagine for example an aggregate using bytea state, storing some complex C struct in it. You can't just copy that between architectures. It's a bit like why we don't simply copy data types to network, but pass them through input/output or send/receive functions. The new flag is a way to mark aggregates where this is safe, and I don't think we can do away without it. The more I think about this, the more I'm convinced the proper way to do this would be adding export/import functions, similar to serial/deserial functions, with the extra portability guarantees. And we'd need to do that for all aggregates, not just those with (aggtranstype == internal). I get it - the idea of the patch is that keeping the data types the same makes it much simpler to pass the aggregate state (compared to having to export/import it). But I'm not sure it's the right approach. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Partial aggregates pushdown
On 11/1/21 22:31, Ilya Gladyshev wrote: Hi, On 21.10.2021 13:55, Alexander Pyhalov wrote: Hi. Updated patch. Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. I don't quite understand why this is restricted only to aggregates that have 'internal' state, I feel like that should be possible for any aggregate that has a function to convert its final result back to aggregate state to be pushed down. While I couldn't come up with a useful example for this, except maybe for an aggregate whose aggfinalfn is used purely for cosmetic purposes (e.g. format the result into a string), I still feel that it is an unnecessary restriction. But it's *not* restricted to aggregates with internal state. The patch merely requires aggregates with "internal" state to have an extra "converter" function. That being said, I don't think the approach used to deal with internal state is the right one. AFAICS it simply runs the aggregate on the remote node, finalizes is there, and then uses the converter function to "expand" the partial result back into the internal state. Unfortunately that only works for aggregates like "sum" where the result is enough to rebuild the internal state, but it fails for anything more complex (like "avg" or "var"). Earlier in this thread I mentioned this to serial/deserial functions, and I think we need to do something like that for internal state. I.e. we need to call the "serial" function on the remote node, and which dumps the whole internal state, and then "deserial" on the local node. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Partial aggregates pushdown
On 01.11.2021 13:30, Alexander Pyhalov wrote: Peter Eisentraut писал 2021-11-01 12:47: On 21.10.21 12:55, Alexander Pyhalov wrote: Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. Converters are called locally, they transform aggregate result to serialized internal representation. As converters don't have access to internal aggregate state, partial aggregates like avg() are still not pushable. It seems to me that the system should be able to determine from the existing aggregate catalog entry whether an aggregate can be pushed down. For example, it could check aggtranstype != internal and similar. A separate boolean flag should not be necessary. Hi. I think we can't infer this property from existing flags. For example, if I have avg() with bigint[] argtranstype, it doesn't mean we can push down it. We couldn't also decide if partial aggregete is safe to push down based on aggfinalfn presence (for example, it is defined for sum(numeric), but we can push it down. I think one potential way to do it would be to allow pushing down aggregates that EITHER have state of the same type as their return type, OR have a conversion function that converts their return value to the type of their state.
Re: Partial aggregates pushdown
Hi, On 21.10.2021 13:55, Alexander Pyhalov wrote: Hi. Updated patch. Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. I don't quite understand why this is restricted only to aggregates that have 'internal' state, I feel like that should be possible for any aggregate that has a function to convert its final result back to aggregate state to be pushed down. While I couldn't come up with a useful example for this, except maybe for an aggregate whose aggfinalfn is used purely for cosmetic purposes (e.g. format the result into a string), I still feel that it is an unnecessary restriction. A few minor review notes to the patch: +static List *build_conv_list(RelOptInfo *foreignrel); this should probably be up top among other declarations. @@ -1433,6 +1453,48 @@ postgresGetForeignPlan(PlannerInfo *root, outer_plan); } +/* + * Generate attinmeta if there are some converters: + * they are expecxted to return BYTEA, but real input type is likely different. + */ typo in word "expec*x*ted". @@ -139,10 +147,13 @@ typedef struct PgFdwScanState * for a foreign join scan. */ TupleDesc tupdesc; /* tuple descriptor of scan */ AttInMetadata *attinmeta; /* attribute datatype conversion metadata */ + AttInMetadata *rcvd_attinmeta; /* metadata for received tuples, NULL if + * there's no converters */ Looks like rcvd_attinmeta is redundant and you could use attinmeta for conversion metadata. The last thing - the patch needs to be rebased, it doesn't apply cleanly on top of current master. Thanks, Ilya Gladyshev
Re: Partial aggregates pushdown
Peter Eisentraut писал 2021-11-01 12:47: On 21.10.21 12:55, Alexander Pyhalov wrote: Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. Converters are called locally, they transform aggregate result to serialized internal representation. As converters don't have access to internal aggregate state, partial aggregates like avg() are still not pushable. It seems to me that the system should be able to determine from the existing aggregate catalog entry whether an aggregate can be pushed down. For example, it could check aggtranstype != internal and similar. A separate boolean flag should not be necessary. Hi. I think we can't infer this property from existing flags. For example, if I have avg() with bigint[] argtranstype, it doesn't mean we can push down it. We couldn't also decide if partial aggregete is safe to push down based on aggfinalfn presence (for example, it is defined for sum(numeric), but we can push it down. Or if it is, the patch should provide some guidance about how an aggregate function author should set it. Where should it be provided? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
On 21.10.21 12:55, Alexander Pyhalov wrote: Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. Converters are called locally, they transform aggregate result to serialized internal representation. As converters don't have access to internal aggregate state, partial aggregates like avg() are still not pushable. It seems to me that the system should be able to determine from the existing aggregate catalog entry whether an aggregate can be pushed down. For example, it could check aggtranstype != internal and similar. A separate boolean flag should not be necessary. Or if it is, the patch should provide some guidance about how an aggregate function author should set it.
Re: Partial aggregates pushdown
Zhihong Yu писал 2021-10-22 00:43: Hi, w.r.t. 0001-Partial-aggregates-push-down-v03.patch Hi. For partial_agg_ok(), + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + ok = false; Since SearchSysCache1() is not called yet, you can return false directly. Fixed. + if (aggform->aggpartialpushdownsafe != true) The above can be written as: if (!aggform->aggpartialpushdownsafe) Fixed. For build_conv_list(): + Oid converter_oid = InvalidOid; + + if (IsA(tlentry->expr, Aggref)) ... + } + convlist = lappend_oid(convlist, converter_oid); Do you intend to append InvalidOid to convlist (when tlentry->expr is not Aggref) ? Yes, for each tlist member (which matches fpinfo->grouped_tlist in case when foreignrel is UPPER_REL) we need to find corresponding converter. If we don't append InvalidOid, we can't find convlist member, corresponding to tlist member. Added comments to build_conv_list. Also fixed error in pg_dump.c (we selected '0' when aggpartialconverterfn was not defined in schema, but checked for '-'). -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom a18e2ff8de00592797e7c3ccb8d6cd536a2e4e72 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 46 +++- .../postgres_fdw/expected/postgres_fdw.out| 185 +++- contrib/postgres_fdw/postgres_fdw.c | 204 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 +- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 - src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 672 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..fa9f487d66d 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,40 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + bool ok = true; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (!aggform->aggpartialpushdownsafe) + ok = false; + + /* + * But if an aggregate requires serialization/deserialization, partial + * converter should be defined + */ + if (ok && agg->aggtranstype == INTERNALOID) + ok = (aggform->aggpartialconverterfn != InvalidOid); + + ReleaseSysCache(aggtup); + + return ok; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 44c4367b8f9..3e1b997875e 100644 --- a/contrib/postgres_fdw/expected/postgres_
Re: Partial aggregates pushdown
Hi, w.r.t. 0001-Partial-aggregates-push-down-v03.patch For partial_agg_ok(), + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + ok = false; Since SearchSysCache1() is not called yet, you can return false directly. + if (aggform->aggpartialpushdownsafe != true) The above can be written as: if (!aggform->aggpartialpushdownsafe) For build_conv_list(): + Oid converter_oid = InvalidOid; + + if (IsA(tlentry->expr, Aggref)) ... + } + convlist = lappend_oid(convlist, converter_oid); Do you intend to append InvalidOid to convlist (when tlentry->expr is not Aggref) ? Cheers
Re: Partial aggregates pushdown
Tomas Vondra писал 2021-10-19 16:25: On 10/19/21 08:56, Alexander Pyhalov wrote: Hi. Tomas Vondra писал 2021-10-15 17:56: As for the proposed approach, it's probably good enough for the first version to restrict this to aggregates where the aggregate result is sufficient, i.e. we don't need any new export/import procedures. But it's very unlikely we'd want to restrict it the way the patch does it, i.e. based on aggregate name. That's both fragile (people can create new aggregates with such name) and against the PostgreSQL extensibility (people may implement custom aggregates, but won't be able to benefit from this just because of name). So for v0 maybe, but I think there neeeds to be a way to relax this in some way, for example we could add a new flag to pg_aggregate to mark aggregates supporting this. Updated patch to mark aggregates as pushdown-safe in pg_aggregates. So far have no solution for aggregates with internal aggtranstype. Hi. Updated patch. Now aggregates with internal states can be pushed down, if they are marked as pushdown safe (this flag is set to true for min/max/sum), have internal states and associated converters. Converters are called locally, they transform aggregate result to serialized internal representation. As converters don't have access to internal aggregate state, partial aggregates like avg() are still not pushable. For now the overall logic is quite simple. We now also call add_foreign_grouping_paths() for partial aggregation. In foreign_expr_walker() we check if aggregate is pushable (which means that it is simple, marked as pushable and if has 'internal' as aggtranstype, has associated converter). If it is pushable, we proceed as with usual aggregates (but forbid having pushdown). During postgresGetForeignPlan() we produce list of converters for aggregates. As converters has different input argument type from their result (bytea), we have to generate alternative metadata, which is used by make_tuple_from_result_row(). If make_tuple_from_result_row() encounters field with converter, it calls converter and returns result. For now we expect converter to have only one input and output argument. Existing converters just transform input value to internal representation and return its serialized form. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 52cd61fdb5cb5fceeacd832462468d8676f57ca6 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 49 - .../postgres_fdw/expected/postgres_fdw.out| 185 - contrib/postgres_fdw/postgres_fdw.c | 195 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 27 ++- src/backend/catalog/pg_aggregate.c| 28 ++- src/backend/commands/aggregatecmds.c | 23 ++- src/backend/utils/adt/numeric.c | 96 + src/bin/pg_dump/pg_dump.c | 21 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 106 +- src/include/catalog/pg_aggregate.h| 10 +- src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/oidjoins.out| 1 + 13 files changed, 666 insertions(+), 83 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..50ef1009b97 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,43 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +st
Re: Partial aggregates pushdown
On 10/19/21 08:56, Alexander Pyhalov wrote: Hi. Tomas Vondra писал 2021-10-15 17:56: As for the proposed approach, it's probably good enough for the first version to restrict this to aggregates where the aggregate result is sufficient, i.e. we don't need any new export/import procedures. But it's very unlikely we'd want to restrict it the way the patch does it, i.e. based on aggregate name. That's both fragile (people can create new aggregates with such name) and against the PostgreSQL extensibility (people may implement custom aggregates, but won't be able to benefit from this just because of name). So for v0 maybe, but I think there neeeds to be a way to relax this in some way, for example we could add a new flag to pg_aggregate to mark aggregates supporting this. Updated patch to mark aggregates as pushdown-safe in pg_aggregates. So far have no solution for aggregates with internal aggtranstype. Thanks. Please add it to the next CF, so that we don't lose track of it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Partial aggregates pushdown
Hi. Tomas Vondra писал 2021-10-15 17:56: As for the proposed approach, it's probably good enough for the first version to restrict this to aggregates where the aggregate result is sufficient, i.e. we don't need any new export/import procedures. But it's very unlikely we'd want to restrict it the way the patch does it, i.e. based on aggregate name. That's both fragile (people can create new aggregates with such name) and against the PostgreSQL extensibility (people may implement custom aggregates, but won't be able to benefit from this just because of name). So for v0 maybe, but I think there neeeds to be a way to relax this in some way, for example we could add a new flag to pg_aggregate to mark aggregates supporting this. Updated patch to mark aggregates as pushdown-safe in pg_aggregates. So far have no solution for aggregates with internal aggtranstype. -- Best regards, Alexander Pyhalov, Postgres ProfessionalFrom 823a389caf003a21dd4c8e758f89d08ba89c5856 Mon Sep 17 00:00:00 2001 From: Alexander Pyhalov Date: Thu, 14 Oct 2021 17:30:34 +0300 Subject: [PATCH] Partial aggregates push down --- contrib/postgres_fdw/deparse.c| 45 +++- .../postgres_fdw/expected/postgres_fdw.out| 215 +- contrib/postgres_fdw/postgres_fdw.c | 29 ++- contrib/postgres_fdw/sql/postgres_fdw.sql | 31 ++- src/backend/catalog/pg_aggregate.c| 4 +- src/backend/commands/aggregatecmds.c | 6 +- src/include/catalog/catversion.h | 2 +- src/include/catalog/pg_aggregate.dat | 101 src/include/catalog/pg_aggregate.h| 6 +- 9 files changed, 362 insertions(+), 77 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index d98bd666818..cf6b2d9f066 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -196,6 +196,7 @@ static bool is_subquery_var(Var *node, RelOptInfo *foreignrel, static void get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, int *relno, int *colno); +static bool partial_agg_ok(Aggref *agg); /* * Examine each qual clause in input_conds, and classify them into two groups, @@ -831,8 +832,10 @@ foreign_expr_walker(Node *node, if (!IS_UPPER_REL(glob_cxt->foreignrel)) return false; -/* Only non-split aggregates are pushable. */ -if (agg->aggsplit != AGGSPLIT_SIMPLE) +if ((agg->aggsplit != AGGSPLIT_SIMPLE) && (agg->aggsplit != AGGSPLIT_INITIAL_SERIAL)) + return false; + +if (agg->aggsplit == AGGSPLIT_INITIAL_SERIAL && !partial_agg_ok(agg)) return false; /* As usual, it must be shippable. */ @@ -3249,7 +3252,7 @@ deparseAggref(Aggref *node, deparse_expr_cxt *context) bool use_variadic; /* Only basic, non-split aggregation accepted. */ - Assert(node->aggsplit == AGGSPLIT_SIMPLE); + Assert(node->aggsplit == AGGSPLIT_SIMPLE || node->aggsplit == AGGSPLIT_INITIAL_SERIAL); /* Check if need to print VARIADIC (cf. ruleutils.c) */ use_variadic = node->aggvariadic; @@ -3719,3 +3722,39 @@ get_relation_column_alias_ids(Var *node, RelOptInfo *foreignrel, /* Shouldn't get here */ elog(ERROR, "unexpected expression in subquery output"); } + +/* + * Check that partial aggregate agg is fine to push down + */ +static bool +partial_agg_ok(Aggref *agg) +{ + HeapTuple aggtup; + Form_pg_aggregate aggform; + + Assert(agg->aggsplit == AGGSPLIT_INITIAL_SERIAL); + + /* We don't support complex partial aggregates */ + if (agg->aggdistinct || agg->aggvariadic || agg->aggkind != AGGKIND_NORMAL || agg->aggorder != NIL) + return false; + + /* Can't process aggregates which require serialization/deserialization */ + if (agg->aggtranstype == INTERNALOID) + return false; + + aggtup = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(agg->aggfnoid)); + if (!HeapTupleIsValid(aggtup)) + elog(ERROR, "cache lookup failed for function %u", agg->aggfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggtup); + + /* Only aggregates, marked as pushdown safe, are allowed */ + if (aggform->aggpartialpushdownsafe != true) + { + ReleaseSysCache(aggtup); + return false; + } + + ReleaseSysCache(aggtup); + + return true; +} diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 44c4367b8f9..89451e208e0 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9279,13 +9279,13 @@ RESET enable_partitionwise_join; -- === -- test partitionwise aggregates -- === -CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a); +CREATE TABLE pagg_tab (a int, b int, c text, d numeric) PARTITION BY RANGE(a); CREATE TABLE pagg_tab_p1 (LIKE pagg_tab); CREATE TABLE pagg_tab_p2 (LIKE pagg_tab); CREATE TABLE pagg_tab_p3 (LIKE pagg_tab); -INSERT INTO pagg
Re: Partial aggregates pushdown
On 10/15/21 21:31, Stephen Frost wrote: Greetings, * Tomas Vondra (tomas.von...@enterprisedb.com) wrote: On 10/15/21 17:05, Alexander Pyhalov wrote: Tomas Vondra писал 2021-10-15 17:56: And then we should extend this for aggregates with more complex internal states (e.g. avg), by supporting a function that "exports" the aggregate state - similar to serial/deserial functions, but needs to be portable. I think the trickiest thing here is rewriting the remote query to call this export function, but maybe we could simply instruct the remote node to use a different final function for the top-level node? If we have some special export function, how should we find out that remote server supports this? Should it be server property or should it somehow find out it while connecting to the server? Good question. I guess there could be some initial negotiation based on remote node version etc. And we could also disable this pushdown for older server versions, etc. Yeah, I'd think we would just only support it on versions where we know it's available. That doesn't seem terribly difficult. Yeah. But maybe Alexander was concerned about cases where the nodes disagree on the aggregate definition, so one node might have the export function and the other would not. E.g. the remote node may have older version of an extension implementing the aggregate, without the export function (although the server version supports it). I don't think we can do much about that, it's just one of many issues that may be caused by mismatching schemas. I wonder if this might get more complex, though. Imagine for example a partitioned table on node A with a FDW partition, pointing to a node B. But on B, the object is partitioned again, with one partition placed on C. So it's like A -> partition on B -> partition on C When planning on A, we can consider server version on B. But what if C is an older version, not supporting the export function? Bot sure if this makes any difference, though ... in the worst case it will error out, and we should have a way to disable the feature on A. But after that, I think we can treat this just like other definitions between local/remote node - we'd assume they match (i.e. the remote server has the export function), and then we'd get an error if it does not. If you need to use remote nodes without an export function, you'd have to disable the pushdown. AFAICS this works both for case with explicit query rewrite (i.e. we send SQL with calls to the export function) and implicit query rewrite (where the remote node uses a different finalize function based on mode, specified by GUC). Not quite sure where to drop this, but I've always figured we'd find a way to use the existing PartialAgg / FinalizeAggregate bits which are used for parallel query when it comes to pushing down to foreign servers to perform aggregates. That also gives us how to serialize the results, though we'd have to make sure that works across different architectures.. I've not looked to see if that's the case today. It sure is similar to what serial/deserial functions do for partial aggs, but IIRC the functions were not designed to be portable. I think we don't even require compatibility across minor releases, because we only use this to copy data between workers running at the same time. Not saying it can't be made to work, of course. Then again, being able to transform an aggregate into a partial aggregate that runs as an actual SQL query would mean we do partial aggregate push-down against non-PG FDWs and that'd be pretty darn neat, so maybe that's a better way to go, if we can figure out how. (I mean, for avg it's pretty easy to just turn that into a SELECT that grabs the sum and the count and use that.. other aggregates are more complicated though and that doesn't work, maybe we need both?) Maybe, but that seems like a very different concept - transforming the SQL so that it calculates different set of aggregates that we know can be pushed down easily. But I don't recall any other practical example beyond the AVG() -> SUM()/COUNT(). Well, VAR() can be translated into SUM(X), SUM(X^2). Another thing is how many users would actually benefit from this. I mean, for this to matter you need partitioned table with partitions placed on a non-PG FDW, right? Seems like a pretty niche use case. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Partial aggregates pushdown
Greetings, * Tomas Vondra (tomas.von...@enterprisedb.com) wrote: > On 10/15/21 17:05, Alexander Pyhalov wrote: > >Tomas Vondra писал 2021-10-15 17:56: > >>And then we should extend this for aggregates with more complex > >>internal states (e.g. avg), by supporting a function that "exports" > >>the aggregate state - similar to serial/deserial functions, but needs > >>to be portable. > >> > >>I think the trickiest thing here is rewriting the remote query to call > >>this export function, but maybe we could simply instruct the remote > >>node to use a different final function for the top-level node? > > > >If we have some special export function, how should we find out that > >remote server supports this? Should it be server property or should it > >somehow find out it while connecting to the server? > > Good question. I guess there could be some initial negotiation based on > remote node version etc. And we could also disable this pushdown for older > server versions, etc. Yeah, I'd think we would just only support it on versions where we know it's available. That doesn't seem terribly difficult. > But after that, I think we can treat this just like other definitions > between local/remote node - we'd assume they match (i.e. the remote server > has the export function), and then we'd get an error if it does not. If you > need to use remote nodes without an export function, you'd have to disable > the pushdown. > > AFAICS this works both for case with explicit query rewrite (i.e. we send > SQL with calls to the export function) and implicit query rewrite (where the > remote node uses a different finalize function based on mode, specified by > GUC). Not quite sure where to drop this, but I've always figured we'd find a way to use the existing PartialAgg / FinalizeAggregate bits which are used for parallel query when it comes to pushing down to foreign servers to perform aggregates. That also gives us how to serialize the results, though we'd have to make sure that works across different architectures.. I've not looked to see if that's the case today. Then again, being able to transform an aggregate into a partial aggregate that runs as an actual SQL query would mean we do partial aggregate push-down against non-PG FDWs and that'd be pretty darn neat, so maybe that's a better way to go, if we can figure out how. (I mean, for avg it's pretty easy to just turn that into a SELECT that grabs the sum and the count and use that.. other aggregates are more complicated though and that doesn't work, maybe we need both?) Thanks, Stephen signature.asc Description: PGP signature
Re: Partial aggregates pushdown
On 10/15/21 17:05, Alexander Pyhalov wrote: Tomas Vondra писал 2021-10-15 17:56: Hi Alexander, Hi. And then we should extend this for aggregates with more complex internal states (e.g. avg), by supporting a function that "exports" the aggregate state - similar to serial/deserial functions, but needs to be portable. I think the trickiest thing here is rewriting the remote query to call this export function, but maybe we could simply instruct the remote node to use a different final function for the top-level node? If we have some special export function, how should we find out that remote server supports this? Should it be server property or should it somehow find out it while connecting to the server? Good question. I guess there could be some initial negotiation based on remote node version etc. And we could also disable this pushdown for older server versions, etc. But after that, I think we can treat this just like other definitions between local/remote node - we'd assume they match (i.e. the remote server has the export function), and then we'd get an error if it does not. If you need to use remote nodes without an export function, you'd have to disable the pushdown. AFAICS this works both for case with explicit query rewrite (i.e. we send SQL with calls to the export function) and implicit query rewrite (where the remote node uses a different finalize function based on mode, specified by GUC). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Partial aggregates pushdown
Tomas Vondra писал 2021-10-15 17:56: Hi Alexander, Hi. And then we should extend this for aggregates with more complex internal states (e.g. avg), by supporting a function that "exports" the aggregate state - similar to serial/deserial functions, but needs to be portable. I think the trickiest thing here is rewriting the remote query to call this export function, but maybe we could simply instruct the remote node to use a different final function for the top-level node? If we have some special export function, how should we find out that remote server supports this? Should it be server property or should it somehow find out it while connecting to the server? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: Partial aggregates pushdown
Hi Alexander, On 10/15/21 15:15, Alexander Pyhalov wrote: Hi. One of the issues when we try to use sharding in PostgreSQL is absence of partial aggregates pushdown. I see several opportunities to alleviate this issue. If we look at Citus, it implements aggregate, calculating internal state of an arbitrary agregate function and exporting it as text. So we could calculate internal states independently on all data sources and then finalize it, which allows to compute arbitrary aggregate. But, as mentioned in [1] thread, for some functions (like count/max/min/sum) we can just push down them. It seems easy and covers a lot of cases. For now there are still issues - for example you can't handle functions as avg() as we should somehow get its internal state or sum() variants, which need aggserialfn/aggdeserialfn. Preliminary version is attached. Is someone else working on the issue? Does suggested approach make sense? I think a couple people worked on this (or something similar/related) in the past, but I don't recall any recent patches. IMHO being able to push-down parts of an aggregation to other nodes is a very desirable feature, that might result in huge improvements for some analytical workloads. As for the proposed approach, it's probably good enough for the first version to restrict this to aggregates where the aggregate result is sufficient, i.e. we don't need any new export/import procedures. But it's very unlikely we'd want to restrict it the way the patch does it, i.e. based on aggregate name. That's both fragile (people can create new aggregates with such name) and against the PostgreSQL extensibility (people may implement custom aggregates, but won't be able to benefit from this just because of name). So for v0 maybe, but I think there neeeds to be a way to relax this in some way, for example we could add a new flag to pg_aggregate to mark aggregates supporting this. And then we should extend this for aggregates with more complex internal states (e.g. avg), by supporting a function that "exports" the aggregate state - similar to serial/deserial functions, but needs to be portable. I think the trickiest thing here is rewriting the remote query to call this export function, but maybe we could simply instruct the remote node to use a different final function for the top-level node? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company