Re: Partial aggregates pushdown

Alexander Pyhalov Wed, 28 Feb 2024 05:43:30 -0800

Hi.

fujii.y...@df.mitsubishielectric.co.jp писал(а) 2024-02-22 10:20:

Hi. Mr.Haas, hackers.
I apologize for the significant delay since my last post.
I have conducted investigations and considerations regarding theremaining tasks as follows.
Would it be possible for you to review them?
In particular, could you please confirm if the approach mentioned in 1.is acceptable?If there are no issues with the direction outlined in 1., I plan tomake a simple prototype based on this approach.
1. Transmitting state value safely between machines
From: Robert Haas <robertmh...@gmail.com>
Sent: Wednesday, December 6, 2023 10:25 PM
the problems around transmitting
serialized bytea blobs between machines that can't be assumed to fullytrust each other will need to be addressed in someway, which seems like it will require a good deal of design work,forming some kind of consensus, and then implementation
work to follow.
I have considered methods for safely transmitting state values betweendifferent machines.I have taken into account the version policy of PostgreSQL (5 years ofsupport) and the major version release cycle over the past 10 years (1year), and as a result, I have made the assumption that transmission isallowed only when the difference between the local version and theremote version is 5 or less.I believe that by adding new components, "export function" and "importfunction", to the aggregate functions, and further introducing a newSQL keyword to the query syntax of aggregate expressions, we canaddress this issue.If the version of the local server is higher than or equal to theversion of the remote server, the proposed method can be simplified.The export version mentioned later in (1) would not be necessary.Furthermore, if the version of the local server matches the version ofthe remote server, the proposed method can be further simplified.I would appreciate your input on reasonable assumptions regarding thedifferences in versions between the local server and the remote server.I will explain the specifications of the export function, importfunction, the new SQL keyword for aggregate expressions, and thebehavior of query processing for partial aggregation separately.
(1) Export Function Specification
This function is another final function for partial aggregate.
This function converts the state value that represents the result ofpartial aggregation into a format that can be read by the local server.This function is called instead of the existing finalfunc during thefinal stage of aggregation when performing partial aggregation.
The conversion process described above will be referred to as "export".
The argument of an export function is the version of the server thatwill receive the return value.
Hereafter, this version will be referred to as the export version.
The concept of an export version is necessary to handle cases where theversion of the local server is smaller than the version of the remoteserver.The return value of the export function is the transformed state value,and its data type is bytea.For backward compatibility, the developer of the export function mustensure that the export can be performed for major versions up to fiveversions prior to the major version of PostgreSQL that the exportfunction is being developed for.For built-in functions, I believe it is necessary to allow for thepossibility of not developing the export functionality for specificversions in the future (due to reasons such as development burden)after the export function is developed for a certain version.To achieve this, for built-in functions, we will add a column to thepg_aggregate catalog that indicates the presence or absence of exportfunctionality for each major version, including the major version beingdeveloped and the previous five major versions. This column will benamed safety_export_versions and will have a data type of boolean[6].For user-defined functions, we will refer to the extensions option andadd an external server option called safety_export_extensions, whichwill maintain a list of extensions that include only the aggregatefunctions that can be exported to the local server version.
...

I honestly think that achieving cross-version compatibility in this wayputs a significant burden on developers. Can we instead always use themore or less universal export and import function to fix possible issueswith binary representations on different architectures and just refuseto push down partial aggregates on server version mismatch? At least atthe first step?

3. Fixing the behavior when the HAVING clause is present
From: Robert Haas <robertmh...@gmail.com>
Sent: Tuesday, November 28, 2023 4:08 AM
On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov<a.pyha...@postgrespro.ru> wrote:
> Hi. HAVING is also a problem. Consider the following query
>
> SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> foreign server as HAVING needs full aggregate result, but foreign
> server don't know it.
I don't see it that way. What we would push to the foreign serverwould be something like SELECT count(a) FROM t. Then,after we get the results back and combine the various partial countslocally, we would locally evaluate the HAVING clauseafterward. That is, partial aggregation is a barrier to pushing downHAVING clause itself, but it doesn't preclude pushing
down the aggregation.
I have made modifications in the attached patch to ensure that when theHAVING clause is present, the HAVING clause is executed locally whilethe partial aggregations are pushed down.

Sorry, I don't see how it works. When we have partial aggregates andhaving clause, foreign_grouping_ok() returns false andadd_foreign_grouping_paths() adds no paths.

I'm not saying it's necessary to fix this in the first patch version.

explain verbose select sum(a) from pagg_tab having sum(a)>10;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2282.49..2282.50 rows=1 width=8)
   Output: sum(pagg_tab.a)
   Filter: (sum(pagg_tab.a) > 10)
   ->  Append  (cost=760.81..2282.48 rows=3 width=8)
         ->  Partial Aggregate  (cost=760.81..760.82 rows=1 width=8)
               Output: PARTIAL sum(pagg_tab.a)

-> Foreign Scan on public.fpagg_tab_p1 pagg_tab(cost=100.00..753.50 rows=2925 width=4)

                     Output: pagg_tab.a
                     Remote SQL: SELECT a FROM public.pagg_tab_p1
         ->  Partial Aggregate  (cost=760.81..760.82 rows=1 width=8)
               Output: PARTIAL sum(pagg_tab_1.a)

-> Foreign Scan on public.fpagg_tab_p2 pagg_tab_1(cost=100.00..753.50 rows=2925 width=4)

                     Output: pagg_tab_1.a
                     Remote SQL: SELECT a FROM public.pagg_tab_p2
         ->  Partial Aggregate  (cost=760.81..760.82 rows=1 width=8)
               Output: PARTIAL sum(pagg_tab_2.a)

-> Foreign Scan on public.fpagg_tab_p3 pagg_tab_2(cost=100.00..753.50 rows=2925 width=4)

                     Output: pagg_tab_2.a
                     Remote SQL: SELECT a FROM public.pagg_tab_p3


Also I have some minor notices on the code.

contrib/postgres_fdw/deparse.c: comment before appendFunctionName() hasgone, this seems to be wrong.


In finalize_aggregate()

1079         /*

1080 * Apply the agg's finalfn if one is provided, else returntransValue.

1081          */

Comment should be updated to note behavior for agg_partial aggregates.

1129         else if (peragg->aggref->agg_partial

1130 && (peragg->aggref->aggtranstype ==INTERNALOID)

1131                         && OidIsValid(peragg->serialfn_oid))

In this if branch, should we check just for peragg->aggref->agg_partialand peragg->aggref->aggtranstype == INTERNALOID? It seems that ifperagg->aggref->aggtranstype == INTERNALOID and there's no

serialfn_oid, it's likely an error (and one should be generated).

Overall patch seems nicer. Will look at it more this week.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

Re: Partial aggregates pushdown

Reply via email to