Hi.

fujii.y...@df.mitsubishielectric.co.jp писал(а) 2024-02-22 10:20:
Hi. Mr.Haas, hackers.

I apologize for the significant delay since my last post.
I have conducted investigations and considerations regarding the remaining tasks as follows.
Would it be possible for you to review them?
In particular, could you please confirm if the approach mentioned in 1. is acceptable? If there are no issues with the direction outlined in 1., I plan to make a simple prototype based on this approach.

1. Transmitting state value safely between machines
From: Robert Haas <robertmh...@gmail.com>
Sent: Wednesday, December 6, 2023 10:25 PM
the problems around transmitting
serialized bytea blobs between machines that can't be assumed to fully trust each other will need to be addressed in some way, which seems like it will require a good deal of design work, forming some kind of consensus, and then implementation
work to follow.
I have considered methods for safely transmitting state values between different machines. I have taken into account the version policy of PostgreSQL (5 years of support) and the major version release cycle over the past 10 years (1 year), and as a result, I have made the assumption that transmission is allowed only when the difference between the local version and the remote version is 5 or less. I believe that by adding new components, "export function" and "import function", to the aggregate functions, and further introducing a new SQL keyword to the query syntax of aggregate expressions, we can address this issue. If the version of the local server is higher than or equal to the version of the remote server, the proposed method can be simplified. The export version mentioned later in (1) would not be necessary. Furthermore, if the version of the local server matches the version of the remote server, the proposed method can be further simplified. I would appreciate your input on reasonable assumptions regarding the differences in versions between the local server and the remote server. I will explain the specifications of the export function, import function, the new SQL keyword for aggregate expressions, and the behavior of query processing for partial aggregation separately.
(1) Export Function Specification
This function is another final function for partial aggregate.
This function converts the state value that represents the result of partial aggregation into a format that can be read by the local server. This function is called instead of the existing finalfunc during the final stage of aggregation when performing partial aggregation.
The conversion process described above will be referred to as "export".
The argument of an export function is the version of the server that will receive the return value.
Hereafter, this version will be referred to as the export version.
The concept of an export version is necessary to handle cases where the version of the local server is smaller than the version of the remote server. The return value of the export function is the transformed state value, and its data type is bytea. For backward compatibility, the developer of the export function must ensure that the export can be performed for major versions up to five versions prior to the major version of PostgreSQL that the export function is being developed for. For built-in functions, I believe it is necessary to allow for the possibility of not developing the export functionality for specific versions in the future (due to reasons such as development burden) after the export function is developed for a certain version. To achieve this, for built-in functions, we will add a column to the pg_aggregate catalog that indicates the presence or absence of export functionality for each major version, including the major version being developed and the previous five major versions. This column will be named safety_export_versions and will have a data type of boolean[6]. For user-defined functions, we will refer to the extensions option and add an external server option called safety_export_extensions, which will maintain a list of extensions that include only the aggregate functions that can be exported to the local server version.
...

I honestly think that achieving cross-version compatibility in this way puts a significant burden on developers. Can we instead always use the more or less universal export and import function to fix possible issues with binary representations on different architectures and just refuse to push down partial aggregates on server version mismatch? At least at the first step?


3. Fixing the behavior when the HAVING clause is present
From: Robert Haas <robertmh...@gmail.com>
Sent: Tuesday, November 28, 2023 4:08 AM

On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov <a.pyha...@postgrespro.ru> wrote:
> Hi. HAVING is also a problem. Consider the following query
>
> SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> foreign server as HAVING needs full aggregate result, but foreign
> server don't know it.

I don't see it that way. What we would push to the foreign server would be something like SELECT count(a) FROM t. Then, after we get the results back and combine the various partial counts locally, we would locally evaluate the HAVING clause afterward. That is, partial aggregation is a barrier to pushing down HAVING clause itself, but it doesn't preclude pushing
down the aggregation.
I have made modifications in the attached patch to ensure that when the HAVING clause is present, the HAVING clause is executed locally while the partial aggregations are pushed down.



Sorry, I don't see how it works. When we have partial aggregates and having clause, foreign_grouping_ok() returns false and add_foreign_grouping_paths() adds no paths.
I'm not saying it's necessary to fix this in the first patch version.

explain verbose select sum(a) from pagg_tab having sum(a)>10;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2282.49..2282.50 rows=1 width=8)
   Output: sum(pagg_tab.a)
   Filter: (sum(pagg_tab.a) > 10)
   ->  Append  (cost=760.81..2282.48 rows=3 width=8)
         ->  Partial Aggregate  (cost=760.81..760.82 rows=1 width=8)
               Output: PARTIAL sum(pagg_tab.a)
-> Foreign Scan on public.fpagg_tab_p1 pagg_tab (cost=100.00..753.50 rows=2925 width=4)
                     Output: pagg_tab.a
                     Remote SQL: SELECT a FROM public.pagg_tab_p1
         ->  Partial Aggregate  (cost=760.81..760.82 rows=1 width=8)
               Output: PARTIAL sum(pagg_tab_1.a)
-> Foreign Scan on public.fpagg_tab_p2 pagg_tab_1 (cost=100.00..753.50 rows=2925 width=4)
                     Output: pagg_tab_1.a
                     Remote SQL: SELECT a FROM public.pagg_tab_p2
         ->  Partial Aggregate  (cost=760.81..760.82 rows=1 width=8)
               Output: PARTIAL sum(pagg_tab_2.a)
-> Foreign Scan on public.fpagg_tab_p3 pagg_tab_2 (cost=100.00..753.50 rows=2925 width=4)
                     Output: pagg_tab_2.a
                     Remote SQL: SELECT a FROM public.pagg_tab_p3


Also I have some minor notices on the code.

contrib/postgres_fdw/deparse.c: comment before appendFunctionName() has gone, this seems to be wrong.

In finalize_aggregate()

1079         /*
1080 * Apply the agg's finalfn if one is provided, else return transValue.
1081          */

Comment should be updated to note behavior for agg_partial aggregates.

1129         else if (peragg->aggref->agg_partial
1130 && (peragg->aggref->aggtranstype == INTERNALOID)
1131                         && OidIsValid(peragg->serialfn_oid))

In this if branch, should we check just for peragg->aggref->agg_partial and peragg->aggref->aggtranstype == INTERNALOID? It seems that if peragg->aggref->aggtranstype == INTERNALOID and there's no
serialfn_oid, it's likely an error (and one should be generated).

Overall patch seems nicer. Will look at it more this week.
--
Best regards,
Alexander Pyhalov,
Postgres Professional


Reply via email to