On Mon, Nov 21, 2022 at 5:02 PM fujii.y...@df.mitsubishielectric.co.jp < fujii.y...@df.mitsubishielectric.co.jp> wrote:
> Hi Mr.Vondra, Mr.Pyhalov, Everyone. > > I discussed with Mr.Pyhalov about the above draft by directly sending mail > to > him(outside of pgsql-hackers). Mr.Pyhalov allowed me to update his patch > along with the above draft. So I update Mr.Pyhalov's patch v10. > > I wrote my patch for discussion. > My patch passes regression tests which contains additional basic > postgres_fdw tests > for my patch's feature. But my patch doesn't contain sufficient documents > and tests. > If reviewers accept my approach, I will add documents and tests to my > patch. > > The following is a my patch's readme. > # I simplified the above draft. > > --readme of my patch > 1. interface > 1) pg_aggregate > There are the following additional columns. > a) partialaggfn > data type : regproc. > default value: zero(means invalid). > description : This field refers to the special aggregate function(then > we call > this partialaggfunc) > corresponding to aggregation function(then we call src) which has > aggfnoid. > partialaggfunc is used for partial aggregation pushdown by > postgres_fdw. > The followings are differences between the src and the special > aggregate function. > difference1) result type > The result type is same as the src's transtype if the src's > transtype > is not internal. > Otherwise the result type is bytea. > difference2) final func > The final func does not exist if the src's transtype is not > internal. > Otherwize the final func returns serialized value. > For example, there is a partialaggfunc avg_p_int4 which corresponds to > avg(int4) > whose aggtranstype is _int4. > The result value of avg_p_int4 is a float8 array which consists of > count and > summation. avg_p_int4 does not have finalfunc. > For another example, there is a partialaggfunc avg_p_int8 which > corresponds to > avg(int8) whose aggtranstype is internal. > The result value of avg_p_int8 is a bytea serialized array which > consists of count > and summation. avg_p_int8 has finalfunc int8_avg_serialize which is > serialize function > of avg(int8). This field is zero if there is no partialaggfunc. > > b) partialagg_minversion > data type : int4. > default value: zero(means current version). > description : This field is the minimum PostgreSQL server version which > has > partialaggfunc. This field is used for checking compatibility of > partialaggfunc. > > The above fields are valid in tuples for builtin avg, sum, min, max, count. > There are additional records which correspond to partialaggfunc for avg, > sum, min, max, > count. > > 2) pg_proc > There are additional records which correspond to partialaggfunc for avg, > sum, min, max, > count. > > 3) postgres_fdw > postgres_fdw has an additional foreign server option server_version. > server_version is > integer value which means remote server version number. Default value of > server_version > is zero. server_version is used for checking compatibility of > partialaggfunc. > > 2. feature > postgres_fdw can pushdown partial aggregation of avg, sum, min, max, count. > Partial aggregation pushdown is fine when the following two conditions are > both true. > condition1) partialaggfn is valid. > condition2) server_version is not less than partialagg_minversion > postgres_fdw executes pushdown the patialaggfunc instead of a src. > For example, we issue "select avg_p_int4(c) from t" instead of "select > avg(c) from t" > in the above example. > > postgres_fdw can pushdown every aggregate function which supports partial > aggregation > if you add a partialaggfunc corresponding to the aggregate function by > create aggregate > command. > > 3. difference between my patch and Mr.Pyhalov's v10 patch. > 1) In my patch postgres_fdw can pushdown partial aggregation of avg > 2) In my patch postgres_fdw can pushdown every aggregate function which > supports partial > aggregation if you add a partialaggfunc corresponding to the aggregate > function. > > 4. sample commands in psql > \c postgres > drop database tmp; > create database tmp; > \c tmp > create extension postgres_fdw; > create server server_01 foreign data wrapper postgres_fdw options(host > 'localhost', dbname 'tmp', server_version '160000', async_capable 'true'); > create user mapping for postgres server server_01 options(user 'postgres', > password 'postgres'); > create server server_02 foreign data wrapper postgres_fdw options(host > 'localhost', dbname 'tmp', server_version '160000', async_capable 'true'); > create user mapping for postgres server server_02 options(user 'postgres', > password 'postgres'); > > create table t(dt timestamp, id int4, name text, total int4, val float4, > type int4, span interval) partition by list (type); > > create table t1(dt timestamp, id int4, name text, total int4, val float4, > type int4, span interval); > create table t2(dt timestamp, id int4, name text, total int4, val float4, > type int4, span interval); > > truncate table t1; > truncate table t2; > insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as > interval), t % 100, 'hoge' || t, 1, 1.1, 1, cast('1 seconds' as interval) > from generate_series(1, 100000, 1) t; > insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as > interval), t % 100, 'hoge' || t, 2, 2.1, 1, cast('2 seconds' as interval) > from generate_series(1, 100000, 1) t; > insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as > interval), t % 100, 'hoge' || t, 1, 1.1, 2, cast('1 seconds' as interval) > from generate_series(1, 100000, 1) t; > insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as > interval), t % 100, 'hoge' || t, 2, 2.1, 2, cast('2 seconds' as interval) > from generate_series(1, 100000, 1) t; > > create foreign table f_t1 partition of t for values in (1) server > server_01 options(table_name 't1'); > create foreign table f_t2 partition of t for values in (2) server > server_02 options(table_name 't2'); > > set enable_partitionwise_aggregate = on; > explain (verbose, costs off) select avg(total::int4), avg(total::int8) > from t; > select avg(total::int4), avg(total::int8) from t; > > Sincerely yours, > Yuuki Fujii > -- > Yuuki Fujii > Information Technology R&D Center Mitsubishi Electric Corporation > Hi, For partial_agg_compatible : + * Check that partial aggregate agg has compatibility If the `agg` refers to func parameter, the parameter name is aggform + int32 partialagg_minversion = PG_VERSION_NUM; + if (aggform->partialagg_minversion == PARTIALAGG_MINVERSION_DEFAULT) { + partialagg_minversion = PG_VERSION_NUM; I am curious why the same variable is assigned the same value twice. It seems the if block is redundant. + if ((fpinfo->server_version >= partialagg_minversion)) { + compatible = true; The above can be simplified as: return fpinfo->server_version >= partialagg_minversion; Cheers