RE: Partial aggregates pushdown

2024-07-08 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Jelte,

Thank you for comments and advises.

> From: Jelte Fennema-Nio 
> Sent: Monday, July 8, 2024 5:31 PM
> On Sun, 7 Jul 2024 at 23:46, fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > In my mind, Approach1 is superior. Therefore, if there are no objections 
> > this week, I plan to resume implementing
> Approach1 next week. I would appreciate it if anyone could discuss the topic 
> with me or ask questions.
> 
> Honestly, the more I think about this, the more I like Approach2. Not because 
> I disagree with you about some of the
> limitations of Approach2, but because I'd rather see those limitations fixed 
> in CREATE TYPE, instead of working around
> these limitations in CREATE AGGREGATE. That way more usages can benefit. 
> Detailed explanation and arguments below.
Firstly, I may have jumped to conclusions too quickly. I apologize that.
I would appreciate it if we clarify Approach 1 and Approach 2 more precisely 
and we can proceed with the discussion.

Before we get into the details, let me break down the main differences between 
Approach 1 and Approach 2.

The best thing about Approach2 is that it lets us send state values using the 
existing data type system.
I'm worried that if we choose Approach2, we might face some limits because we 
have to create new types.
But, we might be able to fix these limits if we look into it more. 

Approach1 doesn't make new types, so we can avoid these limits.
But, it means we have to make export/import functions that are similar to the 
typsend/typreceive functions.
So, we need to make sure if we really need this method.

Is this the right understanding?

> From: Jelte Fennema-Nio 
> Sent: Monday, July 8, 2024 5:59 PM
> On Sun, 30 Jun 2024 at 23:42, fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > Instead, I can make PARTIAL_AGGREGATE an unreserved word by placing it 
> > after the FILTER clause, like avg(c1) FILTER
> (WHERE c2 > 0) PARTIAL_AGGREGATE, and by marking it as an ASLABEL word like 
> FILTER.
> > I attached the patch of the method.
> > If there are no objections, I would like to proceed with the method 
> > described above.
> > I'd appreciate it if anyone comment the method.
> 
> I like this approach of using PARTIAL_AGGREGATE in the same place as the 
> FILTER clause.
Thank you for comment.

> On Sun, 7 Jul 2024 at 23:52, fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > My plan for advancing the patch involves the following steps:
> >   Step1. Decide the approach on transmitting state value.
> >   Step2. Implement code (including comments) and tests to support a subset 
> > of aggregate functions.
> > Specifically, I plan to support avg, sum, and other aggregate functions 
> > like min and max which don't need
> export/import functions.
> >   Step3. Add documentations.
> >
> > To clarify my approach, should I proceed with Step 3 before Step2?
> 
> (my opinion, Bruce might have a different one)
> 
> I think it's good that you split the original patch in two:
> 0001: non-internal partial aggregates
> 0002: internal partial aggregates
> 
> I think we're aligned on the general design of 0001. So I think now is 
> definitely the time to include documentation there, so
> we can discuss this patch in more detail, and move it forward.
> 
> I think generally for 0002 it would also be useful to have documentation, I 
> personally like reading it to understand the
> general design and then comparing that to the code. But I also understand 
> that the language differences between Japanese
> and English, makes writing such docs a significant effort for you. So I think 
> it would be fine to skip docs for 0002 for now
> until we decide on the approach we want to take for internal partial 
> aggregates.
At least for 0001, it seems like it would be a good idea to attach a document 
at this stage.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2024-07-07 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Bruce.

> From: Bruce Momjian 
> Is there a reason the documentation is no longer a part of this patch?
> Can I help you keep it current?

Here are the reasons:
  Reason1. The approach differs significantly from the previous patch that 
included documentation, the below.

https://www.postgresql.org/message-id/attachment/152086/0001-Partial-aggregates-push-down-v34.patch
  Reason2. I have two options for transmitting the state value and I'm 
evaluating which one is optimal.
   One is what I presented you in PGConf.dev2024. The other is Jelte's one.
   He listened to my talk at the conference and gave me some useful comments on 
hackers. I'm very grateful that.
  Reason3. The implementation and test have been not finished yet.
Regarding Reason 2, I provided my conclusion in the previous message.

My plan for advancing the patch involves the following steps:
  Step1. Decide the approach on transmitting state value.
  Step2. Implement code (including comments) and tests to support a subset of 
aggregate functions.
Specifically, I plan to support avg, sum, and other aggregate functions 
like min and max which don't need export/import functions.
  Step3. Add documentations.

To clarify my approach, should I proceed with Step 3 before Step2?
I would appreciate your feedback on this.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2024-07-07 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Jelte and hackers,

I've reconsidered which of the following two approaches is the best.
  Approach1: Adding export/import functions to transmit state values.
  Approach 2: Adding native types which are equal to state values.

In my mind, Approach1 is superior. Therefore, if there are no objections this 
week, I plan to resume implementing Approach1 next week. I would appreciate it 
if anyone could discuss the topic with me or ask questions.

I believe that while Approach1 has the extendability to support situations 
where local and remote major versions differ, Approach2 lacks this 
extendability. Additionally, it seems that Approach1 requires fewer additional 
lines of code compared to Approach2. I'm also concerned that Approach2 may 
cause the catalog pg_type to bloat.

Although Approach2 offers the benefit of avoiding the addition of columns to 
pg_aggregate, I think this benefit is smaller than the advantages of Approach1 
mentioned above.

Next, I will present my complete comparison. The comparison points are as 
follows:
  1. Extendability
  2. Amount of codes
  3. Catalog size
  4. Developer burden
  5. Additional columns to catalogs

1. Extendability
I believe it is crucial to support scenarios where the local and remote major 
versions may differ in the future (see the below).

https://www.postgresql.org/message-id/4012625.1701120204%40sss.pgh.pa.us

Regarding this aspect, I consider Approach1 superior to Approach2. The reason 
is that:
・The data type of an aggregate function's state value may change with each 
major version increment.
・In Approach1, by extending the export/import functionalities to include the 
major version in which the state value was created (refer to p.16 and p.17 of 
[1]), I can handle such situations.
・On the other hand, it appears that Approach2 fundamentally lacks the 
capability to support these scenarios.

2. Amount of codes
Regarding this aspect, I find Approach1 to be better than Approach2.
In Approach1, developers only need to export/import functions and can use a 
standardized format for transmitting state values.
In Approach2, developers have two options:
  Option1: Adding typinput/typoutput and typsend/typreceive.
  Option2: Adding typinput/typoutput only.
Option1 requires more lines of code, which may be seen as cumbersome by some 
developers.
Option2 restricts developers to using only text representation for transmitting 
state values, which I consider limiting.

3. Catalog size
Regarding this point, I believe Approach1 is better than Approach2.
In Approach1, theoretically, it is necessary to add export/import functions to 
pg_proc for each aggregate.
In Approach2, theoretically, it is necessary to add typoutput/typinput 
functions (and typsend/typreceive if necessary) to pg_proc and add a native 
type to pg_type for each aggregate.
I would like to emphasize that we should consider user-defined functions in 
addition to built-in aggregate functions.
I think most developers prefer to avoid bloating catalogs, even if they may not 
be able to specify exact reasons.
In fact, in Robert's previous review, he expressed a similar concern (see 
below).

https://www.postgresql.org/message-id/CA%2BTgmobvja%2Bjytj5zcEcYgqzOaeJiqrrJxgqDf1q%3D3k8FepuWQ%40mail.gmail.com

4. Developer burden.
Regarding this aspect, I believe Approach1 is better than Approach2.
In Approach1, developers have the following additional tasks:
  Task1-1: Create and define export/import functions.

In Approach2, developers have the following additional tasks:
  Task2-1: Create and define typoutput/input functions (and typesend/typreceive 
functions if necessary).
  Task2-2: Define a native type.

Approach1 requires fewer additional tasks, although the difference may be not 
substantial.

5. Additional columns to catalogs.
Regarding this aspect, Approach2 is better than Approach1.
Approach1 requires additional three columns in pg_aggregate, specifically the 
aggpartialpushdownsafe flag, export function reference, and import function 
reference.
Approach2 does not require any additional columns in catalogs.
However, over the past four years of discussions, no one has expressed concerns 
about additional columns in catalogs.

[1] 
https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2024-06-30 Thread fujii.y...@df.mitsubishielectric.co.jp
> From: Fujii Yuki 
> Sent: Monday, July 1, 2024 6:42 AM
> Hi hackers.
> 
> On Wed, Jun 5, 2024 at 9:15?AM fujii.y...@df.mitsubishielectric.co.jp 
>  wrote:
> >   Requirement2. Consider appropriate position of the new keyword 
> > "PARTIAL_AGGREGATE". (with Robert)
> > Existing patch: Before the target expression. Ex. avg(PARTIAL_AGGREGATE 
> > c1)
> > Ideal: Before the aggregate function. Ex. PARTIAL_AGGREGATE avg(c1)
> >   Requirement3. Consider to avoid to make the new keyword 
> > "PARTIAL_AGGREGATE" become a reserved word. (with
> Robert)
> > In the existing patch, "PARTIAL_AGGREGATE" is a reserved word.
> I considered the above two requirement.
> Based on my research, there is no way to use PARTIAL_AGGREGATE in front of a 
> function name without making it a
> reserved word.
With this way, I couldn't resolve shift/reduce conflicts.


RE: Partial aggregates pushdown

2024-06-24 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Jelte, hackers.

Thank you for explanations.

Actually, I have other tasks about "PARTIAL_AGGREAGATE" keyword
to respond Requirement1 and Requirement2 in the following mail.
https://www.postgresql.org/message-id/TYAPR01MB3088755F2281D41F5EEF06D495F92%40TYAPR01MB3088.jpnprd01.prod.outlook.com

After that tasks, I plan to compare your proposal with mine seriously, with 
additional POC patches if necessary.

I think that your proposal might seem to be a more fundamental solution.
However, to be honest, so far, I don't perfectly get the benefits and impacts 
by stopping usage of internal types
instead using a native types, especially on handling memory contexts of 
existing deserialization functions and
on the amount of codes to be modified or added.
The followings are the answers with the knowledge I have right now.

> From: Jelte Fennema-Nio 
> Sent: Tuesday, June 25, 2024 5:49 AM
> > However, I still have the following two questions.
> >
> > 1. Not necessary components of new native types Refer to pg_type.dat,
> > typinput and typoutput are required.
> > I think that in your proposal they are not necessary, so waste. I
> > think that it is not acceptable.
> > How can I resolve the problem?
> 
> I think requiring typinput/typoutput is a benefit personally, because that 
> makes it possible to do PARTIAL_AGGREGATE
> pushdown to a different PG major version. Also it makes it easier to debug 
> the partial aggregate values when using
> psql/pgregress. So yes, it requires implementing both binary (send/receive) 
> and text (input/output) serialization, but it also
> has some benefits. And in theory you might be able to skip implementing the 
> binary serialization, and rely purely on the text
> serialization to send partial aggregates between servers.
I see. It seems that adding new natives might make it easier to transmit the 
state values between local and remote have different major versions.
However, in my opinion, we should be careful to support the case in which local 
and remote have different major versions,
because the transtype of an aggregate function would may change in future major 
version due to
something related to the implementation.
Actually, something like that occurs recently, see
https://github.com/postgres/postgres/commit/519fc1bd9e9d7b408903e44f55f83f6db30742b7
I think the transtype of an aggregate function quite more changeable than 
retype.
Consequently, so far, I want to support the cases in which local and remote 
have the same major version.
If we try to resolve the limitation, it seems to need more additional codes.

And, I'm afraid that adding typinput/typoutput bothers the developers.
They also have to create a new native types in addition to create their new 
aggregate functions.
I wonder if this concern might outweigh the benefits for debugging.
And, if skipping send/receive, they have to select only the text representation 
on
the transmission of the state value. I think it is narrow.

> > 2. Many new native types
> > I think that, basically, each aggregate function does need a new native 
> > type.
> > For example,
> > avg(int8), avg(numeric), and var_pop(int4) has the same transtype, 
> > PolyNumAggState.
> > You said that it is enough to add only one native type like
> > state_poly_num_agg for supporting them, right?
> 
> Yes, correct. That's what I had in mind.
> 
> > But the combine functions of them might have quite different
> > expectation on the data items of PolyNumAggState like the range of
> > N(means count) and the true/false of calcSumX2 (the flag of
> > calculating sum of squares).
> > The final functions of them have the similar expectation.
> > So, I think that, responded to your proposal, each of them need a
> > native data type like state_int8_avg, state_numeric_avg, for safety.
> >
> > And, we do need a native type for an aggregate function whose
> > transtype is not internal and not pseudo.
> > For avg(int4), the current transtype is _int8.
> > However, I do need a validation check on the number of the array And
> > the positiveness of count(the first element of the array).
> > Responded to your proposal,
> > I do need a new native type like state_int4_avg.
> 
> To help avoid confusion let me try to restate what I think you mean
> here: You're worried about someone passing in a bogus native type into the 
> final/combine functions and then getting
> crashes and/or wrong results. With internal type people cannot do this 
> because they cannot manually call the
> combinefunc/finalfunc because the argument type is internal. To solve this 
> problem your suggestion is to make the type
> specific to the specific aggregate such that send/receive or input/output can 
> validate the input as reasonable. But this
> would then mean that we have many native types (and also many 
> deserialize/serialize functions).
Yes, right.

> Assuming that's indeed what you meant, that's an interesting thought, I 
> didn't consider that much indeed. My thinking was
> that we only

RE: Partial aggregates pushdown

2024-06-24 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi. Jelte, hackers.

Thank you for your proposal and comments.

> From: Jelte Fennema-Nio 
> Sent: Monday, June 24, 2024 6:09 PM
> > 1. Generality
> > I believe we should develop a method that can theoretically apply to any
> aggregate function, even if we cannot implement it immediately. However, I 
> find
> it exceptionally challenging to demonstrate that any internal transtype can be
> universally converted to a native data type for aggregate functions that are
> parallel-safe under the current parallel query feature. Specifically, proving 
> this
> for user-defined aggregate functions presents a significant difficulty in my
> view.
> > On the other hand, I think that the usage of export and import functions can
> theoretically apply to any aggregate functions.
> 
> The only thing required when doing CREATE TYPE is having an INPUT and
> OUTPUT function for the type, which (de)serialize the type to text format. As
> far as I can tell by definition that requirement should be fine for any 
> aggregates
> that we can do partial aggregation pushdown for. To clarify I'm not suggesting
> we should change any of the internal representation of the type for the 
> current
> internal aggregates. I'm suggesting we create new native types (i.e. do CREATE
> TYPE) for those internal representations and then use the name of that type
> instead of internal.
I see. I maybe got your proposal.
Refer to your proposal, for avg(int8), 
I create a new native type like state_int8_avg
with the new typsend/typreceive functions
and use them to transmit the state value, right?

That might seem to be a more fundamental solution
because I can avoid adding export/import functions of my proposal,
which are the new components of aggregate function.
I have never considered the solution.
I appreciate your proposal.

However, I still have the following two questions.

1. Not necessary components of new native types
Refer to pg_type.dat, typinput and typoutput are required.
I think that in your proposal they are not necessary,
so waste. I think that it is not acceptable.
How can I resolve the problem? 

2. Many new native types
I think that, basically, each aggregate function does need a new native type.
For example,
avg(int8), avg(numeric), and var_pop(int4) has the same transtype, 
PolyNumAggState.
You said that it is enough to add only one native type like state_poly_num_agg
for supporting them, right?

But the combine functions of them might have quite different expectation
on the data items of PolyNumAggState like
the range of N(means count) and the true/false of calcSumX2
(the flag of calculating sum of squares).
The final functions of them have the similar expectation.
So, I think that, responded to your proposal,
each of them need a native data type
like state_int8_avg, state_numeric_avg, for safety.

And, we do need a native type for an aggregate function
whose transtype is not internal and not pseudo.
For avg(int4), the current transtype is _int8.
However, I do need a validation check on the number of the array
And the positiveness of count(the first element of the array).
Responded to your proposal,
I do need a new native type like state_int4_avg.

Consequently, I think that, responded to your proposal, finally
each of aggregate functions need a new native type
with typinput and typoutput.
That seems need the same amount of codes and more catalog data,
right?

> > 2. Amount of codes.
> > It could need more codes.
> 
> I think it would be about the same as your proposal. Instead of serializing 
> to an
> intermediary existing type you serialize to string straight away. I think it 
> would
> probably be slightly less code actually, since you don't have to add code to
> handle the new aggpartialexportfn and aggpartialimportfn columns.
> 
> > 3. Concern about performance
> > I'm concerned that altering the current internal data types could impact
> performance.
> 
> As explained above in my proposal all the aggregation code would remain
> unchanged, only new native types will be added. Thus performance won't be
> affected, because all aggregation code will be the same. The only thing that's
> changed is that the internal type now has a name and an INPUT and OUTPUT
> function.
I got it. Thank you.

> Not a full review but some initial notes:
Thank you. I don't have time today, so I'll answer after tomorrow.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2024-06-23 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Alexander, hackers.

> From: Alexander Pyhalov 
> Sent: Tuesday, May 28, 2024 2:45 PM
> The fix was to set child_agg->agg_partial to orig_agg->agg_partial in
> convert_combining_aggrefs(), it's already in the patch, as well as the 
> example -
> without this fix
I've just realized that you've added the necessary tests. I forgot to respond, 
my apologies.

> From: Alexander Pyhalov 
> Sent: Tuesday, May 28, 2024 2:58 PM
> BTW, there's I have an issue with test results in the last version of the 
> patch.
> Attaching regression diffs.
> I have partial sum over c_interval instead of sum(c_interval).
I think the difference stems from the commit[1], which add serialization 
function
to sum(interval). I will fix it.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation

[1] 
https://github.com/postgres/postgres/commit/519fc1bd9e9d7b408903e44f55f83f6db30742b7






RE: Partial aggregates pushdown

2024-06-11 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Jelte,

Thank you for your comment!
> From: Jelte Fennema-Nio 
> Sent: Tuesday, June 11, 2024 11:00 PM
> How about instead of trying to serialize the output of serialfn/deserialfn, 
> instead we don't use the "internal" type and
> create actual types in pg_type for these transtypes? Then we can simply use 
> the in/out and recv/send functions of those
> types to serialize the values of the partial aggregate over the network.
> Instead of having to rely on serialfn/deserialfn to be network-safe (which 
> they probably aren't).
> 
> That indeed still leaves the pseudo types. Since non of those pseudotypes 
> have a working in/recv function (they always
> error by definition), I agree that we can simply not support those.
> 
> Basically that would mean that any aggregate with a non-internal and 
> non-pseudotype as a transtype could be used in this
> multi-node partial aggregate pushdown.
Could you please clarify what you mean?
Are you referring to:
  Option 1: Modifying existing aggregate functions to minimize the use of 
internal state values.
  Option 2: Not supporting the push down of partial aggregates for functions 
with internal state values.
  Option 3: Something other than Option 1 and Option 2.

There are many aggregate functions with internal state values, so if we go with 
Option 1,
we might need to change a lot of existing code, like transition functions and 
finalize functions.
Also, I'm not sure how many existing aggregate functions can be modified this 
way.

There are also many popular functions with internal state values,
like sum(int8) and avg(int8)(see [1]), so I don't think Option2 would be 
acceptable.

Best regards, Yuki Fujii

--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation

[1] 
https://github.com/postgres/postgres/blob/REL_16_STABLE/src/include/catalog/pg_aggregate.dat


RE: Partial aggregates pushdown

2024-06-11 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi. Bruce.

Sorry for the late. Thank you for comment.

> From: Bruce Momjian 
> Sent: Wednesday, June 5, 2024 11:04 PM
> > > *  Passes unsafe binary data from the foreign server.
> > >
> > > Can someone show me where that last item is in the patch, and why
> > > can't we just pass back values cast to text?
> >
> > In finalize_aggregate() when we see partial aggregate with
> > peragg->aggref->aggtranstype = INTERNALOID
> > we call aggregate's serialization function and return it as bytea.
> >
> > The issue is that this internal representation isn't guaranteed to be
> > compatible between servers of different versions (or architectures?).
> > So, likely, we instead should have called some export function for
> > aggregate and later - some import function on postgres_fdw side. It
> > doesn't matter much, what this export function generates - text, json
> > or some portable binary format,
> > 1) export/import functions should just "understand" it,
> > 2) it should be a stable representation.
> 
> Okay, so looking at the serialization output functions already defined, I see 
> many zeros, which I assume means just the base
> data type, and eight
> more:
> 
>   SELECT DISTINCT aggserialfn from pg_aggregate WHERE aggserialfn::oid != 
> 0;
>   aggserialfn
>   ---
>numeric_avg_serialize
>string_agg_serialize
>array_agg_array_serialize
>numeric_serialize
>int8_avg_serialize
>array_agg_serialize
>interval_avg_serialize
>numeric_poly_serialize
> 
> I realize we need to return the sum and count for average, so that makes 
> sense.
> 
> So, we need import/export text representation for the partial aggregate mode 
> for these eight, and call the base data type
> text import/export functions for the zero ones when in this mode?

I think that you are basically right.
But, I think, in a perfect world we should also add an import/export function 
for the following
two category.

  Category1. Validation Chek is needed for Safety.
For example, I think a validation check is needed for avg(float4),
 whose transition type is not internal. (See p.18 in [1])
I plan to add import functions of avg, count (See p.18, p.19 in [1]).
  Category1. Transition type is a pseudo data type.
Aggregate functions of this category needs to accept many actual data types,
including user-defined types. So I think that it is hard to implement 
import/export functions.
Consequently, I do not plan to support these category. (See p.19 in [1])

Sincerely yours,
Yuki Fujii

--
Yuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

[1] 
https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf




Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2024-06-11 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi. Tender.

> From: Tender Wang 
> Sent: Tuesday, June 11, 2024 5:11 PM
> 
>   > From: Tender Wang mailto:tndrw...@gmail.com> >
>   > Sent: Tuesday, June 4, 2024 7:51 PM
>   >   Please add more tests.  Especially please add some negative 
> tests;
>   >   current patch checks that it is safe to apply materialization. 
> It would
>   >   be helpful to add tests checking that materialization is not 
> applied
>   >   in both checked cases:
>   >   1. when inner join path is not parallel safe
>   >   2. when matpath is not parallel safe
>   >
>   >
>   >
>   > I added a test case that inner rel is not parallel safe. Actually,
>   > matpath will not create if inner rel is not parallel safe. So I 
> didn't add test case for the second  scenario.
>   Is there case in which matpath is not parallel safe and inner rel is 
> parallel safe?
>   If right, I think that it would be suitable to add a negative test in a 
> such case.
> 
> 
> 
> I looked through create_xxx_path(), and I found that almost 
> path.parallel_safe is assigned from
> RelOptiInfo.consider_parallel.
> Some pathes take subpath->parallel_safe into account(e.g. Material path). In 
> most cases, Material is parallel_safe if rel is
> parallel safe. Now I haven't come up a query plan that material is un 
> parallel-safe but rel is parallel-safe.
Thank you for looking into the source code. I understand the situation now.

Sincerely yours,
Yuki Fujii

--
Yuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


RE: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2024-06-04 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi. Tender.

Thank you for modification.

> From: Tender Wang 
> Sent: Tuesday, June 4, 2024 7:51 PM
>   Please add more tests.  Especially please add some negative tests;
>   current patch checks that it is safe to apply materialization. It would
>   be helpful to add tests checking that materialization is not applied
>   in both checked cases:
>   1. when inner join path is not parallel safe
>   2. when matpath is not parallel safe
> 
> 
> 
> I added a test case that inner rel is not parallel safe. Actually, 
> matpath will not create if inner rel is not parallel safe. So I didn't add 
> test case for the second  scenario.
Is there case in which matpath is not parallel safe and inner rel is parallel 
safe?
If right, I think that it would be suitable to add a negative test in a such 
case.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2024-03-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi. Mr.Momjian, Mr.Lane, Mr.Haas, hackers.

I apologize for any misunderstanding regarding the context of the attached 
patch and
the points on which I requested a review. Could you please allow me to clarify?

In the review around early December 2023, I received the following three issues 
pointed out by Mr.Haas[1].
1. Transmitting state value safely between machines
2. Making the patch clearer by adding SQL keywords
3. Fixing the behavior when the HAVING clause is present

In the email sent on February 22, 2024[2], I provided an update on the progress 
made in addressing these issues.
Regarding issue 1, I have only provided a proposed solution in the email and 
have not started the programming. 
Therefore, the latest patch is not in a commit-ready state. As mentioned later, 
we have also temporarily reverted the changes made to the documentation.
Before proceeding with the programming, I would like to discuss the proposed 
solution with the community and seek consensus.
If it is necessary to have source code in order to discuss, I can create a 
simple prototype so that I can receive your feedback.
Would you be able to provide your opinions on it?

Regarding issue 2., I have confirmed that creating a prototype allows us to 
address the issue and clear the patch.
In this prototype creation, the main purpose was to verify if the patch can be 
cleared and significant revisions were made to the previous version.
Therefore, I have removed all the document differences.
I have submitted a patch [3] that includes the fixes for issue 3. to the patch 
that was posted in [2].
Regarding the proposed solution for issue 1, unlike the patch posted in [3], 
we have a policy of not performing partial aggregation pushdown if we cannot 
guarantee compatibility and safety.
The latest patch in [3] is a POC patch. The patch that Mr. Momjian reviewed is 
this.
If user-facing documentation is needed for this POC patch, it can be added.

I apologize for the lack of explanation regarding this positioning, which may 
have caused misunderstandings regarding the patch posted in [3].

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoYCrtOvk2f32qQKZV%3DjNL35tandf2A2Dp_2F5ASuiG1BA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/TYAPR01MB5514F0CBD9CD4F84A261198195562%40TYAPR01MB5514.jpnprd01.prod.outlook.com
[3] 
https://www.postgresql.org/message-id/TYAPR01MB55141D18188AC86ADCE35FCB952F2%40TYAPR01MB5514.jpnprd01.prod.outlook.com

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: [CAUTION!! freemail] Re: Partial aggregates pushdown

2023-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Haas.

> -Original Message-
> From: Robert Haas 
> Sent: Wednesday, December 6, 2023 10:25 PM
> On Wed, Dec 6, 2023 at 3:41 AM fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > Are you concerned about the hassle and potential human errors of
> > manually adding new partial aggregation functions, rather than the catalog 
> > becoming bloated?
> 
> I'm concerned about both.
Understood. Thank you for your response.

> > The process of creating partial aggregation functions from aggregation
> > functions can be automated, so I believe this issue can be resolved.
> > However, automating it may increase the size of the patch even more, so 
> > overall, approach#2 might be better.
> > To implement approach #2, it would be necessary to investigate how much 
> > additional code is required.
> 
> Yes. Unfortunately I fear that there is quite a lot of work left to do here 
> in order to produce a committable feature. To me it
> seems necessary to conduct an investigation of approach #2. If the result of 
> that investigation is that nothing major
> stands in the way of approach #2, then I think we should adopt it, which is 
> more work. In addition, the problems around
> transmitting serialized bytea blobs between machines that can't be assumed to 
> fully trust each other will need to be
> addressed in some way, which seems like it will require a good deal of design 
> work, forming some kind of consensus, and
> then implementation work to follow. In addition to that there may be some 
> small problems that need to be solved at a
> detail level, such as the HAVING issue. I think the last category won't be 
> too hard to sort out, but that still leaves two really
> major areas to address.
Yes, I agree with you. It is clear that further investigation and discussion 
are still needed. 
I would be grateful if we can resolve this issue gradually. I would also like 
to continue the discussion if possible in the future.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
 HAVING is also a problem. Consider the following query
> >
> > SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> > foreign server as HAVING needs full aggregate result, but foreign
> > server don't know it.
> 
> I don't see it that way. What we would push to the foreign server would be 
> something like SELECT count(a) FROM t. Then,
> after we get the results back and combine the various partial counts locally, 
> we would locally evaluate the HAVING clause
> afterward. That is, partial aggregation is a barrier to pushing down HAVING 
> clause itself, but it doesn't preclude pushing
> down the aggregation.
I understand what the problem is. I will try to fix it in the next version.

> From: Robert Haas 
> Sent: Tuesday, November 28, 2023 5:03 AM
> On Wed, Nov 22, 2023 at 5:16 AM fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > I did not choose Approach2 because I was not confident that the
> > disadvantage mentioned in 2.(2)(a) would be accepted by the PostgreSQL 
> > development community.
> > If it is accepted, I think Approach 2 is smarter.
> > Could you please provide your opinion on which approach is preferable
> > after comparing these two approaches?
> > If we cannot say anything without comparing the amount of source code,
> > as Mr.Momjian mentioned, we need to estimate the amount of source code 
> > required to implement Approach2.
> 
> I've had the same concern, that approach #2 would draw objections, so I think 
> you were right to be cautious about it. I
> don't think it is a wonderful approach in all ways, but I do think that it is 
> superior to approach #1. If we add dedicated
> support to the grammar, it is mostly a one-time effort, and after that, there 
> should not be much need for anyone to be
> concerned about it. If we instead add extra aggregates, then that generates 
> extra work every time someone writes a patch
> that adds a new aggregate to core. I have a difficult time believing that 
> anyone will prefer an approach that involves an
> ongoing maintenance effort of that type over one that doesn't.
> 
> One point that seems to me to be of particular importance is that if we add 
> new aggregates, there is a risk that some
> future aggregate might do that incorrectly, so that the main aggregate works, 
> but the secondary aggregate created for this
> feature does not work. That seems like it would be very frustrating for 
> future code authors so I'd like to avoid the risk as
> much as we can.
Are you concerned about the hassle and potential human errors of manually 
adding new partial
aggregation functions, rather than the catalog becoming bloated?
The process of creating partial aggregation functions from aggregation 
functions can be automated,
so I believe this issue can be resolved. However, automating it may increase 
the size of the patch
even more, so overall, approach#2 might be better.
To implement approach #2, it would be necessary to investigate how much 
additional code is required.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-11-26 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian, Mr.Haas, hackers.

> From: Bruce Momjian 
> Sent: Thursday, November 23, 2023 6:16 AM
> On Wed, Nov 22, 2023 at 10:16:16AM +, 
> fujii.y...@df.mitsubishielectric.co.jp wrote:
> > 2. Approach 2
> > (1) Advantages
> > (a) No need to add partial aggregate functions to the catalogs for
> > each aggregation
> > (2) Disadvantages
> > (a) Need to add non-standard keywords to the SQL syntax.
> >
> > I did not choose Approach2 because I was not confident that the
> > disadvantage mentioned in 2.(2)(a) would be accepted by the PostgreSQL 
> > development community.
> > If it is accepted, I think Approach 2 is smarter.
> > Could you please provide your opinion on which approach is preferable
> > after comparing these two approaches?
> 
> I didn't know #2 was possible, but given the great number of catalog entries, 
> doing it in the SQL grammar seems cleaner
> to me.
Thank you for comments. Yes, I understand.

> From: Bruce Momjian 
> Sent: Wednesday, November 22, 2023 5:34 AM
> On Tue, Nov 21, 2023 at 12:16:41PM -0500, Robert Haas wrote:
> > On Mon, Nov 20, 2023 at 5:48 PM Bruce Momjian  wrote:
> > > > I do have a concern about this, though. It adds a lot of bloat. It
> > > > adds a whole lot of additional entries to pg_aggregate, and every
> > > > new aggregate we add in the future will require a bonus entry for
> > > > this, and it needs a bunch of new pg_proc entries as well. One
> > > > idea that I've had in the past is to instead introduce syntax that
> > > > just does this, without requiring a separate aggregate definition in 
> > > > each case.
> > > > For example, maybe instead of changing string_agg(whatever) to
> > > > string_agg_p_text_text(whatever), you can say PARTIAL_AGGREGATE
> > > > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or
> > > > something. Then all aggregates could be treated in a generic way.
> > > > I'm not completely sure that's better, but I think it's worth 
> > > > considering.
> > >
> > > So use an SQL keyword to indicates a pushdown call?  We could then
> > > automate the behavior rather than requiring special catalog functions?
> >
> > Right. It would require more infrastructure in the parser, planner,
> > and executor, but it would be infinitely reusable instead of needing a
> > new thing for every aggregate. I think that might be better, but to be
> > honest I'm not totally sure.
> 
> It would make it automatic.  I guess we need to look at how big the patch is 
> to do it.
I will investigate specifically which parts of the PostgreSQL source code need 
to be modified and how big the patch will be if you take this approach.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-11-22 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr. Haas, hackers.

Thank you for your thoughtful comments.

> From: Robert Haas 
> Sent: Tuesday, November 21, 2023 5:52 AM
> I do have a concern about this, though. It adds a lot of bloat. It adds a 
> whole lot of additional entries to pg_aggregate, and
> every new aggregate we add in the future will require a bonus entry for this, 
> and it needs a bunch of new pg_proc entries
> as well. One idea that I've had in the past is to instead introduce syntax 
> that just does this, without requiring a separate
> aggregate definition in each case.
> For example, maybe instead of changing string_agg(whatever) to 
> string_agg_p_text_text(whatever), you can say
> PARTIAL_AGGREGATE
> string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or something. 
> Then all aggregates could be treated
> in a generic way. I'm not completely sure that's better, but I think it's 
> worth considering.
I believe this comment addresses a fundamental aspect of the approach.
So, firstly, could we discuss whether we should fundamentally reconsider the 
approach?

The approach adopted in this patch is as follows.
Approach 1: Adding partial aggregation functions to the catalogs(pg_aggregate, 
pg_proc)

The approach proposed by Mr.Haas is as follows.
Approach 2: Adding a keyword to the SQL syntax to indicate partial aggregation 
requests

The amount of code required to implement Approach 2 has not been investigated,
but comparing Approach 1 and Approach 2 in other aspects, 
I believe they each have the following advantages and disadvantages. 

1. Approach 1
(1) Advantages
(a) No need to change the SQL syntax
(2) Disadvantages
(a) Catalog bloat
As Mr.Haas pointed out, the catalog will bloat by adding partial aggregation 
functions (e.g. avg_p_int8(int8)) 
for each individual aggregate function (e.g. avg(int8)) in pg_aggregate and 
pg_proc (theoretically doubling the size).
Some PostgreSQL developers and users may find this uncomfortable.
(b) Increase in manual procedures
Developers of new aggregate functions (both built-in and user-defined) need to 
manually add the partial aggregation
functions when defining the aggregate functions.
However, the procedure for adding partial aggregation functions for a certain 
aggregate function can be automated,
so this problem can be resolved by improving the patch.
The automation method involves the core part (AggregateCreate() and related 
functions) that executes
the CREATE AGGREGATE command for user-defined functions.
For built-in functions, it involves generating the initial data for the 
pg_aggregate catalog and pg_proc catalog from pg_aggregate.dat and pg_proc.dat
(using the genbki.pl script and related scripts).

2. Approach 2
(1) Advantages
(a) No need to add partial aggregate functions to the catalogs for each 
aggregation
(2) Disadvantages
(a) Need to add non-standard keywords to the SQL syntax.

I did not choose Approach2 because I was not confident that the disadvantage 
mentioned in 2.(2)(a)
would be accepted by the PostgreSQL development community.
If it is accepted, I think Approach 2 is smarter.
Could you please provide your opinion on which
approach is preferable after comparing these two approaches?
If we cannot say anything without comparing the amount of source code, as 
Mr.Momjian mentioned,
we need to estimate the amount of source code required to implement Approach2.

Sincerely yours,
Yuuki Fujii
 
--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-10-26 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Momjian.

Thank you for your improvement.
As a matter of detail, I think that the areas marked below are erroneous.

--
+   Pushdown causes aggregate function cals to send partial aggregate
 ^
+   function calls to the remote server. If the partial aggregate
+   function doesn't doesn't exist on the remote server, it causes
 ^^^
--

> What else needs to be done before committers start to review
> this?
There are no others. May I make a new version of v31 with your
suggested improvements for the committer's review?

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-10-23 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian.

> Fujii-san, to get this patch closer to finished, can I modify this version of 
> the patch to improve some wording and post an
> updated version you can use for future changes?
Yes, I greatly appreciate your offer.
I would very much appreciate your modifications.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-07-17 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> From: Alexander Pyhalov 
> Sent: Friday, July 14, 2023 10:40 PM
> 1) In foreign_join_ok() should we set fpinfo->user if
> fpinfo->check_partial_aggregate_support is set like it's done for 
> fpinfo->use_remote_estimate? It seems we can end up with fpinfo->user 
> fpinfo->=
> NULL if use_remote_estimate is not set.
You are right. I will modify this patch according to your advice.
Thank you for advice.

> 2) It seeems we found an additional issue with original patch, which 
> is present in current one. I'm attaching a patch which seems to fix 
> it, but I'm not quite sure in it.
Thank you for pointing out the issue.
If a query's group-by clause contains variable based expression(not variable)
and the query's select clause contains another expression,
the partial aggregate could be unsafe to push down.

An example of such queries:
SELECT (b/2)::numeric, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b/2

Your patch disables partial aggregate pushdown for such queries.
I'll see if we can modify the patch to safely do a partial aggregate pushdown 
for such queries as well.
Such a query expects the variable in the select clause expression to be 
included in the target of the grouped rel
(let see make_partial_grouping_target), 
but the original groupby clause has no reference to this variable,
this seems to be the direct cause(let see foreign_grouping_ok). 
I will examine whether a safe pushdown can be achieved by matching the
groupby clause information referenced by foreign_grouping_ok with the grouped 
rel target information.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2023-06-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian, Mr.Pyhalov, hackers.

> From: Bruce Momjian 
> Sent: Thursday, June 22, 2023 12:44 AM
> On Tue, Jun 20, 2023 at 09:59:11AM +0300, Alexander Pyhalov wrote:
> > > Therefore, it seems like it would be near-zero cost to just call
> > > conn =
> > > GetConnection() and then PQserverVersion(conn), and ReleaseConnection().
> > > You can then use the return value of PQserverVersion() to determine
> > > if you can push down partial aggregates.
> >
> > Hi.
> > Currently we don't get remote connection while planning if
> > use_remote_estimate is not set.
> > Such change would require to get remote connection in planner, not in
> > executor.
> > This can lead to change of behavior (like errors in explain when user
> > mapping is wrong - e.g. bad password is specified).
> > Also this potentially can lead to establishing connections even when
> > plan node is not actually used (like extreme example - select
> > sum(score) from t limit 0).
> > I'm not saying we shouldn't do it - just hint at possible consequences.
> 
> Agreed.  I noticed it was doing FDW connections during optimization, but 
> didn't see the postgres_fdw option that would
> turn it off.
> Interestingly, it is disabled by default.
> 
> After considering the options, I think we should have a postgres_fdw option 
> called "planner_version_check", and default
> that false.  When false, a remote server version check will not be performed 
> during planning and partial aggregates will be
> always be considered.  When true, a version check will be performed during 
> planning and partial aggregate pushdown
> disabled for pre-PG 17 foreign servers during the query.
> 
> If we want to be more specific, we can call it 
> "check_partial_aggregate_support".
Thank you both for your advice.
We will address the compatibility issues as follows.

Approach1-3:
I will add a postgres_fdw option "check_partial_aggregate_support".
This option is false, default.
Only if this option is true, postgres_fdw connect to the remote server and get 
the version of the remote server.
And if the version of the remote server is less than PG17, then partial 
aggregate push down to the remote server is disable.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

> -Original Message-
> From: Bruce Momjian 
> Sent: Thursday, June 22, 2023 12:44 AM
> To: Alexander Pyhalov 
> Cc: Fujii Yuki/藤井 雄規(MELCO/情報総研 DM最適G) 
> ;
> PostgreSQL-development ; Andres Freund 
> ; Tom Lane
> ; Tomas Vondra ; Julien 
> Rouhaud ;
> Daniel Gustafsson ; Ilya Gladyshev 
> 
> Subject: Re: Partial aggregates pushdown
> 
> On Tue, Jun 20, 2023 at 09:59:11AM +0300, Alexander Pyhalov wrote:
> > > Therefore, it seems like it would be near-zero cost to just call
> > > conn =
> > > GetConnection() and then PQserverVersion(conn), and ReleaseConnection().
> > > You can then use the return value of PQserverVersion() to determine
> > > if you can push down partial aggregates.
> >
> > Hi.
> > Currently we don't get remote connection while planning if
> > use_remote_estimate is not set.
> > Such change would require to get remote connection in planner, not in
> > executor.
> > This can lead to change of behavior (like errors in explain when user
> > mapping is wrong - e.g. bad password is specified).
> > Also this potentially can lead to establishing connections even when
> > plan node is not actually used (like extreme example - select
> > sum(score) from t limit 0).
> > I'm not saying we shouldn't do it - just hint at possible consequences.
> 
> Agreed.  I noticed it was doing FDW connections during optimization, but 
> didn't see the postgres_fdw option that would
> turn it off.
> Interestingly, it is disabled by default.
> 
> After considering the options, I think we should have a postgres_fdw option 
> called "planner_version_check", and default
> that false.  When false, a remote server version check will not be performed 
> during planning and partial aggregates will be
> always be considered.  When true, a version check will be performed during 
> planning and partial aggregate pushdown
> disabled for pre-PG 17 foreign servers during the query.
> 
> If we want to be more specific, we can call it 
> "check_partial_aggregate_support".
> 
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
> 
>   Only you can decide what is important to you.




RE: Partial aggregates pushdown

2023-06-12 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian.

Thank you for advises.

> From: Bruce Momjian 
> Sent: Monday, June 12, 2023 10:38 PM
> > I understand what the problem is. I will put a mechanism maintaining 
> > compatibility into the patch.
> > I believe there are three approaches.
> > Approach 1-1 is preferable because it does not require additional options 
> > for postgres_fdw.
> > I will revise the patch according to Approach 1-1, unless otherwise 
> > commented.
> >
> > Approach1:
> > I ensure that postgres_fdw retrieves the version of each remote server
> > and does not partial aggregate pushd down if the server version is less 
> > than 17.
> > There are two approaches to obtaining remote server versions.
> > Approach1-1: postgres_fdw connects a remote server and use 
> > PQserverVersion().
> > Approach1-2: Adding a postgres_fdw option about a remote server version 
> > (like "server_version").
> >
> > Approach2:
> > Adding a postgres_fdw option for partial aggregate pushdown is enable
> > or not (like enable_partial_aggregate_pushdown).
> 
> These are good questions.  Adding a postgres_fdw option called 
> enable_partial_aggregate_pushdown helps make the
> purpose of the option clear, but remote_version can be used for future 
> breakage as well.
> 
> I think remote_version is the best idea, and in the documention for the 
> option, let's explcitly say it is useful to disable
> partial aggreates pushdown on pre-PG 17 servers.  If we need to use the 
> option for other cases, we can just update the
> documentation.  When the option is blank, the default, everything is pushed 
> down.
> 
> I see remote_version a logical addition to match our "extensions" option that 
> controls what extension functions can be
> pushed down.

Thank you for your perspective.
So, of the approaches I have presented, you think that approach 1-2 is
preferable and that the option name remote_server is preferable?
Indeed, the option of a remote version may have other uses.
However, this information can be obtained by connecting to a remote server, 
I'm concerned that some people may find this option redundant.

Is the problem with approach 1-1 because the user cannot decide whether to 
include the compatibility check in the decision to do partial aggregate 
pushdown or not?
# If Approach 1-1 is taken, the problem is that this feature cannot be used for 
all buit-in aggregate functions
# when the remote server is older than PG17.
If so, Approache1-3 below seem more desirable.
Would it be possible for us to hear your thoughts?

Approache1-3:We add a postgres_fdw option about a compatibility check for 
partial aggregate pushdown
(like "enable_aggpartialfunc_compatibility_check"). This option is false, the 
default.
When this option is true, postgres_fdw obtains the remote server version by 
connecting the remote server and using PQserverVersion(). 
And if the remote server version is older than PG17, then the partial aggregate 
pushdown feature is enable for all buit-in aggregate functions.
Otherwise the partial aggregate pushdown feature is disable for all buit-in 
aggregate functions.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-06-12 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Bruce, Mr.Pyhalov, hackers.

Thank you for comments. I will try to respond to both of your comments as 
follows.
I plan to start revising the patch next week. If you have any comments on the 
following
respondences, I would appreciate it if you could give them to me this week.

> From: Bruce Momjian 
> Sent: Saturday, June 10, 2023 1:44 AM
> I agree that this feature is designed for built-in sharding, but it is 
> possible people could be using aggregates on partitions
> backed by foreign tables without sharding.  Adding a requirement for 
> non-sharding setups to need PG 17+ servers might
> be unreasonable.
Indeed, it is possible to use partial aggregate pushdown feature for purposes 
other than sharding.
The description of the section "F.38.6. Built-in sharding in PostgreSQL" 
assumes the use of
Built-in sharding and will be modified to eliminate this assumption.
The title of this section should be changed to something like "Aggregate on 
partitioned table".

> From: Bruce Momjian 
> Sent: Saturday, June 10, 2023 1:44 AM
> Looking at previous release note incompatibilities, we don't normally change 
> non-administrative functions in a way that
> causes errors if run on older servers.  Based on Alexander's observations, I 
> wonder if we need to re-add the postgres_fdw
> option to control partial aggregate pushdown, and default it to enabled.
> 
> If we ever add more function breakage we might need more postgres_fdw 
> options.  Fortunately, such changes are rare.

I understand what the problem is. I will put a mechanism maintaining 
compatibility into the patch.
I believe there are three approaches.
Approach 1-1 is preferable because it does not require additional options for 
postgres_fdw.
I will revise the patch according to Approach 1-1, unless otherwise commented.

Approach1:
I ensure that postgres_fdw retrieves the version of each remote server
and does not partial aggregate pushd down if the server version is less than 17.
There are two approaches to obtaining remote server versions.
Approach1-1: postgres_fdw connects a remote server and use PQserverVersion().
Approach1-2: Adding a postgres_fdw option about a remote server version (like 
"server_version").

Approach2:
Adding a postgres_fdw option for partial aggregate pushdown is enable or not
(like enable_partial_aggregate_pushdown).

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-04-13 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian.

> > There is one more thing I would like your opinion on.
> > As the major version of PostgreSQL increase, it is possible that the
> > new builtin aggregate functions are added to the newer PostgreSQL.
> > This patch assume that aggpartialfns definitions exist in BKI files.
> > Due to this assumption, someone should add aggpartialfns definitions of
> new builtin aggregate functions to BKI files.
> > There are two possible ways to address this issue. Would the way1 be
> sufficient?
> > Or would way2 be preferable?
> >   way1) Adding documentaion for how to add these definitions to BKI files
> >   way2) Improving this patch to automatically add these definitions to BKI
> files by some tool such as initdb.
> 
> I think documentation is sufficient.  You already showed that someone can do
> this with CREATE AGGREGATE for non-builtin aggregates.
Thank you for your opinion. I will modify this patch according to the way1.

> > > So, let's remove the PARTIALAGG_MINVERSION option from the patch and
> > > just make it automatic --- we push down builtin partial aggregates
> > > if the remote server is the same or newer _major_ version than the
> > > sending server.  For extensions, if people have older extensions on
> > > the same or newer foreign servers, they can adjust 'extensions' above.
> > Okay, I understand. I will remove PARTIALAGG_MINVERSION option from
> > the patch and I will add check whether aggpartialfn depends on some
> > extension which is containd in extensions list of the postgres_fdw's foreign
> server.
> 
> Yes, good.  Did we never push down aggregates before?  I thought we
> pushed down partitionwise aggregates already, and such a check should
> already be done.  If the check isn't there, it should be.
Yes. The last version of this patch(and original postgres_fdw) checks if
user-defined aggregate depends some extension which is contained in 
'extensions'.
But, in the last version of this patch, there is no such check for 
aggpartialfn of user-defined aggregate. So, I will add such check to this 
patch. 
I think that this modification is easy to do . 

> In summary, we don't do any version check for built-in function pushdown, so
> we don't need it for aggregates either.  We check extension functions against
> the extension pushdown list, so we should be checking this for partial
> aggregate pushdown, and for partition-wise aggregate pushdown.  If we don't
> do that last check already, it is a bug.
I understood.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-04-09 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian, Mr.Lane, Mr.Freund.

Thank you for advices.

> From: Bruce Momjian 
> > > > 2. Automation of creating definition of partialaggfuncs In
> > > > development of v17, I manually create definition of
> > > > partialaggfuncs for avg, min, max, sum,
> > > count.
> > > > I am concerned that this may be undesirable.
> > > > So I am thinking that v17 should be modified to automate creating
> > > > definition of partialaggfuncs for all built-in aggregate functions.
> > >
> > > Are there any other builtin functions that need this?  I think we
> > > can just provide documention for extensions on how to do this.
> > For practical purposes, it is sufficient if partial aggregate for the
> > above functions can be pushed down.
> > I think you are right, it would be sufficient to document how to
> > achieve  partial aggregate pushdown for other built-in functions.
> 
> Uh, we actually want the patch to implement partial aggregate pushdown for all
> builtin data types that can support it.  Is that done?  I think it is only 
> extension
> aggregates, which we do not control, that need this documentation.
The last version of this patch can't pushdown partial aggregate for all builtin 
aggregate functions that can support it.
I will improve this patch to pushdown partial aggregate for all builtin 
aggregate functions
that can support it.

There is one more thing I would like your opinion on.
As the major version of PostgreSQL increase, it is possible that
the new builtin aggregate functions are added to the newer PostgreSQL.
This patch assume that aggpartialfns definitions exist in BKI files.
Due to this assumption, someone should add aggpartialfns definitions of new 
builtin aggregate functions to BKI files.
There are two possible ways to address this issue. Would the way1 be sufficient?
Or would way2 be preferable?
  way1) Adding documentaion for how to add these definitions to BKI files
  way2) Improving this patch to automatically add these definitions to BKI 
files by some tool such as initdb.

> From: Bruce Momjian 
> On Fri, Apr  7, 2023 at 09:16:14PM -0700, Andres Freund wrote:
> > On 2023-04-07 22:53:53 -0400, Bruce Momjian wrote:
> > > > postgres_fdw has no business pushing down calls to non-builtin
> > > > functions unless the user has explicitly authorized that via the
> > > > existing whitelisting mechanism.  I think you're reinventing the
> > > > wheel, and not very well.
> > >
> > > The patch has you assign an option at CREATE AGGREGATE time if it
> > > can do push down, and postgres_fdw checks that.  What whitelisting
> > > mechanism are you talking about?  async_capable?
> >
> > extensions (string)
> >
> > This option is a comma-separated list of names of PostgreSQL
> extensions that are installed, in compatible versions, on both the local and
> remote servers. Functions and operators that are immutable and belong to a
> listed extension will be considered shippable to the remote server. This 
> option
> can only be specified for foreign servers, not per-table.
> >
> > When using the extensions option, it is the user's responsibility that 
> > the
> listed extensions exist and behave identically on both the local and remote
> servers. Otherwise, remote queries may fail or behave unexpectedly.
> 
> Okay, this is very helpful --- it is exactly the issue we are dealing with 
> --- how
> can we know if partial aggregate functions exists on the remote server.  (I
> knew I was going to need API help on this.)
> 
> So, let's remove the PARTIALAGG_MINVERSION option from the patch and just
> make it automatic --- we push down builtin partial aggregates if the remote
> server is the same or newer _major_ version than the sending server.  For
> extensions, if people have older extensions on the same or newer foreign
> servers, they can adjust 'extensions' above.
Okay, I understand. I will remove PARTIALAGG_MINVERSION option from the patch
and I will add check whether aggpartialfn depends on some extension which
is containd in extensions list of the postgres_fdw's foreign server.
In the next version of this patch,
we can pushdown partial aggregate for an user-defined aggregate function only 
when the function pass through this check.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2023-04-07 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian

> First, my apologies for not addressing this sooner.  I was so focused on my
> own tasks that I didn't realize this very important patch was not getting
> attention.  I will try my best to get it into PG 17.
Thank you very much for your comments. 
I will improve this patch for PG17.
I believe that this patch will help us use PostgreSQL's built-in sharding for 
OLAP.

> What amazes me is that you didn't need to create _any_ actual aggregate
> functions.  Rather, you just needed to hook existing functions into the
> aggregate tables for partial FDW execution.
Yes. This patch enables partial aggregate pushdown using 
only existing functions which belong to existing aggregate function
and are needed by parallel query(such as state transition function and 
serialization function).
This patch does not need new types of function belonging to aggregate functions
and does not need new functions belonging to existing aggregate functions.

> I suggest we remove the version check requirement --- instead just document
> that the FDW Postgres version should be the same or newer than the calling
> Postgres server --- that way, we can assume that whatever is in the system
> catalogs of the caller is in the receiving side.  
Thanks for the comment. I will modify this patch according to your comment.

> We should add a GUC to turn off
> this optimization for cases where the FDW Postgres version is older than the
> caller.  This handles case 1-2.
Thanks for the advice here too.
I thought it would be more appropriate to add a foregin server option of 
postgres_fdw rather than adding GUC. 
Would you mind if I ask you what you think about it?

> > 2. Automation of creating definition of partialaggfuncs In development
> > of v17, I manually create definition of partialaggfuncs for avg, min, max, 
> > sum,
> count.
> > I am concerned that this may be undesirable.
> > So I am thinking that v17 should be modified to automate creating
> > definition of partialaggfuncs for all built-in aggregate functions.
> 
> Are there any other builtin functions that need this?  I think we can just
> provide documention for extensions on how to do this.
For practical purposes, it is sufficient 
if partial aggregate for the above functions can be pushed down.
I think you are right, it would be sufficient to document how to achieve
 partial aggregate pushdown for other built-in functions.

> > 3. Documentation
> > I need add explanation of partialaggfunc to documents on postgres_fdw and
> other places.
> 
> I can help with that once we decide on the above.
Thank you. In the next verion of this patch, I will add documents on 
postgres_fdw
and other places. 

> I think 'partialaggfn' should be named 'aggpartialfn' to match other columns 
> in
> pg_aggregate.
Thanks for the comment. I will modify this patch according to your comment.

> For case 3, I don't even know how much pushdown those do of _any_
> aggregates to non-PG servers, let along parallel FDW ones.  Does anyone
> know the details?
To allow partial aggregate pushdown for non-PG FDWs,
I think we need to add pushdown logic to their FDWs for each function.
For example, we need to add logic avg() -> sum()/count() to their FDWs for avg.
To allow parallel partial aggregate by non-PG FDWs,
I think we need to add FDW Routines for Asynchronous Execution to their FDWs[1].

> I am confused by these changes to pg_aggegate:
> 
> +{ aggfnoid => 'sum_p_int8', aggtransfn => 'int8_avg_accum',
> +  aggfinalfn => 'int8_avg_serialize', aggcombinefn =>
> +'int8_avg_combine',
> +  aggserialfn => 'int8_avg_serialize', aggdeserialfn =>
> +'int8_avg_deserialize',
> +  aggtranstype => 'internal', aggtransspace => '48' },
> 
> ...
> 
> +{ aggfnoid => 'sum_p_numeric', aggtransfn => 'numeric_avg_accum',
> +  aggfinalfn => 'numeric_avg_serialize', aggcombinefn =>
> +'numeric_avg_combine',
> +  aggserialfn => 'numeric_avg_serialize',
> +  aggdeserialfn => 'numeric_avg_deserialize',
> +  aggtranstype => 'internal', aggtransspace => '128' },
> 
> Why are these marked as 'sum' but use 'avg' functions?
This reason is that sum(int8)/sum(numeric) shares some functions with 
avg(int8)/avg(numeric),
and sum_p_int8 is aggpartialfn of sum(int8) and sum_p_numeric is aggpartialfn 
of sum(numeric).

--Part of avg(int8) in BKI file in PostgreSQL15.0[2].
{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },
--

--Part of sum(int8) in BKI file in PostgreSQL15.0[2].
{ aggfnoid => 'sum(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_sum', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'i

RE: Partial aggregates pushdown

2023-03-30 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Momjian

> First, am I correct?
Yes, you are correct. This patch uses new special aggregate functions for 
partial aggregate
(then we call this partialaggfunc).

> Second, how far away is this from being committable
> and/or what work needs to be done to get it committable, either for PG 16 or 
> 17?
I believe there are three: 1. and 2. are not clear if they are necessary or 
not; 3. are clearly necessary.
I would like to hear the opinions of the development community on whether or 
not 1. and 2. need to be addressed.

1. Making partialaggfunc user-defined function
In v17, I make partialaggfuncs as built-in functions.
Because of this approach, v17 changes specification of BKI file and 
pg_aggregate.
For now, partialaggfuncs are needed by only postgres_fdw which is just an 
extension of PostgreSQL.
In the future, when revising the specifications for BKI files and pg_aggregate 
when modifying existing PostgreSQL functions,
It is necessary to align them with this patch's changes.
I am concerned that this may be undesirable.
So I am thinking that v17 should be modified to making partialaggfunc as user 
defined function.

2. Automation of creating definition of partialaggfuncs
In development of v17, I manually create definition of partialaggfuncs for avg, 
min, max, sum, count.
I am concerned that this may be undesirable.
So I am thinking that v17 should be modified to automate creating definition of 
partialaggfuncs
for all built-in aggregate functions.

3. Documentation
I need add explanation of partialaggfunc to documents on postgres_fdw and other 
places.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation




RE: Partial aggregates pushdown

2022-12-15 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Freund.

> cfbot complains about some compiler warnings when building with clang:
> https://cirrus-ci.com/task/6606268580757504
> 
> deparse.c:3459:22: error: equality comparison with extraneous parentheses
> [-Werror,-Wparentheses-equality]
> if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
>  ~~~^~
> deparse.c:3459:22: note: remove extraneous parentheses around the
> comparison to silence this warning
> if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> ~   ^ ~
> deparse.c:3459:22: note: use '=' to turn this equality comparison into an
> assignment
> if ((node->aggsplit == AGGSPLIT_SIMPLE)) {
> ^~
> =
I fixed this error.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


0001-Partial-aggregates-push-down-v17.patch
Description: 0001-Partial-aggregates-push-down-v17.patch


RE: Add semi-join pushdown to postgres_fdw

2022-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

Thank you for fixing it and giving more explanation.

> IIRC, planner can create semi-join, which targetlist references Vars 
> from inner join relation. However, it's deparsed as exists and so we 
> can't reference it from SQL. So, there's this check - if Var is 
> referenced in semi-join target list, it can't be pushed down.
> You can see this if comment out this check.
> 
> EXPLAIN (verbose, costs off)
>  SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
>  (SELECT * FROM ft4 WHERE EXISTS (
>  SELECT 1 FROM ft2 WHERE ft2.c2=ft4.c2)) ft4
>  ON ft2.c2 = ft4.c1
>  INNER JOIN
>  (SELECT * FROM ft2 WHERE EXISTS (
>  SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)) ft21
>  ON ft2.c2 = ft21.c2
>  WHERE ft2.c1 > 900
>  ORDER BY ft2.c1 LIMIT 10;
> 
> will fail with
> EXPLAIN SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT 
> NULL FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2
> 
> Here you can see that
> SELECT * FROM ft2 WHERE EXISTS (
>  SELECT 1 FROM ft4 WHERE ft2.c2=ft4.c2)
> 
> was transformed to
> SELECT r8.c2, r9.c2 FROM "S 1"."T 1" r8 WHERE (EXISTS (SELECT NULL 
> FROM "S 1"."T 3" r9 WHERE ((r8.c2 = r9.c2
> 
> where our exists subquery is referenced from tlist. It's fine for plan 
> (relations, participating in semi-join, can be referenced in tlist), 
> but is not going to work with EXISTS subquery.
> BTW, there's a comment in joinrel_target_ok(). It tells exactly that -
> 
> 5535 if (jointype == JOIN_SEMI &&
> bms_is_member(var->varno,
> innerrel->relids) && !bms_is_member(var->varno, outerrel->relids))
> 5536 {
> 5537 /* We deparse semi-join as exists() subquery, and
> so can't deparse references to inner rel in join target list. */
> 5538 ok = false;
> 5539 break;
> 5540 }
> 
> Expanded comment.
Thank you for expanding your comment and giving examples. 
Thanks to the above examples, I understood in what case planner wolud create 
semi-join, 
which targetlist references Vars from inner join relation.

> > question2) In foreign_join_ok
> >   > * Constructing queries representing ANTI joins is hard, hence
> >   Is this true? Is it hard to expand your approach to ANTI join 
> > pushdown?
> 
> I haven't tried, so don't know.
I understand the situation.

> The naming means additional conditions (for WHERE clause, by analogy 
> with ignore_conds and remote_conds). Not sure if subquery_expr sounds 
> better, but if you come with better idea, I'm fine with renaming them.
Sure.

> > question4) Although really detail, there is expression making space 
> > such as
> >   "ft4.c2 = ft2.c2" and one making no space such as "c1=ftupper.c1".
> >   Is there reason for this difference? If not, need we use same 
> > policy for making space?
Thank you.

Later, I'm going to look at other part of your patch.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2022-12-04 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> Attaching minor fixes. I haven't proof-read all comments (but perhaps, they
> need attention from some native speaker).
Thank you. I fixed according to your patch.
And I fixed have proof-read all comments and messages.

> Tested it with queries from
> https://github.com/swarm64/s64da-benchmark-toolkit, works as expected.
Thank you for additional tests.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


0001-Partial-aggregates-push-down-v16.patch
Description: 0001-Partial-aggregates-push-down-v16.patch


RE: Add semi-join pushdown to postgres_fdw

2022-12-02 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

Thank you for work on this useful patch.
I'm starting to review v2 patch.
I have cheked we can apply v2 patch to commit 
ec386948948c1708c0c28c48ef08b9c4dd9d47cc
(Date:Thu Dec 1 12:56:21 2022 +0100).
I briefly looked at this whole thing and did step execute this
by running simple queries such as the followings.

query1) select * from f_t1 a1 where a1.c1 in (select c1 from f_t2);
query2) select * from f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1 where a1.c1 in 
(select c1 from f_t3) ;
query3) update f_t2 set c1 = 1 from f_t1 a1 where a1.c2 = f_t2.c2 and exists 
(select null from f_t2 where c1 = a1.c1);

Although I haven't seen all of v2 patch, for now I have the following questions.

question1) 
  > + if (jointype == JOIN_SEMI && bms_is_member(var->varno, innerrel->relids) 
&& !bms_is_member(var->varno, outerrel->relids))
  It takes time for me to find in what case this condition is true.
  There is cases in which this condition is true for semi-join of two baserels 
  when running query which joins more than two relations such as query2 and 
query3.
  Running queries such as query2, you maybe want to pushdown of only semi-join 
path of 
  joinrel(outerrel) defined by (f_t1 a1 join f_t3 a2 on a1.c1 = a2.c1) and 
baserel(innerrel) f_t3 
  because of safety deparse. So you add this condition.
  Becouase of this limitation, your patch can't push down subquery expression 
  "exists (select null from f_t2 where c1 = a1.c1)" in query3.
  I think, it is one of difficulty points for semi-join pushdown.
  This is my understanding of the intent of this condition and the restrictions 
imposed by this condition.
  Is my understanding right?
  I think if there are comments for the intent of this condition and the 
restrictions imposed by this condition  
  then they help PostgreSQL developper. What do you think?

question2) In foreign_join_ok
  > * Constructing queries representing ANTI joins is hard, hence
  Is this true? Is it hard to expand your approach to ANTI join pushdown?

question3) You use variables whose name is "addl_condXXX" in the following code.
  > appendStringInfo(addl_conds, "EXISTS (SELECT NULL FROM %s", 
join_sql_i.data);
  Does this naming mean additional literal?
  Is there more complehensive naming, such as "subquery_exprXXX"?

question4) Although really detail, there is expression making space such as
  "ft4.c2 = ft2.c2" and one making no space such as "c1=ftupper.c1".
  Is there reason for this difference? If not, need we use same policy for 
making space?

Later, I'm going to look at part of your patch which is used when running more 
complex query.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


RE: Partial aggregates pushdown

2022-11-30 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> One more issue I started to think about - now we don't check
> partialagg_minversion for "simple" aggregates at all. Is it correct? It seems 
> that ,
> for example, we could try to pushdown bit_or(int8) to old servers, but it 
> didn't
> exist, for example, in 8.4.  I think it's a broader issue (it would be also 
> the case
> already if we push down
> aggregates) and shouldn't be fixed here. But there is an issue -
> is_shippable() is too optimistic.
I think it is correct for now.
F.38.7 of [1] says "A limitation however is that postgres_fdw generally assumes 
that 
immutable built-in functions and operators are safe to send to the remote 
server for 
execution, if they appear in a WHERE clause for a foreign table." and says that 
we can 
avoid this limitation by rewriting query.
It looks that postgres_fdw follows this policy in case of UPPERREL_GROUP_AGG 
aggregate pushdown.
If a aggreagate has not internal aggtranstype and has not aggfinalfn ,
partialaggfn of it is equal to itself.
So I think that it is adequate to follow this policy in case of partial 
aggregate pushdown
 for such aggregates.

> >> 2) Do we really have to look at pg_proc in partial_agg_ok() and
> >> deparseAggref()? Perhaps, looking at aggtranstype is enough?
> > You are right. I fixed according to your comment.
> >
> 
> partial_agg_ok() still looks at pg_proc.
Sorry for taking up your time. I fixed it.

[1] https://www.postgresql.org/docs/current/postgres-fdw.html

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


0001-Partial-aggregates-push-down-v15.patch
Description: 0001-Partial-aggregates-push-down-v15.patch


RE: Partial aggregates pushdown

2022-11-30 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

> 1) In previous version of the patch aggregates, which had partialaggfn, were 
> ok
> to push down. And it was a definite sign that aggregate can be pushed down.
> Now we allow pushing down an aggregate, which prorettype is not internal and
> aggfinalfn is not defined. Is it safe for all user-defined (or builtin) 
> aggregates,
> even if they are generally shippable? Aggcombinefn is executed locally and we
> check that aggregate function itself is shippable. Is it enough? Perhaps, we
> could use partialagg_minversion (like aggregates with partialagg_minversion
> == -1 should not be pushed down) or introduce separate explicit flag?
In what case partial aggregate pushdown is unsafe for aggregate which has not 
internal aggtranstype
 and has no aggfinalfn?
By reading [1], I believe that if aggcombinefn of such aggregate recieves 
return values of original
 aggregate functions in each remote then it must produce same value that would 
have resulted 
 from scanning all the input in a single operation.

> 2) Do we really have to look at pg_proc in partial_agg_ok() and
> deparseAggref()? Perhaps, looking at aggtranstype is enough?
You are right. I fixed according to your comment.

> 3) I'm not sure if CREATE AGGREGATE tests with invalid
> PARTIALAGGFUNC/PARTIALAGG_MINVERSION should be in postgres_fdw
> tests or better should be moved to src/test/regress/sql/create_aggregate.sql,
> as they are not specific to postgres_fdw
Thank you. I moved these tests to src/test/regress/sql/create_aggregate.sql.

[1] https://www.postgresql.org/docs/15/xaggr.html#XAGGR-PARTIAL-AGGREGATES

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


0001-Partial-aggregates-push-down-v14.patch
Description: 0001-Partial-aggregates-push-down-v14.patch


RE: Partial aggregates pushdown

2022-11-29 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Pyhalov.

Thank you for comments.

> I've looked through the patch. Overall I like this approach, but have
> the following comments.
> 
> 1) Why should we require partialaggfn for min()/max()/count()? We could
> just use original functions for a lot of aggregates, and so it would be
> possible to push down some partial aggregates to older servers. I'm not
> sure that it's a strict requirement, but a nice thing to think about.
> Can we use the function itself as partialaggfn, for example, for
> sum(int4)?
> For functions with internal aggtranstype (like sum(int8) it
> would be more difficult).
Thank you. I realized that partial aggregate pushdown is fine 
without partialaggfn if original function has no aggfinalfn and 
aggtranstype of it is not internal. So I have improved v12 by
this realization.
However, v13 requires partialaggfn for aggregate if it has aggfinalfn or 
aggtranstype of it is internal such as sum(int8). 

> 2) fpinfo->server_version is not aggregated, for example, when we form
> fpinfo in foreign_join_ok(), it seems we should spread it in more places
> in postgres_fdw.c.
I have responded to your comment by adding copy of server_version in 
merge_fdw_options.
 
> 3) In add_foreign_grouping_paths() it seems there's no need for
> additional argument, we can look at extra->patype. Also Assert() in
> add_foreign_grouping_paths() will fire in --enable-cassert build.
I have fixed according to your comment.

> 4) Why do you modify lookup_agg_function() signature? I don't see tests,
> showing that it's neccessary. Perhaps, more precise function naming
> should be used instead?
I realized that there is no need of modification lookup_agg_function().
Instead, I use LookupFuncName().

> 5) In tests:
>   - Why version_num does have "name" type in
> f_alter_server_version() function?
>   - You modify server_version option of 'loopback' server, but
> don't reset it after test. This could affect further tests.
>   - "It's unsafe to push down partial aggregates with distinct"
> in postgres_fdw.sql:3002 seems to be misleading.
> 3001
> 3002 -- It's unsafe to push down partial aggregates with distinct
> 3003 SELECT f_alter_server_version('loopback', 'set', -1);
I have fixed according to your comment.

> 6) While looking at it, could cause a crash with something like
I have fixed this problem by using LookupFuncName() instead of 
lookup_agg_function.

The following is readme of v13.
--readme of Partial aggregates push down v13
1. interface
1) pg_aggregate
There are the following additional columns.
a) partialaggfn
  data type: regproc.
  default value: zero(means invalid).
  description  : This field refers to the special aggregate function(then we 
call
 this partialaggfunc)
corresponding to aggregation function(then we call src) which has aggfnoid.
partialaggfunc is used for partial aggregation pushdown by postgres_fdw.
The followings are differences between the src and the special aggregate 
function.
  difference1) result type
The result type is same as the src's transtype if the src's transtype
is not internal.
Otherwise the result type is bytea.
  difference2) final func
The final func does not exist if the src's transtype is not internal.
Otherwize the final func returns serialized value.
For example, there is a partialaggfunc avg_p_int4 which corresponds to 
avg(int4)
whose aggtranstype is _int4.
The result value of avg_p_int4 is a float8 array which consists of count 
and 
summation. avg_p_int4 does not have finalfunc.
For another example, there is a partialaggfunc avg_p_int8 which corresponds 
to 
avg(int8) whose aggtranstype is internal.
The result value of avg_p_int8 is a bytea serialized array which consists 
of count 
and summation. avg_p_int8 has finalfunc int8_avg_serialize which is 
serialize function
of avg(int8). This field is zero if there is no partialaggfunc.

b) partialagg_minversion
  data type: int4.
  default value: zero(means current version).
  description  : This field is the minimum PostgreSQL server version which has 
partialaggfunc. This field is used for checking compatibility of 
partialaggfunc.

The above fields are valid in tuples for builtin avg, sum, min, max, count.
There are additional records which correspond to partialaggfunc for avg, sum, 
min, max, count.

2) pg_proc
There are additional records which correspond to partialaggfunc for avg, sum, 
min, max, count.

3) postgres_fdw
postgres_fdw has an additional foreign server option server_version. 
server_version is 
integer value which means remote server version number. Default value of 
server_version 
is zero. server_version is used for checking compatibility of partialaggfunc.

2. feature
Partial aggregation pushdown is fine when either of the following conditions is 
true.
  condition1) aggregate function has not internal aggtranstype and has no 
aggfinalfn.
  condition2) the

RE: Partial aggregates pushdown

2022-07-31 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Vondra, Mr.Pyhalov.

I'm interesied in Mr.Pyhalov's patch due to the following background.
--Background
I develop postgresql's extension such as fdw in my work. 
I'm interested in using postgresql for OLAP. 
I think the function of a previous patch "Push aggregation down to base 
relations and joins"[1] is desiable. I rebased the previous patch and register 
the rebased patch on the next commitfest[2].
And I think it would be more useful if the previous patch works on a foreign 
table of postgres_fdw.
I realized the function of partial aggregation pushdown is necessary  to make 
the previous patch work on a foreign table of postgres_fdw.
--

So I reviewed Mr.Pyhalov's patch and discussions on this thread.
I made a draft of approach to respond to Mr.Vondra's comments.
Would you check whether my draft is right or not?

--My draft
> 1) It's not clear to me how could this get extended to aggregates with 
> more complex aggregate states, to support e.g. avg() and similar 
> fairly common aggregates.
We add a special aggregate function every aggregate function (hereafter we call 
this src)  which supports partial aggregation.
The followings are differences between the src and the special aggregate 
function.
difference1) result type
The result type is same with the src's transtype if the src's transtype is not 
internal.
Otherwise the result type is bytea.

difference2) final func
The final func does not exist if the src's transtype is not internal.
Otherwize the final func returns serialized value.

For example, let me call the special aggregate function of avg(float8) 
avg_p(float8).
The result value of avg_p is a float8 array which consists of count and 
summation.
avg_p does not have finalfunc.

We pushdown the special aggregate function instead of a src.
For example, we issue "select avg_p(c) from t" instead of "select avg(c) from t"
in the above example.

We add a new column partialaggfn to pg_aggregate to get the oid of  the special 
aggregate function from the the src's oid.
This column is the oid of the special aggregate function which corresponds to 
the src.

If an aggregate function does not have any special aggregate function,  then we 
does not pushdown any partial aggregation of the aggregate function.

> 2) I'm not sure relying on aggpartialpushdownsafe without any version 
> checks etc. is sufficient. I mean, how would we know the remote node 
> has the same idea of representing the aggregate state. I wonder how 
> this aligns with assumptions we do e.g. for functions etc.
We add compatible server versions infomation to pg_aggregate and  the set of 
options of postgres_fdw's foreign server.
We check compatibility of an aggregate function using this infomation.

An additional column of pg_aggregate is compatibleversonrange.
This column is a range of postgresql server versions which  has compatible 
aggregate function.
An additional options of postgres_fdw's foreign server are serverversion and 
bwcompatibleverson.
serverversion is remote postgresql server version.
bwcompatibleverson is the maximum version in which any aggregate function is 
compatible with local noed's one.
Our version check passes if and only if at least one of the following 
conditions is true.
condition1) the option value of serverversion is in compatibleversonrange.
condition2) the local postgresql server version is between bwcompatibleverson 
and the option value of serverversion.

We can get the local postgresql server version from PG_VERSION_NUM macro.
We use condition1 if the local postgresql server version is not more than the 
remote one.
and use condition2 if the local postgresql server version is greater than the 
remote one.
--

Sincerely yours,
Yuuki Fujii

[1] https://commitfest.postgresql.org/32/1247/
[2] https://commitfest.postgresql.org/39/3764/

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


WIP: Aggregation push-down - take2

2022-04-15 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi everyone.

I develop postgresql's extension such as fdw in my work. 
I'm interested in using postgresql for OLAP. 
After [1] having been withdrawn, I reviewed [1].
I think that this patch is realy useful when using OLAP queries.
Furthermore, I think it would be more useful if this patch works on a foreign 
table.
So, I would like to ask you a question on this patch in this new thread.

I changed this patch a little and confirmed that my idea is true.
The followings are things I found and differences of between my prototype and 
this patch. 
  1. Things I found
   I execute a query which contain join of postgres_fdw's foreign table and a 
table and aggregation of the join result.
   In my setting, my prototype reduce this query's response by 93%.
  2. Differences between my prototype and this patch
   (1) Pushdown aggregation of foeign table if FDW pushdown partial aggregation
   (2) postgres_fdw pushdowns some partial aggregations
I attached my prototype source code and content of my experiment.
I want to resume development of this patch if there is some possibility of 
accept of this patch's function.
I took a contact to Mr.Houska on resuming development of this patch.
As a result, Mr.Houska advised for me that I ask in pgsql-hackers whether any 
reviewers / committers are 
interested to work on the patch.
Is anyone interested in my work?

Sincerely yours.
Yuuki Fujii

[1] https://commitfest.postgresql.org/32/

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


v17-0004-pushdown-aggregation-foreign-table.patch
Description: v17-0004-pushdown-aggregation-foreign-table.patch
1. Restrictions of my prototype
(1) aggregate functions are avg, sum, count, min, max
(2) argment or sum of avg is bigint var

2. My experiment settings
(1) hardware settings
  cpu:Intel Corei5-9500 processor(3.0 GHz/9MB cache)(6 cores)
  ram:DDR4 3200Mhz 128GB
  storage:512GB SSD(M.2 NVMe)
(2) software
  os: CentOS7
  postgresql source code:14(in development) with commit-id 
947456a823d6b0973b68c6b38c8623a0504054e7
  # build by "make world"
  patchs in [1] I applied:
v17-0001-Introduce-RelInfoList-structure.patch
v17-0002-Aggregate-push-down-basic-functionality.patch
v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch
  [1] https://commitfest.postgresql.org/32/
(3) PostgreSQL settings
  max_parallel_workers_per_gather=6
  work_mem=100
(4) tables and data
  execute the following commands using postgresql's client tool such as psql
--
create database tmp;
\c tmp;
create table log as select 1::bigint as id, 1::bigint as size, 1::bigint as 
total from generate_series(1,10) t;

create table master as select t as id, 'hoge' || mod(t, 1000) as name from 
generate_series(1,100) t;
create index master_idx on master(id);

create extension postgres_fdw;
create server local_server foreign data wrapper postgres_fdw OPTIONS (host 
'localhost', port '5432', dbname 'tmp');
create user mapping for user server local_server options (user 'postgres', 
password 'postgres');
create foreign table f_log(id bigint, size bigint, total bigint) server 
local_server options (table_name 'log');

analyze;

set enable_agg_pushdown=true;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;

set enable_agg_pushdown=false;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;
--


RE: WIP: Aggregation push-down

2022-03-21 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi everyone.

I develop postgresql's extension such as fdw in my work. 
I'm interested in using postgresql for OLAP. 

I think that this patch is realy useful when using OLAP queries.
Furthermore, I think it would be more useful if this patch works on a foreign 
table.
Actually, I changed this patch a little and confirmed that my idea is true.
The followings are things I found and differences of between my prototype and 
this patch. 
  1. Things I found
   I execute a query which contain join of postgres_fdw's foreign table and a 
table and aggregation of the josin result.
   In my setting, my prototype reduce this query's response by 93%.
  2. Differences between my prototype and this patch
   (1) Pushdown aggregation of foeign table if FDW pushdown partioal aggregation
   (2) postgres_fdw pushdowns some partial aggregations
I attached my prototype source code and content of my experiment.
I want to resume development of this patch if there is some possibility of 
accept of this patch's function.
. I took a contact to Mr.Houska on resuming development of this patch.
As a result, Mr.Houska advised for me that I ask in pgsql-hackers whether any 
reviewers / committers are 
interested to work on the patch.
Is anyone interested in my work?

Sincerely yours.
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation



v17-0004-pushdown-aggregation-foreign-table.patch
Description: v17-0004-pushdown-aggregation-foreign-table.patch
1. Restrictions of my prototype
(1) aggregate functions are avg, sum, count, min, max
(2) argment or sum of avg is bigint var

2. My experiment settings
(1) hardware settings
  cpu:Intel Corei5-9500 processor(3.0 GHz/9MB cache)(6 cores)
  ram:DDR4 3200Mhz 128GB
  storage:512GB SSD(M.2 NVMe)
(2) software
  os: CentOS7
  postgresql source code:14(in development) with commit-id 
947456a823d6b0973b68c6b38c8623a0504054e7
  # build by "make world"
  patchs I applied:
v17-0001-Introduce-RelInfoList-structure.patch
v17-0002-Aggregate-push-down-basic-functionality.patch
v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch
(3) PostgreSQL settings
  max_parallel_workers_per_gather=6
  work_mem=100
(4) tables and data
  execute the following commands using postgresql's client tool such as psql
--
\c tmp;
create table log as select 1::bigint as id, 1::bigint as size, 1::bigint as 
total from generate_series(1,10) t;

create table master as select t as id, 'hoge' || mod(t, 1000) as name from 
generate_series(1,100) t;
create index master_idx on master(id);

create extension postgres_fdw;
create server local_server foreign data wrapper postgres_fdw OPTIONS (host 
'localhost', port '5432', dbname 'tmp');
create user mapping for user server local_server options (user 'postgres', 
password 'postgres');
create foreign table f_log(id bigint, size bigint, total bigint) server 
local_server options (table_name 'log');

analyze;

set enable_agg_pushdown=true;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;

set enable_agg_pushdown=false;
explain (verbose, analyze) select b.name, avg(a.total) from f_log a join master 
b on a.id = b.id group by b.name;
--