On Sun, 8 Oct 2023 at 23:52, Richard Guo <guofengli...@gmail.com> wrote:
> On Thu, Oct 5, 2023 at 2:26 PM David Rowley <dgrowle...@gmail.com> wrote:
>>
>> So in short, I propose the attached fix without any regression tests
>> because I feel that any regression test would just mark that there was
>> a big in create_agg_path() and not really help with ensuring we don't
>> end up with some similar problem in the future.
>
>
> If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> are computable from the targetlist, it seems that we do not need to trim
> them off, because prepare_sort_from_pathkeys() will add resjunk target
> entries for them.  But it's also no harm if we trim them off.  So I
> think the patch is a pretty safe fix.  +1 to it.

hmm, I think one of us does not understand what is going on here. I
tried to explain in [1] why we *need* to strip off the pathkeys added
by adjust_group_pathkeys_for_groupagg().

Given the following example:

create table ab (a int,b int);
explain (costs off) select a,count(distinct b) from ab group by a;
         QUERY PLAN
----------------------------
 GroupAggregate
   Group Key: a
   ->  Sort
         Sort Key: a, b
         ->  Seq Scan on ab
(5 rows)

adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b"
column and that results in the Sort node sorting on {a,b}.  It's
simply not at all valid to have the GroupAggregate path claim that its
pathkeys are also (effectively) {a,b}" as "b" does not and cannot
legally exist after the aggregation takes place.  We cannot put a
resjunk "b" in the targetlist of the GroupAggregate either as there
could be any number "b" values aggregated.

Can you explain why you think we can put a resjunk "b" in the target
list of the GroupAggregate in the above case?

>>
>> I have some concerns that the assert_pathkeys_in_target() function
>> might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm
>> not proposing to commit that without further discussion.
>
>
> Yeah, it looks like some heavy to call assert_pathkeys_in_target() for
> each path node.  Can we run some benchmarks to see how much overhead it
> would bring to USE_ASSERT_CHECKING build?

I think it'll be easy to show that there is an overhead to it.  It'll
be in the realm of the ~41ms patched vs ~24ms unpatched that I showed
in [2].  That's quite an extreme case.

Maybe it's worth checking the total planning time spent in a run of
the regression tests with and without the patch to see how much
overhead it adds to the "average case".

David

[1] 
https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=n-tb6...@mail.gmail.com
[2] 
https://postgr.es/m/caaphdvo7rzcqyw-gnkzr6qcijcqf8vjlkj4xfk-kawvyaw1...@mail.gmail.com


Reply via email to