On Tue, Feb 14, 2023 at 10:53 AM David Rowley <dgrowle...@gmail.com> wrote:

> I looked at the three patches and have some thoughts:


Thanks for reviewing!


> 0001:
>
> Does the newly added test have to be this complex?  I think it might
> be better to just come up with the most simple test you can that uses
> an incremental sort. I really can't think why the test requires a FOR
> UPDATE, to test incremental sort, for example. The danger with making
> a test more complex than it needs to be is that it frequently gets
> broken by unrelated changes.  The complexity makes it harder for
> people to understand the test's intentions and that increases the risk
> that the test eventually does not test what it was originally meant to
> test as the underlying code changes and the expected output is
> updated.


That makes sense.  I agree that we should always use the minimal query
for test.  For this patch, as pointed by Etsuro, it may not be a good
idea for the change.  So I'll remove 0001.


> 0002:
>
> I think the following existing comment does not seem to be true any longer:
>
> > However, there's one more
> > * possibility: it may make sense to sort the cheapest partial path
> > * according to the required output order and then use Gather Merge.
>
> You've removed the comment that talks about trying Incremental Sort ->
> Gather Merge paths yet the code is still doing that, the two are just
> more consolidated now, so perhaps you need to come up with a new
> comment to explain what we're trying to achieve.


Yes, you are right.  How about the comment below?

- * possibility: it may make sense to sort the cheapest partial path
- * according to the required output order and then use Gather Merge.
+ * possibility: it may make sense to sort the cheapest partial path or
+ * incrementally sort any partial path that is partially sorted according
+ * to the required output order and then use Gather Merge.

Looking at the codes now I have some concern that what we do in
create_ordered_paths for partial paths may have already been done in
generate_useful_gather_paths, especially when query_pathkeys is equal to
sort_pathkeys.  Not sure if this is a problem.  And the comment there
mentions generate_gather_paths(), but ISTM we should mention what
generate_useful_gather_paths has done.


> 0003:
>
> Looking at gather_grouping_paths(), I see it calls
> generate_useful_gather_paths() which generates a bunch of Gather Merge
> paths after sorting the cheapest path and incrementally sorting any
> partially sorted paths.  Then back in gather_grouping_paths(), we go
> and create Gather Merge paths again, but this time according to the
> group_pathkeys instead of the query_pathkeys. I know you're not really
> changing anything here, but as I'm struggling to understand why
> exactly we're creating two sets of Gather Merge paths, it makes it a
> bit scary to consider changing anything in this area. I've not really
> found any comments that can explain to me sufficiently well enough so
> that I understand why this needs to be done.  Do you know?


I'm also not sure why gather_grouping_paths creates Gather Merge paths
according to group_pathkeys after what generate_useful_gather_paths has
done.  There is comment that mentions this but seems more explanation is
needed.

* generate_useful_gather_paths does most of the work, but we also consider a
* special case: we could try sorting the data by the group_pathkeys and then
* applying Gather Merge.

It seems that if there is available group_pathkeys, we will set
query_pathkeys to group_pathkeys because we want the result sorted for
grouping.  In this case gather_grouping_paths may just generate
duplicate Gather Merge paths because generate_useful_gather_paths has
generated Gather Merge paths according to query_pathkeys.  I tried to
reduce the code of gather_grouping_paths to just call
generate_useful_gather_paths and found no diffs in regression tests.

Thanks
Richard

Reply via email to