On Tue, Feb 14, 2023 at 10:53 AM David Rowley <dgrowle...@gmail.com> wrote:
> I looked at the three patches and have some thoughts: Thanks for reviewing! > 0001: > > Does the newly added test have to be this complex? I think it might > be better to just come up with the most simple test you can that uses > an incremental sort. I really can't think why the test requires a FOR > UPDATE, to test incremental sort, for example. The danger with making > a test more complex than it needs to be is that it frequently gets > broken by unrelated changes. The complexity makes it harder for > people to understand the test's intentions and that increases the risk > that the test eventually does not test what it was originally meant to > test as the underlying code changes and the expected output is > updated. That makes sense. I agree that we should always use the minimal query for test. For this patch, as pointed by Etsuro, it may not be a good idea for the change. So I'll remove 0001. > 0002: > > I think the following existing comment does not seem to be true any longer: > > > However, there's one more > > * possibility: it may make sense to sort the cheapest partial path > > * according to the required output order and then use Gather Merge. > > You've removed the comment that talks about trying Incremental Sort -> > Gather Merge paths yet the code is still doing that, the two are just > more consolidated now, so perhaps you need to come up with a new > comment to explain what we're trying to achieve. Yes, you are right. How about the comment below? - * possibility: it may make sense to sort the cheapest partial path - * according to the required output order and then use Gather Merge. + * possibility: it may make sense to sort the cheapest partial path or + * incrementally sort any partial path that is partially sorted according + * to the required output order and then use Gather Merge. Looking at the codes now I have some concern that what we do in create_ordered_paths for partial paths may have already been done in generate_useful_gather_paths, especially when query_pathkeys is equal to sort_pathkeys. Not sure if this is a problem. And the comment there mentions generate_gather_paths(), but ISTM we should mention what generate_useful_gather_paths has done. > 0003: > > Looking at gather_grouping_paths(), I see it calls > generate_useful_gather_paths() which generates a bunch of Gather Merge > paths after sorting the cheapest path and incrementally sorting any > partially sorted paths. Then back in gather_grouping_paths(), we go > and create Gather Merge paths again, but this time according to the > group_pathkeys instead of the query_pathkeys. I know you're not really > changing anything here, but as I'm struggling to understand why > exactly we're creating two sets of Gather Merge paths, it makes it a > bit scary to consider changing anything in this area. I've not really > found any comments that can explain to me sufficiently well enough so > that I understand why this needs to be done. Do you know? I'm also not sure why gather_grouping_paths creates Gather Merge paths according to group_pathkeys after what generate_useful_gather_paths has done. There is comment that mentions this but seems more explanation is needed. * generate_useful_gather_paths does most of the work, but we also consider a * special case: we could try sorting the data by the group_pathkeys and then * applying Gather Merge. It seems that if there is available group_pathkeys, we will set query_pathkeys to group_pathkeys because we want the result sorted for grouping. In this case gather_grouping_paths may just generate duplicate Gather Merge paths because generate_useful_gather_paths has generated Gather Merge paths according to query_pathkeys. I tried to reduce the code of gather_grouping_paths to just call generate_useful_gather_paths and found no diffs in regression tests. Thanks Richard