On Fri, Nov 24, 2023 at 6:29 AM David Rowley <dgrowle...@gmail.com> wrote:
> I've attached the updated patch. This one is probably ready for > someone to test out. There will be more work to do, however. I just started reviewing this patch and haven't looked through all the details yet. Here are some feedbacks that came to my mind. Post them first so that I don’t forget them after the holidays. * I think we should update truncate_useless_pathkeys() to account for the ordering requested by the query's set operation; otherwise we may not get a subquery's path with the expected pathkeys. For instance, create table t (a int, b int); create index on t (a, b); set enable_hashagg to off; -- on v1 patch explain (costs off) (select * from t order by a) UNION (select * from t order by a); QUERY PLAN ------------------------------------------------------------ Unique -> Merge Append Sort Key: t.a, t.b -> Incremental Sort Sort Key: t.a, t.b Presorted Key: t.a -> Index Only Scan using t_a_b_idx on t -> Incremental Sort Sort Key: t_1.a, t_1.b Presorted Key: t_1.a -> Index Only Scan using t_a_b_idx on t t_1 (11 rows) -- after accounting for setop_pathkeys in truncate_useless_pathkeys() explain (costs off) (select * from t order by a) UNION (select * from t order by a); QUERY PLAN ------------------------------------------------------ Unique -> Merge Append Sort Key: t.a, t.b -> Index Only Scan using t_a_b_idx on t -> Index Only Scan using t_a_b_idx on t t_1 (5 rows) * I understand that we need to sort (full or incremental) the paths of the subqueries to meet the ordering required for setop_pathkeys, so that MergeAppend has something to work with. Currently in the v1 patch this sorting is performed during the planning phase of the subqueries (in grouping_planner). And we want to add the subquery's cheapest_total_path as-is to allow that path to be used in hash-based UNIONs, and we also want to add a sorted path on top of cheapest_total_path. And then we may encounter the issue you mentioned earlier regarding add_path() potentially freeing the cheapest_total_path, leaving the Sort's subpath gone. I'm thinking that maybe it'd be better to move the work of sorting the subquery's paths to the outer query level, specifically within the build_setop_child_paths() function, just before we stick SubqueryScanPath on top of the subquery's paths. I think this is better because: 1. This minimizes the impact on subquery planning and reduces the footprint within the grouping_planner() function as much as possible. 2. This can help avoid the aforementioned add_path() issue because the two involved paths will be structured as: cheapest_path -> subqueryscan and cheapest_path -> sort -> subqueryscan If the two paths cost fuzzily the same and add_path() decides to keep the second one due to it having better pathkeys and pfree the first one, it would not be a problem. BTW, I haven't looked through the part involving partial paths, but I think we can do the same to partial paths. * I noticed that in generate_union_paths() we use a new function build_setop_pathkeys() to build the 'union_pathkeys'. I wonder why we don't simply utilize make_pathkeys_for_sortclauses() since we already have the grouplist for the setop's output columns. To assist the discussion I've attached a diff file that includes all the changes above. Thanks Richard
v1-0001-review-diff.patch
Description: Binary data