While reviewing Heikki's Omit-junk-columns patchset[1], I noticed that root->upper_targets[] is used to set the target for partial_distinct_rel. That is not great, because root->upper_targets[] is not supposed to be used by the core code. The comment in grouping_planner() says:

     * Save the various upper-rel PathTargets we just computed into
     * root->upper_targets[].  The core code doesn't use this, but it
     * provides a convenient place for extensions to get at the info.

Then, while fixing this issue, I noticed an opportunity to improve how we generate Gather/GatherMerge paths for two-phase DISTINCT. The Gather/GatherMerge paths are added by generate_gather_paths(), which does not consider ordering that might be useful above the GatherMerge node. This can be improved by using generate_useful_gather_paths() instead. With this change I can see a query plan improvement in the regression test "select_distinct.sql". For instance,

-- Test parallel DISTINCT
SET parallel_tuple_cost=0;
SET parallel_setup_cost=0;
SET min_parallel_table_scan_size=0;
SET max_parallel_workers_per_gather=2;

-- Ensure we get a parallel plan
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;

-- on master
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
                     QUERY PLAN
----------------------------------------------------
 Unique
   ->  Sort
         Sort Key: four
         ->  Gather
               Workers Planned: 2
               ->  HashAggregate
                     Group Key: four
                     ->  Parallel Seq Scan on tenk1
(8 rows)

-- on patched
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
                     QUERY PLAN
----------------------------------------------------
 Unique
   ->  Gather Merge
         Workers Planned: 2
         ->  Sort
               Sort Key: four
               ->  HashAggregate
                     Group Key: four
                     ->  Parallel Seq Scan on tenk1
(8 rows)

I believe the second plan is better: the Sort now runs in the workers, and Gather Merge preserves that ordering, so the leader does not need to sort again.

Attached is a patch that includes this change and also eliminates the usage of root->upper_targets[] in the core code. It also makes some tweaks to the comments.

Any thoughts?

[1] https://www.postgresql.org/message-id/flat/2ca5865b-4693-40e5-8f78-f3b45d5378fb%40iki.fi

Thanks
Richard

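For readers who want to see why the patched plan shape needs no Sort in the leader, here is a toy model in C. It is only a sketch and is not planner or executor code: the names worker_distinct_sorted(), leader_merge_unique() and MAX_WORKERS are hypothetical and do not exist in PostgreSQL. Each "worker" de-duplicates and sorts its own slice (standing in for HashAggregate plus Sort), and the "leader" does an order-preserving merge (standing in for Gather Merge) while dropping adjacent duplicates (standing in for Unique).

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_WORKERS 8           /* hypothetical limit for this toy only */

/* qsort comparator for ints, written to avoid overflow on subtraction */
static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Worker side: sort the slice in place and drop duplicates.
 * Stands in for HashAggregate + Sort below Gather Merge.
 * Returns the number of distinct values left at the front of vals.
 */
static size_t
worker_distinct_sorted(int *vals, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;
    qsort(vals, n, sizeof(int), cmp_int);
    for (size_t i = 0; i < n; i++)
    {
        if (out == 0 || vals[out - 1] != vals[i])
            vals[out++] = vals[i];
    }
    return out;
}

/*
 * Leader side: merge the already-sorted worker streams, always taking
 * the smallest head value (Gather Merge), and skip any value equal to
 * the last one emitted (Unique).  No sort happens here.
 * result must have room for the sum of all lens[].
 */
static size_t
leader_merge_unique(int **streams, const size_t *lens, int nworkers,
                    int *result)
{
    size_t pos[MAX_WORKERS] = {0};
    size_t out = 0;

    assert(nworkers >= 0 && nworkers <= MAX_WORKERS);
    for (;;)
    {
        int best = -1;

        /* pick the worker whose next value is smallest */
        for (int w = 0; w < nworkers; w++)
        {
            if (pos[w] < lens[w] &&
                (best < 0 || streams[w][pos[w]] < streams[best][pos[best]]))
                best = w;
        }
        if (best < 0)
            break;              /* all streams exhausted */

        int v = streams[best][pos[best]++];

        if (out == 0 || result[out - 1] != v)
            result[out++] = v;
    }
    return out;
}
```

To try it, give two workers overlapping slices of a small column, run worker_distinct_sorted() on each, and pass the results to leader_merge_unique(). The leader then only compares stream heads, which is the point of keeping the ordering across the gather step instead of sorting after a plain Gather.
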
v1-0001-Improve-parallel-DISTINCT.patch