Hi, Recently, I found while using `union all order by limit` that `MergeAppend` adds a `Sort` node even when child node data is already ordered, leading to low execution efficiency.
The issue can be reproduced with the following case (on the latest master branch). In the execution plan below, although `t2` is ordered by `a, b` via `t_a_b_idx`, a `Sort` node is still added, causing the outer `limit` to be ineffective. Most of the time is spent on the `Sort` node. ```sql create table t(a int, b int); insert into t select 1, i from generate_series(1, 10000)i; insert into t select i, i from generate_series(10000, 20000)i; create index on t(a, b); analyze t; set enable_bitmapscan to off; set enable_seqscan to off; explain select a, b from ( (select a, b from t t1 where a > 19000 order by a, b) union all (select a, b from t t2 where a = 1 and b > 1 order by a, b) ) t order by a, b limit 1; QUERY PLAN -------------------------------------------------------------------------------------------------- Limit (cost=366.56..366.57 rows=1 width=8) -> Merge Append (cost=366.56..531.05 rows=10999 width=8) Sort Key: t1.a, t1.b -> Index Only Scan using t_a_b_idx on t t1 (cost=0.29..29.79 rows=1000 width=8) Index Cond: (a > 19000) -> Sort (cost=366.26..391.26 rows=9999 width=8) Sort Key: t2.a, t2.b -> Index Only Scan using t_a_b_idx on t t2 (cost=0.29..316.27 rows=9999 width=8) Index Cond: ((a = 1) AND (b > 1)) (9 rows) ``` I briefly analyzed the cause of this issue: in the `standard_qp_callback` function, while setting `sort_pathkeys`, the `a` column is not added to `sort_pathkeys` because its `EquivalenceClass` contains `Const`. Hence, `a` is absent in `PlannerInfo->query_pathkeys`. The purpose of this might be to reduce the number of Sort columns. Subsequently, when `build_index_paths` generates `IndexPath`, the `truncate_useless_pathkeys` function removes the `a` column from `Path->pathkeys`. Thus, this issue can be minimally reproduced with the following SQL: ```sql explain select * from (select * from t where a > 19000 order by a, b) order by a, b limit 1; QUERY PLAN ---------------------------------------------------------------------------------- Limit (cost=0.29..0.32 rows=1 width=8) -> Index Only Scan using t_a_b_idx on t (cost=0.29..29.79 rows=1000 width=8) Index Cond: (a > 19000) (3 rows) explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1; QUERY PLAN ----------------------------------------------------------------------------------------- Limit (cost=366.26..366.27 rows=1 width=8) -> Sort (cost=366.26..391.26 rows=9999 width=8) Sort Key: t.a, t.b -> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8) Index Cond: ((a = 1) AND (b > 1)) (5 rows) ``` Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the upper layers of the subquery, rather than just keeping the portion trimmed by `PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct. Furthermore, is there a more efficient way to write this, to avoid the `Sort` node mentioned above? Best Regards, Fei Changhong