On 21/05/2024 05:58, David Rowley wrote:
Let this thread be for at least the coding portion of this or be my
thread for this patch for the v18 cycle if the RMT rules in favour of
keeping that code reverted for v17.

I've attached 2 patches.

0001 is a simple revert of Tom's revert (7204f3591).
0002 fixes the issue reported by Hubert.

If anyone wants to have a look, I'd be grateful for that.  Tom did
call for further review after this being the 4th issue reported for
66c0185a3.

My planner experience is a bit rusty, but I took a quick look. Looks generally OK to me. Some comments below:

+       /* For for UNIONs (not UNION ALL), try sorting, if sorting is possible 
*/

Duplicated word: "For for"

/*
 * build_setop_child_paths
 *              Build paths for the set op child relation denoted by 'rel'.
 *
 * interesting_pathkeys: if not NIL, also include paths that suit these
 * pathkeys, sorting any unsorted paths as required.
 * *pNumGroups: if not NULL, we estimate the number of distinct groups
 *              in the result, and store it there

The indentation on 'interesting_pathkeys' and '*pNumGroups' is inconsistent.

I have a vague feeling that this comment deserves to be longer. The function does a lot of things. How is 'child_tlist' different from rel->reltarget for example?

'interesting_pathkeys' is modified by the call to add_setop_child_rel_equivalences(): it adds members to the EquivalenceClasses of the pathkeys. Is that worth mentioning here, or is that obvious to someone who know more about the planner?

                /*
                 * Create paths to suit final sort order required for 
setop_pathkeys.
                 * Here we'll sort the cheapest input path (if not sorted 
already) and
                 * incremental sort any paths which are partially sorted.
                 */
                is_sorted = pathkeys_count_contained_in(setop_pathkeys,
                                                                                   
             subpath->pathkeys,
                                                                                    
            &presorted_keys);

                if (!is_sorted)
                {

Maybe also mention that if it's already sorted, it's used as is.

BTW, could the same machinery be used for INTERSECT as well? There was a brief mention of that in the original thread, but I didn't understand the details. Not for v17, but I'm curious. I was wondering if build_setop_child_paths() should be named build_union_child_paths(), since it's only used with UNIONs, but I guess it could be used as is for INTERSECT too.


# Testing

postgres=# begin; create table foo as select i from generate_series(1, 1000000) i; create index on foo (i); commit;
BEGIN
SELECT 1000000
CREATE INDEX
COMMIT
postgres=# set enable_seqscan=off;
SET
postgres=# explain (select 1 as i union select i from foo) order by i;
QUERY PLAN
------------------------------------------------------------------------------------------------------
 Unique  (cost=144370.89..149370.89 rows=1000001 width=4)
   ->  Sort  (cost=144370.89..146870.89 rows=1000001 width=4)
         Sort Key: (1)
         ->  Append  (cost=0.00..31038.44 rows=1000001 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
-> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=4)
(6 rows)

I'm disappointed it couldn't produce a MergeAppend plan. If you replace the "union" with "union all" you do get a MergeAppend.

Some more cases where I hoped for a MergeAppend:

postgres=# explain (select i, 'foo' from foo union select i, 'foo' from foo) order by 1; QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
   ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
         Sort Key: foo.i, ('foo'::text)
         ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
-> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=36) -> Index Only Scan using foo_i_idx on foo foo_1 (cost=0.42..26038.42 rows=1000000 width=36)
(6 rows)


postgres=# explain (select 'foo', i from foo union select 'bar', i from foo) order by 1; QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
   ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
         Sort Key: ('foo'::text), foo.i
         ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
-> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=36) -> Index Only Scan using foo_i_idx on foo foo_1 (cost=0.42..26038.42 rows=1000000 width=36)
(6 rows)


The following two queries are the same from the user's point of view, but one is written using WITH:

postgres=# explain (select i from foo union (select 1::int order by 1) union select i from foo) order by 1; QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Unique  (cost=326083.66..336083.67 rows=2000001 width=4)
   ->  Sort  (cost=326083.66..331083.67 rows=2000001 width=4)
         Sort Key: foo.i
         ->  Append  (cost=0.42..62076.87 rows=2000001 width=4)
-> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
-> Index Only Scan using foo_i_idx on foo foo_1 (cost=0.42..26038.42 rows=1000000 width=4)
(7 rows)

postgres=# explain with x (i) as (select 1::int order by 1) (select i from foo union select i from x union select i from foo) order by 1; QUERY PLAN
------------------------------------------------------------------------------------------------------
 Unique  (cost=0.89..82926.54 rows=2000001 width=4)
   ->  Merge Append  (cost=0.89..77926.54 rows=2000001 width=4)
         Sort Key: foo.i
-> Index Only Scan using foo_i_idx on foo (cost=0.42..26038.42 rows=1000000 width=4)
         ->  Sort  (cost=0.02..0.03 rows=1 width=4)
               Sort Key: (1)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
-> Index Only Scan using foo_i_idx on foo foo_1 (cost=0.42..26038.42 rows=1000000 width=4)
(8 rows)

I would've expected a MergeAppend in both cases.


None of these test cases are broken as such, you just don't get the benefit of the optimization. I suspect they might all have the same root cause, as they all involve constants in the target list. I think that's a pretty common use case of UNION though.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to