Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

Heikki Linnakangas Tue, 21 May 2024 04:48:37 -0700

On 21/05/2024 05:58, David Rowley wrote:

Let this thread be for at least the coding portion of this or be my
thread for this patch for the v18 cycle if the RMT rules in favour of
keeping that code reverted for v17.


I've attached 2 patches.

0001 is a simple revert of Tom's revert (7204f3591).
0002 fixes the issue reported by Hubert.

If anyone wants to have a look, I'd be grateful for that.  Tom did
call for further review after this being the 4th issue reported for
66c0185a3.

My planner experience is a bit rusty, but I took a quick look. Looksgenerally OK to me. Some comments below:

+       /* For for UNIONs (not UNION ALL), try sorting, if sorting is possible 
*/


Duplicated word: "For for"

/*
 * build_setop_child_paths
 *              Build paths for the set op child relation denoted by 'rel'.
 *
 * interesting_pathkeys: if not NIL, also include paths that suit these
 * pathkeys, sorting any unsorted paths as required.
 * *pNumGroups: if not NULL, we estimate the number of distinct groups
 *              in the result, and store it there


The indentation on 'interesting_pathkeys' and '*pNumGroups' is inconsistent.

I have a vague feeling that this comment deserves to be longer. Thefunction does a lot of things. How is 'child_tlist' different fromrel->reltarget for example?

'interesting_pathkeys' is modified by the call toadd_setop_child_rel_equivalences(): it adds members to theEquivalenceClasses of the pathkeys. Is that worth mentioning here, or isthat obvious to someone who know more about the planner?

                /*
                 * Create paths to suit final sort order required for 
setop_pathkeys.
                 * Here we'll sort the cheapest input path (if not sorted 
already) and
                 * incremental sort any paths which are partially sorted.
                 */
                is_sorted = pathkeys_count_contained_in(setop_pathkeys,
                                                                                   
             subpath->pathkeys,
                                                                                    
            &presorted_keys);

                if (!is_sorted)
                {


Maybe also mention that if it's already sorted, it's used as is.

BTW, could the same machinery be used for INTERSECT as well? There was abrief mention of that in the original thread, but I didn't understandthe details. Not for v17, but I'm curious. I was wondering ifbuild_setop_child_paths() should be named build_union_child_paths(),since it's only used with UNIONs, but I guess it could be used as is forINTERSECT too.



# Testing

postgres=# begin; create table foo as select i from generate_series(1,1000000) i; create index on foo (i); commit;

BEGIN
SELECT 1000000
CREATE INDEX
COMMIT
postgres=# set enable_seqscan=off;
SET
postgres=# explain (select 1 as i union select i from foo) order by i;

QUERY PLAN

------------------------------------------------------------------------------------------------------
 Unique  (cost=144370.89..149370.89 rows=1000001 width=4)
   ->  Sort  (cost=144370.89..146870.89 rows=1000001 width=4)
         Sort Key: (1)
         ->  Append  (cost=0.00..31038.44 rows=1000001 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)

-> Index Only Scan using foo_i_idx on foo(cost=0.42..26038.42 rows=1000000 width=4)

(6 rows)

I'm disappointed it couldn't produce a MergeAppend plan. If you replacethe "union" with "union all" you do get a MergeAppend.


Some more cases where I hoped for a MergeAppend:

postgres=# explain (select i, 'foo' from foo union select i, 'foo' fromfoo) order by 1;QUERY PLAN

-------------------------------------------------------------------------------------------------------------
 Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
   ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
         Sort Key: foo.i, ('foo'::text)
         ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)

-> Index Only Scan using foo_i_idx on foo(cost=0.42..26038.42 rows=1000000 width=36)-> Index Only Scan using foo_i_idx on foo foo_1(cost=0.42..26038.42 rows=1000000 width=36)

(6 rows)

postgres=# explain (select 'foo', i from foo union select 'bar', i fromfoo) order by 1;QUERY PLAN

-------------------------------------------------------------------------------------------------------------
 Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
   ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
         Sort Key: ('foo'::text), foo.i
         ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)

-> Index Only Scan using foo_i_idx on foo(cost=0.42..26038.42 rows=1000000 width=36)-> Index Only Scan using foo_i_idx on foo foo_1(cost=0.42..26038.42 rows=1000000 width=36)

(6 rows)

The following two queries are the same from the user's point of view,but one is written using WITH:

postgres=# explain (select i from foo union (select 1::int order by 1)union select i from foo) order by 1;QUERY PLAN

------------------------------------------------------------------------------------------------------------
 Unique  (cost=326083.66..336083.67 rows=2000001 width=4)
   ->  Sort  (cost=326083.66..331083.67 rows=2000001 width=4)
         Sort Key: foo.i
         ->  Append  (cost=0.42..62076.87 rows=2000001 width=4)

-> Index Only Scan using foo_i_idx on foo(cost=0.42..26038.42 rows=1000000 width=4)

               ->  Result  (cost=0.00..0.01 rows=1 width=4)

-> Index Only Scan using foo_i_idx on foo foo_1(cost=0.42..26038.42 rows=1000000 width=4)

(7 rows)

postgres=# explain with x (i) as (select 1::int order by 1) (select ifrom foo union select i from x union select i from foo) order by 1;QUERY PLAN

------------------------------------------------------------------------------------------------------
 Unique  (cost=0.89..82926.54 rows=2000001 width=4)
   ->  Merge Append  (cost=0.89..77926.54 rows=2000001 width=4)
         Sort Key: foo.i

-> Index Only Scan using foo_i_idx on foo(cost=0.42..26038.42 rows=1000000 width=4)

         ->  Sort  (cost=0.02..0.03 rows=1 width=4)
               Sort Key: (1)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)

-> Index Only Scan using foo_i_idx on foo foo_1(cost=0.42..26038.42 rows=1000000 width=4)

(8 rows)

I would've expected a MergeAppend in both cases.

None of these test cases are broken as such, you just don't get thebenefit of the optimization. I suspect they might all have the same rootcause, as they all involve constants in the target list. I think that'sa pretty common use case of UNION though.


--
Heikki Linnakangas
Neon (https://neon.tech)

Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

Reply via email to