Does anyone else have an opinion on this patch? It looks promising.
---------------------------------------------------------------------------

On Wed, Jun 28, 2023 at 04:53:06PM +0300, Alena Rybakina wrote:
> Hi!
>
> On 15.06.2023 11:30, Andrey Lepikhov wrote:
>
> Hi, all.
>
> Some of my clients use JOINs with three or four clauses. Quite
> frequently, I see complaints about an unreasonable switch of the JOIN
> algorithm to Merge Join instead of Hash Join. Quick research has shown
> one weak place: the estimation of the average bucket size in
> final_cost_hashjoin (see q2.sql in the attachment), which uses a very
> conservative strategy.
> Unlike the estimation of groups, here we use the smallest ndistinct
> value across all buckets instead of multiplying them (or trying to do
> a multivariate analysis).
> It works fine for the case of one clause. But if we have many clauses,
> and each has a high ndistinct value, we will overestimate the average
> size of a bucket and, as a result, prefer Merge Join. As the example
> in the attachment shows, this leads to a worse plan than possible,
> sometimes drastically worse.
> I assume this is done out of fear of functional dependencies between
> the hash clause components. But in my opinion, we should go the same
> way here as in the estimation of groups.
> The attached patch shows a sketch of the solution.
>
> This problem is very important.
>
> Honestly, I'm still learning your code and looking for cases where
> your patch changes things for the worse or for the better. But I have
> already found several interesting cases where your patch helps choose
> a more correct plan, with a focus on memory consumption.
>
> To make the evaluation easier, I added a GUC to your patch that makes
> it easy to switch between your way and the original way of estimating
> bucket sizes (diff_estimate.diff).
>
> Here are other cases where your fix improves the query plan.
>
> First of all, I changed the way the tables are created, to look at the
> behavior of the query plan in terms of planning and execution time:
>
> DROP TABLE IF EXISTS a,b CASCADE;
> CREATE TABLE a AS
>   SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
>   FROM generate_series(1,1e5) AS gs;
> CREATE TABLE b AS
>   SELECT gs % 90 AS x, gs % 49 AS y, gs % 100 AS z, 'abc' || gs AS payload
>   FROM generate_series(1,1e5) AS gs;
> ANALYZE a,b;
>
> SET enable_cost_size = 'on';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
>
> SET enable_cost_size = 'off';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
>
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Hash Join (actual time=200.872..200.879 rows=0 loops=1)
>    Hash Cond: ((b.x = a.x) AND (b.y = a.y) AND (b.z = a.z))
>    ->  Seq Scan on b (actual time=0.029..15.946 rows=100000 loops=1)
>    ->  Hash (actual time=97.645..97.649 rows=100000 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 5612kB
>          ->  Seq Scan on a (actual time=0.024..17.153 rows=100000 loops=1)
>  Planning Time: 2.910 ms
>  Execution Time: 201.949 ms
> (8 rows)
>
> SET
>                                 QUERY PLAN
> ---------------------------------------------------------------------------
>  Merge Join (actual time=687.415..687.416 rows=0 loops=1)
>    Merge Cond: ((b.y = a.y) AND (b.x = a.x) AND (b.z = a.z))
>    ->  Sort (actual time=462.022..536.716 rows=100000 loops=1)
>          Sort Key: b.y, b.x, b.z
>          Sort Method: external merge  Disk: 3328kB
>          ->  Seq Scan on b (actual time=0.017..12.326 rows=100000 loops=1)
>    ->  Sort (actual time=111.295..113.196 rows=16001 loops=1)
>          Sort Key: a.y, a.x, a.z
>          Sort Method: external sort  Disk: 2840kB
>          ->  Seq Scan on a (actual time=0.020..10.129 rows=100000 loops=1)
>  Planning Time: 0.752 ms
>  Execution Time: 688.829 ms
> (12 rows)
>
> Secondly, I found another case, not related to the planner preferring
> merge join over hash join: here we can see that the plan has become
> better due to lower memory consumption, and planning also takes less
> time.
>
> With the same query, planning time was reduced about 5-fold and the
> number of buckets 128-fold, so memory consumption also decreased:
>
> DROP TABLE IF EXISTS a,b CASCADE;
>
> CREATE TABLE a AS
>   SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
>   FROM generate_series(1,600) AS gs;
> CREATE TABLE b AS
>   SELECT gs % 90 AS x, gs % 49 AS y, gs % 100 AS z, 'abc' || gs AS payload
>   FROM generate_series(1,1e5) AS gs;
> ANALYZE a,b;
>
> SET enable_cost_size = 'on';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
>
> SET enable_cost_size = 'off';
> EXPLAIN ANALYZE
> SELECT * FROM a,b
> WHERE a.x=b.x AND a.y=b.y AND a.z=b.z;
>
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Hash Join (cost=20.50..3157.58 rows=8 width=32) (actual time=95.648..95.651 rows=0 loops=1)
>    Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND (b.z = (a.z)::numeric))
>    ->  Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20) (actual time=0.027..17.980 rows=100000 loops=1)
>    ->  Hash (cost=10.00..10.00 rows=600 width=12) (actual time=2.046..2.047 rows=600 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 34kB
>          ->  Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual time=0.022..0.315 rows=600 loops=1)
>  Planning Time: 0.631 ms
>  Execution Time: 95.730 ms
> (8 rows)
>
> SET
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Hash Join (cost=3387.00..8621.58 rows=8 width=32) (actual time=102.873..102.877 rows=0 loops=1)
>    Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND ((a.z)::numeric = b.z))
>    ->  Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual time=0.014..0.131 rows=600 loops=1)
>    ->  Hash (cost=1637.00..1637.00 rows=100000 width=20) (actual time=101.920..101.921 rows=100000 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 6474kB
>          ->  Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20) (actual time=0.013..16.349 rows=100000 loops=1)
>  Planning Time: 0.153 ms
>  Execution Time: 103.518 ms
> (8 rows)
>
> I also show an improvement for a left or right outer join:
>
> DROP TABLE IF EXISTS a,b CASCADE;
>
> CREATE TABLE a AS
>   SELECT ((3*gs) % 300) AS x, ((3*gs+1) % 300) AS y, ((3*gs+2) % 300) AS z
>   FROM generate_series(1,600) AS gs;
> CREATE TABLE b AS
>   SELECT gs % 90 AS x, gs % 49 AS y, gs % 100 AS z, 'abc' || gs AS payload
>   FROM generate_series(1,1e5) AS gs;
> ANALYZE a,b;
>
> SET enable_cost_size = 'on';
> EXPLAIN ANALYZE
> SELECT * FROM a RIGHT JOIN b
>   ON a.x=b.x AND a.y=b.y AND a.z=b.z;
>
> SET enable_cost_size = 'off';
> EXPLAIN ANALYZE
> SELECT * FROM a RIGHT JOIN b
>   ON a.x=b.x AND a.y=b.y AND a.z=b.z;
>
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Hash Left Join (cost=20.50..3157.58 rows=100000 width=32) (actual time=1.846..102.264 rows=100000 loops=1)
>    Hash Cond: ((b.x = (a.x)::numeric) AND (b.y = (a.y)::numeric) AND (b.z = (a.z)::numeric))
>    ->  Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20) (actual time=0.041..15.328 rows=100000 loops=1)
>    ->  Hash (cost=10.00..10.00 rows=600 width=12) (actual time=1.780..1.781 rows=600 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 34kB
>          ->  Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual time=0.031..0.252 rows=600 loops=1)
>  Planning Time: 0.492 ms
>  Execution Time: 107.609 ms
> (8 rows)
>
> SET
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Hash Right Join (cost=3387.00..8500.08 rows=100000 width=32) (actual time=80.919..101.613 rows=100000 loops=1)
>    Hash Cond: (((a.x)::numeric = b.x) AND ((a.y)::numeric = b.y) AND ((a.z)::numeric = b.z))
>    ->  Seq Scan on a (cost=0.00..10.00 rows=600 width=12) (actual time=0.017..0.084 rows=600 loops=1)
>    ->  Hash (cost=1637.00..1637.00 rows=100000 width=20) (actual time=80.122..80.123 rows=100000 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 6474kB
>          ->  Seq Scan on b (cost=0.00..1637.00 rows=100000 width=20) (actual time=0.015..11.819 rows=100000 loops=1)
>  Planning Time: 0.194 ms
>  Execution Time: 104.662 ms
> (8 rows)
>
> --
> Regards,
> Alena Rybakina
> Postgres Professional
>
> diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
> index ef475d95a18..31771dfba46 100644
> --- a/src/backend/optimizer/path/costsize.c
> +++ b/src/backend/optimizer/path/costsize.c
> @@ -153,6 +153,7 @@ bool		enable_parallel_hash = true;
>  bool		enable_partition_pruning = true;
>  bool		enable_presorted_aggregate = true;
>  bool		enable_async_append = true;
> +bool		enable_cost_size = true;
>  
>  typedef struct
>  {
> @@ -4033,11 +4034,22 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
>  				thismcvfreq = restrictinfo->left_mcvfreq;
>  			}
>  
> +			if (enable_cost_size)
> +			{
> +				innerbucketsize *= thisbucketsize;
> +				innermcvfreq *= thismcvfreq;
> +			}
> +			else
> +			{
>  			if (innerbucketsize > thisbucketsize)
>  				innerbucketsize = thisbucketsize;
>  			if (innermcvfreq > thismcvfreq)
>  				innermcvfreq = thismcvfreq;
> +			}
>  		}
> +
> +		if (enable_cost_size && innerbucketsize < 1.0 / virtualbuckets)
> +			innerbucketsize = 1.0 / virtualbuckets;
>  	}
>  
>  	/*
> diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
> index 71e27f8eb05..ded9ba3b7a9 100644
> --- a/src/backend/utils/misc/guc_tables.c
> +++ b/src/backend/utils/misc/guc_tables.c
> @@ -1007,6 +1007,19 @@ struct config_bool ConfigureNamesBool[] =
>  		true,
>  		NULL, NULL, NULL
>  	},
> +	{
> +		{"enable_cost_size", PGC_USERSET, QUERY_TUNING_OTHER,
> +			gettext_noop("Enables multiplying the estimated hash bucket "
> +						 "sizes across all hash clauses, instead of "
> +						 "taking the minimum of them, when costing a "
> +						 "hash join."),
> +			NULL,
> +			GUC_EXPLAIN
> +		},
> +		&enable_cost_size,
> +		true,
> +		NULL, NULL, NULL
> +	},
>  	{
>  		{"enable_async_append", PGC_USERSET, QUERY_TUNING_METHOD,
>  			gettext_noop("Enables the planner's use of async append plans."),
> diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
> index 6cf49705d3a..c79ec12e6d5 100644
> --- a/src/include/optimizer/cost.h
> +++ b/src/include/optimizer/cost.h
> @@ -71,6 +71,7 @@ extern PGDLLIMPORT bool enable_partition_pruning;
>  extern PGDLLIMPORT bool enable_presorted_aggregate;
>  extern PGDLLIMPORT bool enable_async_append;
>  extern PGDLLIMPORT int	constraint_exclusion;
> +extern PGDLLIMPORT bool enable_cost_size;
>  
>  extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
> 								  double index_pages, PlannerInfo *root);

-- 
  Bruce Momjian  <br...@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.