Hi,

while investigating a poor query plan choice, I noticed suspicious semi join estimates that look like this:

 ->  Nested Loop  (cost=35.52..1787.31 rows=281653470 width=566)
       Buffers: shared hit=9305
       ->  HashAggregate  (cost=34.81..36.81 rows=200 width=32)
             Group Key: data.column1
             ->  CTE Scan on data  (cost=0.00..30.94 rows=1547 width=32)
       ->  Index Scan using t_pkey on t (cost=0.71..8.74 rows=1 ...)
             Index Cond: ((a = data.column1) AND (...))
             Buffers: shared hit=9305

Notice the cardinality estimates - we expect the outer relation to have 200 rows, with 1 matching row in the inner relation for each of them. Yet the join cardinality is estimated at ~280M rows, which is rather excessive (it even exceeds the cardinality of the cartesian product).

So, how could this happen? Quite easily, actually ...

Consider a simple table

   CREATE TABLE t (a INT PRIMARY KEY, ...);

and this query:

WITH data AS (VALUES (1), (2), (3), ... list of more IDs ...)
SELECT * FROM t
WHERE a IN (SELECT column1 FROM data);

This is a simple semi join - the planner will do a HashAggregate on the CTE to find the list of distinct values, and then look up matching rows in the 't' table. There may be additional conditions, as suggested by the initial plan, but let's ignore those for now.

Now, the CTE actually has 1547 values in it, but the HashAggregate is estimated to output just 200 values. Where does that come from?

Turns out, when there are no stats for the relation (e.g. when it's a CTE), get_variable_numdistinct() behaves differently depending on the cardinality of the relation. Essentially, this happens:

  if (rows < DEFAULT_NUM_DISTINCT)
  {
      /* small relation - assume all rows are distinct */
      *isdefault = false;
      return rows;
  }
  else
  {
      /* no information - fall back to the default, and say so */
      *isdefault = true;
      return DEFAULT_NUM_DISTINCT;
  }

i.e. we cap the ndistinct estimate at DEFAULT_NUM_DISTINCT (which is 200), and the two branches return a different value for the 'isdefault' flag.
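
The cap is easy to observe directly. Here is a minimal sketch (not one of the attached scripts) - it assumes a release where generate_series() provides exact row estimates via its support function; on older versions a literal VALUES list, as in the query above, behaves the same:

    CREATE TABLE t (a INT PRIMARY KEY);
    INSERT INTO t SELECT i FROM generate_series(1, 1000000) s(i);
    ANALYZE t;

    -- 199 values: ndistinct = 199, isdefault = false, and the
    -- semi join estimate comes out sane (~199 rows)
    EXPLAIN WITH data AS (SELECT i FROM generate_series(1, 199) s(i))
    SELECT * FROM t WHERE a IN (SELECT i FROM data);

    -- 200 values: ndistinct capped at 200 with isdefault = true, and
    -- (as explained below) the join estimate jumps to ~500000 rows
    EXPLAIN WITH data AS (SELECT i FROM generate_series(1, 200) s(i))
    SELECT * FROM t WHERE a IN (SELECT i FROM data);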

Then, eqjoinsel_semi() falls through to this part of the code, as there is no MCV list for the CTE:

    if (!isdefault1 && !isdefault2)
    {
        if (nd1 <= nd2 || nd2 < 0)
            selec = 1.0 - nullfrac1;
        else
            selec = (nd2 / nd1) * (1.0 - nullfrac1);
    }
    else
        selec = 0.5 * (1.0 - nullfrac1);

Assuming the other side of the join is a regular table with proper stats (isdefault=false), depending on the CTE cardinality we fall into different branches of the if condition.

When rows < 200, we end up with isdefault=false and (as t.a has far more distinct values than the CTE, i.e. nd1 > nd2) use

    selec = (nd2 / nd1) * (1.0 - nullfrac1)

which produces a sane estimate. But with rows >= 200, we suddenly switch to the other estimate

    selec = 0.5 * (1.0 - nullfrac1);

which ignores the ndistinct values entirely and evaluates to 0.5 (when nullfrac1=0.0). This is the source of the over-estimates, because the actual join selectivity is way lower than 0.5, and we simply do

    rows = outer_rows * jselec;

in calc_joinrel_size_estimate() for JOIN_SEMI. It's not difficult to construct examples of such over-estimates - attached are two scripts demonstrating the issue, differing only in the scale of the error.
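
To put concrete numbers on the plan at the top: with nullfrac1 = 0.0 the default branch gives jselec = 0.5, and the outer side of the semi join ('t') is apparently estimated at ~563M rows, so

    rows = 563306940 * 0.5 = 281653470

which is exactly the Nested Loop estimate shown above, no matter how many values the CTE actually contains.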

It's also possible to construct examples for the inner if condition (see example-3.sql):

    if (nd1 <= nd2 || nd2 < 0)
        selec = 1.0 - nullfrac1;
    else
        selec = (nd2 / nd1) * (1.0 - nullfrac1);

but in this case we only switch between selec=1.0 and 0.5 (again, assuming nullfrac1=0.0). While this sudden jump in estimates is not great, it can't produce such excessive over-estimates.
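
A sketch of the kind of setup that exercises this branch (not necessarily what the attached example-3.sql does) - here t2.a has fewer distinct values than the CTE, so nd1 <= nd2:

    CREATE TABLE t2 (a INT);
    INSERT INTO t2 SELECT i % 100 FROM generate_series(1, 1000000) s(i);
    ANALYZE t2;

    -- 150 values in the CTE: nd1 = 100 <= nd2 = 150, selec = 1.0,
    -- so the semi join is estimated to return all ~1M rows
    EXPLAIN WITH data AS (SELECT i FROM generate_series(1, 150) s(i))
    SELECT * FROM t2 WHERE a IN (SELECT i FROM data);

    -- 250 values in the CTE: isdefault2 = true, selec drops to 0.5,
    -- and the estimate suddenly halves to ~500k rows
    EXPLAIN WITH data AS (SELECT i FROM generate_series(1, 250) s(i))
    SELECT * FROM t2 WHERE a IN (SELECT i FROM data);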

Clearly, the CTE here is a bit unnecessary, and it's trivial to replace it with a simple IN() clause.
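
For example (a sketch - the literal list stands in for the values from the original CTE):

    SELECT * FROM t WHERE a IN (1, 2, 3);

The planner turns the list into 'a = ANY(...)' and estimates it using the statistics on t.a, so the estimate stays sane regardless of the number of values.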

Moreover, we can't really expect good estimates without stats, and I don't think changing DEFAULT_NUM_DISTINCT from one arbitrary value to another would help much - we'd just see the same strange plan changes at a different number of values in the CTE. But perhaps we can do at least a bit better.

Firstly, we could clamp the join cardinality estimate to the size of the cartesian product, eliminating at least the most obvious over-estimates.

Secondly, I think it'd be nice to somehow eliminate the sudden jumps in the estimates - I'm not quite sure why we need the three different formulas in eqjoinsel_semi(), so maybe we can get rid of some of them?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:
  semijoin-example-1.sql
  semijoin-example-2.sql
  semijoin-example-3.sql
