Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

Tomas Vondra Mon, 18 May 2015 16:08:55 -0700

Hi David,

On 05/17/15 14:31, David Rowley wrote:

Hi Tomas,


I did glance at this patch a while back, but just thinking on it again.

I think if you find any quals that are a part of *any* foreign key
between the 2 join tables, then you should be never assume these quals
to reduce the number of rows. I believe this should be the case even if
you don't fully match all columns of the foreign key.

If we modify your example a little, let's say your foreign key between
fact and dim is made from 3 columns (a,b,c)

If we do:

EXPLAIN SELECT * FROM fact f JOIN dim d USING (a,b);

Then we should always (under normal circumstances) find at least one
matching row, although in this case since the join qual for c is
missing, we could find more than 1 matching row.

Without digging too deep here, I'd say that the best way to do this
would be to either have calc_joinrel_size_estimate() build a list of
restrictinfo items of all quals that are NOT part of any foreign key and
pass that trimmed list down to clauselist_selectivity() for selectivity
estimates. Or perhaps a better way would be just to
teach clauselist_selectivity() about foreign keys. Likely
clauselist_selectivity() would just have to skip over RestrictInfo items
that are part of a foreign key.

I'm not particularly happy about the way the current patch tweaks theselectivity, but I think simply removing the clauses is not going towork, because that implies the (removed) conditions have selectivity 1.0(so the estimate would match a cartesian product). So we really need toestimate the selectivity, we can't just throw them away.

And that's essentially what the current patch does - it realizes theclauses match a FK, and then computes the estimate as 1/N where N is thecardinality of the table with PK.

Another thing is that there may be multiple "candidate" foreign keys,and we can't just remove clauses matching at least one of them. Imaginee.g. a (somewhat artificial) example:


    CREATE TABLE parent (
       a INT,
       b INT,
       c INT,
       d INT,
       PRIMARY KEY (a,b),
       UNIQUE (c,d)
    );

    CREATE TABLE child (
       a INT,
       b INT,
       c INT,
       d INT,
       FOREIGN KEY (a,b) REFERENCES parent (a,b),
       FOREIGN KEY (c,d) REFERENCES parent (c,b)
    );

and a join on (a,b,c,d). We can't just discard all the conditions,because (a,b) and (c,d) may 'mismatch'. We know (a,b) and (c,d) matchesabout 1/N of the 'parent' table, but we don't know selectivity for thewhole join condition.

And it may be more complex, e.g. there may be two overlapping FKeys,e.b. (a,b) and (b,c) - how do you split that?

But this may be an overkill (multiple multi-column FK join), although ifwe could handle that reasonably, then why not ... someone out therecertainly has schema like that ;-)

What I think is somewhat more realistic is conditions matching only asubset of FK columns - for example with a FK on (a,b) the join may onlyuse (a), for example. Then again - we can't just discard that, we needto estimate that (and use it to compute the actual selectivity).

I agree that if we want to do anything fancy with this, we will have toteach clauselist_selectivity() about foreign keys.


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: use foreign keys to improve join estimates v1

Reply via email to