Re: Self-join optimisation

Matthias van de Meent Thu, 11 Mar 2021 06:32:45 -0800

On Thu, 11 Mar 2021 at 15:15, Hywel Carver <[email protected]> wrote:
>
> Hi,
>
> I asked this question in the Postgres Slack, and was recommended to ask here 
> instead.
>
> A few times, I've been in a situation where I want to join a table to itself 
> on its primary key. That typically happens because I have some kind of 
> summary view, which I then want to join to the original table (using its 
> primary key) to flesh out the summary data with other columns. That's 
> executed as a join, which surprised me. But in this case, I could extend the 
> view to have all of the columns of the original table to avoid the join.
>
> But there's another case that's harder to solve this way: combining views 
> together. Here's a trivial example:
>
> CREATE TABLE users (id BIGINT PRIMARY KEY, varchar name);
> CREATE VIEW only_some_users AS (SELECT * FROM users WHERE id < 10);
> CREATE VIEW some_other_users AS (SELECT * FROM users WHERE id > 3);
>
> EXPLAIN SELECT * FROM only_some_users
> INNER JOIN some_other_users ON only_some_users.id = some_other_users.id;
>
> Hash Join  (cost=29.23..43.32 rows=90 width=144)
>   Hash Cond: (users.id = users_1.id)
>   ->  Bitmap Heap Scan on users  (cost=6.24..19.62 rows=270 width=72)
>         Recheck Cond: (id < 10)
>         ->  Bitmap Index Scan on users_pkey  (cost=0.00..6.18 rows=270 
> width=0)
>               Index Cond: (id < 10)
>   ->  Hash  (cost=19.62..19.62 rows=270 width=72)
>         ->  Bitmap Heap Scan on users users_1  (cost=6.24..19.62 rows=270 
> width=72)
>               Recheck Cond: (id > 3)
>               ->  Bitmap Index Scan on users_pkey  (cost=0.00..6.18 rows=270 
> width=0)
>                     Index Cond: (id > 3)
>
> Is there a reason why Postgres doesn't have an optimisation built in to 
> optimise this JOIN? What I'm imagining is that a join between two aliases for 
> the same table on its primary key could be optimised by treating them as the 
> same table. I think the same would be true for self-joins on any non-null 
> columns covered by a uniqueness constraint.
>
> If this is considered a desirable change, I'd be keen to work on it (with 
> some guidance).


There's currently a patch registered in the commitfest that could fix
this for you, called "Remove self join on a unique column" [0].


With regards,

Matthias van de Meent

[0] https://commitfest.postgresql.org/31/1712/, thread at
https://www.postgresql.org/message-id/flat/[email protected]

Re: Self-join optimisation

Reply via email to