Hi, On Sat, Feb 5, 2022 at 9:32 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote:
> > I'm also not claiming this is 100% worth it - queries with a suitable > combination of clauses (conditions on the join keys) seems rather > uncommon. Thanks for showing interest in this. I want to add some other user cases which seem not very uncommon. a). When we join the key on a foregin table, in which case, push down a qual to foregin key would be pretty good to reduce the data transformed from the network. b). If the people join many partitioned table on partitioned key, but they want to query more than 1 partitions (which means the qual on partition key is not a simple "partitionKey = Const"), then we have to do a run-time partition prune (lose the chance for initial partition prune). We have big difference on the performance aspect as well. I guess some of the people who think we may need this feature are not very clear about what bad it would be if we add this feature (Of course Including me). I summarized the discussion before and hacked the solution at [1], the current state looks reasonable to me. I'm not sure if I missed any point. > Of course, this breaks the estimates in the faster query, because we now > apply the condition twice - once for the index scan, one as the join > clause. So instead of ~100k rows the join is estimated as ~1000 rows. I think my patch has addressed this. Here is the example: postgres=# set geqo to off; -- disable this feature, we have an estimation error. -- using geqo guc in patch is just for easy testing. SET postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a) WHERE (t1.a > 99000) and t2.a > 99000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Merge Join (cost=0.73..2408.37 rows=990 width=8) (actual time=0.032..21.350 rows=99900 loops=1) Merge Cond: (t1.a = t2.a) -> Index Only Scan using t1_a_idx on t1 (cost=0.29..29.64 rows=991 width=4) (actual time=0.014..0.121 rows=1000 loops=1) Index Cond: (a > 99000) Heap Fetches: 0 -> Index Only Scan using t2_a_idx on t2 (cost=0.43..2113.20 rows=101301 width=4) (actual time=0.013..9.854 rows=99900 loops=1) Index Cond: (a > 99000) Heap Fetches: 0 Planning Time: 0.282 ms Execution Time: 24.823 ms (10 rows) postgres=# set geqo to on; -- enable this feature and let planner derive the qual by itself, the estimation -- is good. SET postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a) WHERE (t1.a > 99000) ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Merge Join (cost=0.73..2408.37 rows=97680 width=8) (actual time=0.031..21.296 rows=99900 loops=1) Merge Cond: (t1.a = t2.a) -> Index Only Scan using t1_a_idx on t1 (cost=0.29..29.64 rows=991 width=4) (actual time=0.014..0.116 rows=1000 loops=1) Index Cond: (a > 99000) Heap Fetches: 0 -> Index Only Scan using t2_a_idx on t2 (cost=0.43..2113.20 rows=101301 width=4) (actual time=0.012..9.751 rows=99900 loops=1) Index Cond: (a > 99000) Heap Fetches: 0 Planning Time: 0.269 ms Execution Time: 24.749 ms (10 rows) So I think knowing what bad it is to have this feature is the key point to discussion now. [1] https://www.postgresql.org/message-id/CAKU4AWpo9z0hMHDWUKuce4Z-NpcybV0J2UVu5%2BDVwyP-CrHCQg%40mail.gmail.com -- Best Regards Andy Fan