On Tue, Aug 2, 2022 at 3:13 PM Richard Guo <guofengli...@gmail.com> wrote:
> On Sun, Jul 31, 2022 at 12:07 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > >> [ wanders away wondering if JOIN_RIGHT_SEMI should become a thing ... ] > > Maybe this is something we can do. Currently for the query below: > > # explain select * from foo where a in (select c from bar); > QUERY PLAN > ------------------------------------------------------------------------- > Hash Semi Join (cost=154156.00..173691.29 rows=10 width=8) > Hash Cond: (foo.a = bar.c) > -> Seq Scan on foo (cost=0.00..1.10 rows=10 width=8) > -> Hash (cost=72124.00..72124.00 rows=5000000 width=4) > -> Seq Scan on bar (cost=0.00..72124.00 rows=5000000 width=4) > (5 rows) > > I believe we can get a cheaper plan if we are able to swap the outer and > inner for SEMI JOIN and use the smaller 'foo' as inner rel. > I'm thinking about the JOIN_RIGHT_SEMI thing and it seems that it can be implemented for HashJoin with very short change. What we want to do is to just have the first match for each inner tuple. So after scanning the hash bucket for matches, we just need to check whether the inner tuple has been set match and skip it if so, something like { if (!ExecScanHashBucket(node, econtext)) { /* out of matches; check for possible outer-join fill */ node->hj_JoinState = HJ_FILL_OUTER_TUPLE; continue; } } + /* + * In a right-semijoin, we only need the first match for each + * inner tuple. + */ + if (node->js.jointype == JOIN_RIGHT_SEMI && + HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple))) + continue; + I have a simple implementation locally and tried it with the query below and saw a speedup of 2055.617 ms VS. 1156.772 ms (both best of 3). # explain (costs off, analyze) select * from foo where a in (select c from bar); QUERY PLAN ------------------------------------------------------------------------------- Hash Semi Join (actual time=1957.748..2055.058 rows=10 loops=1) Hash Cond: (foo.a = bar.c) -> Seq Scan on foo (actual time=0.026..0.029 rows=10 loops=1) -> Hash (actual time=1938.818..1938.819 rows=5000000 loops=1) Buckets: 262144 Batches: 64 Memory Usage: 4802kB -> Seq Scan on bar (actual time=0.016..853.010 rows=5000000 loops=1) Planning Time: 0.327 ms Execution Time: 2055.617 ms (8 rows) # explain (costs off, analyze) select * from foo where a in (select c from bar); QUERY PLAN ------------------------------------------------------------------------- Hash Right Semi Join (actual time=11.525..1156.713 rows=10 loops=1) Hash Cond: (bar.c = foo.a) -> Seq Scan on bar (actual time=0.034..523.036 rows=5000000 loops=1) -> Hash (actual time=0.027..0.029 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on foo (actual time=0.009..0.014 rows=10 loops=1) Planning Time: 0.312 ms Execution Time: 1156.772 ms (8 rows) It may not be easy for MergeJoin and NestLoop though, as we do not have a way to know if an inner tuple has been already matched or not. But the benefit of swapping inputs for MergeJoin and NestLoop seems to be small, so I think it's OK to ignore them. So is it worthwhile to make JOIN_RIGHT_SEMI come true? Thanks Richard