Re: Synthetic semi-join

Mostafa Mokhtar Fri, 08 Aug 2014 13:35:33 -0700

This concept is called SJR (semi-join reduction); this paper covers it in
detail:
http://www-db.in.tum.de/research/publications/conferences/semijoin.pdf


As Vladimir mentioned, SJR is analogous to bloom filters, but it uses the
dimension table itself rather than just a bloom filter built from it.
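A minimal Python sketch of the difference, assuming in-memory lists of tuples for a fact table and a set of dimension keys (the data, the `exact_semi_join` / `bloom_semi_join` helpers and the `BloomFilter` class with its `m` and `k` parameters are all hypothetical, for illustration only). The exact reduction filters fact rows against the dimension keys themselves, so it never keeps a non-matching row; the Bloom filter keeps only a bit array, so it never drops a real match but may let false positives through, which the later join still has to remove.

```python
import hashlib


def exact_semi_join(fact, dim_keys, key):
    """Keep fact rows whose join key appears in the dimension key set (no false positives)."""
    keys = set(dim_keys)
    return [row for row in fact if row[key] in keys]


class BloomFilter:
    """Hypothetical minimal Bloom filter: m bits, k hash positions derived from SHA-256."""

    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.bits = 0  # bit array stored as a Python int

    def _positions(self, value):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value):
        # True for every added value; may also be True for values never added.
        return all((self.bits >> p) & 1 for p in self._positions(value))


def bloom_semi_join(fact, dim_keys, key, m=64, k=3):
    """Approximate reduction: never drops a matching row, may keep non-matching ones."""
    bf = BloomFilter(m, k)
    for dk in dim_keys:
        bf.add(dk)
    return [row for row in fact if bf.might_contain(row[key])]


# Hypothetical data: fact rows are (order_id, part_id); the dimension keys are part_ids.
fact = [(i, i % 50) for i in range(200)]
dim_keys = [3, 7, 11]

exact = exact_semi_join(fact, dim_keys, key=1)
approx = bloom_semi_join(fact, dim_keys, key=1)
print(len(exact), len(approx))
```

A larger bit array lowers the false-positive rate of the approximate version; the exact version has none, at the cost of shipping the full key set.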

TPC-H queries 17 and 20 are good candidates for semi-join reduction.

This feature should definitely be on our roadmap.

Thanks
Mostafa




On Fri, Aug 8, 2014 at 1:04 PM, Julian Hyde wrote:

> You’re right that bloom filters are useful. I was just exploring what
> could be done at the logical level; when it comes to implementing the
> semi-join, bloom filters are a good option, if you can accept an
> approximate answer.
>
> Here’s a scenario where it would make sense to transform JoinRel(X, Y) ->
> JoinRel(SemiJoinRel(X, Y), Y). Let’s suppose that Y has a large number of
> rows and columns (i.e. the average row length is large). We can ship the
> set of distinct Y key values to X, semi-join them, then send the filtered X
> rows to Y.
>
> So, SemiJoin(X, Y) has significantly lower I/O cost than Join(X, Y) even
> though it reads the same number of rows from X and Y, because it reads
> fewer columns from Y.
>
> We’ve replaced one shuffle join with two map joins.
>
> Julian
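The rewrite in Julian's scenario can be sketched in Python, assuming small in-memory tables where X is narrow and Y has wide rows (the table shapes, the `hash_join` and `semi_join` helpers, and the use of `repr` length as a stand-in for bytes read are hypothetical, not the planner's actual JoinRel/SemiJoinRel operators). It checks that Join(SemiJoin(X, Y), Y) returns the same rows as Join(X, Y), and that the distinct-key projection of Y is far smaller than Y's full rows.

```python
from collections import defaultdict


def hash_join(left, right, lkey, rkey):
    """Inner equi-join: build a hash table on `right`, probe it with each `left` row."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)
    return [l + r for l in left for r in index.get(l[lkey], [])]


def semi_join(left, keys, lkey):
    """SemiJoin(X, Y): rows of `left` whose key is among Y's keys; no Y columns are kept."""
    keyset = set(keys)
    return [l for l in left if l[lkey] in keyset]


# Hypothetical tables: X is (x_id, y_id); Y is (y_id, wide_col_1, wide_col_2).
X = [(i, i % 10) for i in range(1000)]
Y = [(k, "payload" * 100, "more" * 100) for k in range(0, 100, 7)]

# Original plan: Join(X, Y) reads every column of Y.
direct = hash_join(X, Y, lkey=1, rkey=0)

# Rewritten plan: Join(SemiJoin(X, Y), Y).
y_keys = {y[0] for y in Y}                            # 1. distinct Y key values (one narrow column)
x_reduced = semi_join(X, y_keys, lkey=1)              # 2. semi-join X against those keys
rewritten = hash_join(x_reduced, Y, lkey=1, rkey=0)   # 3. join only the filtered X rows with Y

# Crude size proxy (repr length): the key projection is much narrower than full Y rows.
key_bytes = sum(len(repr(k)) for k in y_keys)
row_bytes = sum(len(repr(y)) for y in Y)

print(len(direct), len(x_reduced), key_bytes, row_bytes)
```

Both small inputs here (the key set in step 2, the filtered X rows in step 3) are the kind of side that could be broadcast, which is the intuition behind replacing one shuffle join with two map joins.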
