>X is large, Y is small (i.e. candidate for map-join)
Why don't you just do a map-join then? Why would a map-semijoin improve
anything?

Is there a specific reason you want to use SemiJoins?

It looks like you are trying to eliminate some rows before the actual join
starts, aren't you?
I believe Bloom filters are what is typically used for this. You can trade
memory for CPU: give the Bloom filter more memory and you get fewer
false positives, so fewer non-matching rows reach the join.
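
To give a feel for that knob, here is a rough sketch in Python that
evaluates the textbook approximation for a Bloom filter's false-positive
rate, (1 - e^(-k*n/m))^k, for n keys, m bits and k hash functions. The
function name and the sample sizes are made up for illustration; real
implementations may deviate from the approximation.

```python
import math


def bloom_false_positive_rate(num_keys, num_bits, num_hashes=None):
    """Approximate false-positive rate of a standard Bloom filter."""
    if num_hashes is None:
        # Pick the (rounded) number of hash functions that minimizes the
        # rate for this size: k = (m / n) * ln 2.
        num_hashes = max(1, round(num_bits / num_keys * math.log(2)))
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


if __name__ == "__main__":
    keys = 1_000_000  # hypothetical number of distinct join keys
    for bits_per_key in (4, 8, 16):
        rate = bloom_false_positive_rate(keys, bits_per_key * keys)
        print(bits_per_key, "bits/key ->", rate)
```

Each doubling of the bits per key cuts the rate by a large factor, which
is the memory-for-CPU trade mentioned above.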

JoinRel(X, Y) -> JoinRel(BloomBuild(X), BloomFilter(Y))
In other words, you build the Bloom filter while scanning X and use that
filter to eliminate rows from Y.
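
A minimal sketch of that rewrite in Python, just to make the data flow
concrete. Everything here (the BloomFilter class, join_with_bloom, the
default sizes, the SHA-256 based hashing) is my own illustration and not
an existing planner API; a real engine would use a much cheaper hash.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive every probe position from two 64-bit
        # halves of a single digest.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def join_with_bloom(x_rows, y_rows, x_key, y_key,
                    num_bits=1 << 16, num_hashes=5):
    """Returns (joined pairs, number of Y rows that survived the filter)."""
    bloom = BloomFilter(num_bits, num_hashes)

    # BloomBuild(X): populate the filter while scanning X.
    x_seen = []
    for row in x_rows:
        bloom.add(x_key(row))
        x_seen.append(row)

    # BloomFilter(Y): drop Y rows whose key is definitely not in X.
    y_kept = [row for row in y_rows if bloom.might_contain(y_key(row))]

    # Ordinary hash join on what is left; any false positives that slipped
    # through the filter find no match here and are discarded.
    y_by_key = {}
    for row in y_kept:
        y_by_key.setdefault(y_key(row), []).append(row)
    pairs = [(x, y) for x in x_seen for y in y_by_key.get(x_key(x), [])]
    return pairs, len(y_kept)
```

The filter never drops a row that has a match, so the join result is the
same as without it; only the amount of Y that reaches the join changes.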

Vladimir
