> X is large, Y is small (i.e. candidate for map-join)

Why don't you just do a map-join then? Why would a map-semijoin improve anything?

Is there a specific reason you want to use SemiJoins? It looks like you are trying to eliminate some rows before the actual join starts, aren't you?

I guess Bloom filters are what is typically used for this. They let you trade memory for CPU: allocate more memory to the Bloom filter and you get fewer false positives.

JoinRel(X, Y) -> JoinRel(BloomBuild(X), BloomFilter(Y))

In other words, you build the Bloom filter while scanning X and use that filter to eliminate rows from Y.

Vladimir
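
P.S. A rough sketch of the BloomBuild / BloomFilter idea, in plain Python rather than as real planner code. Everything here is an assumption made for illustration: the `BloomFilter` class, the `bloom_build` and `bloom_filter` helpers, the dict-shaped rows and the default sizes are hypothetical and do not correspond to any existing operator. `num_bits` is the memory knob mentioned above.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical, not a library class)."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits          # more bits -> fewer false positives
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def bloom_build(rows, key, num_bits=4096, num_hashes=3):
    """BloomBuild(X): populate the filter during a single scan of X."""
    bloom = BloomFilter(num_bits, num_hashes)
    for row in rows:
        bloom.add(row[key])
    return bloom


def bloom_filter(rows, key, bloom):
    """BloomFilter(Y): drop rows of Y whose key cannot be in X."""
    return [row for row in rows if bloom.might_contain(row[key])]


if __name__ == "__main__":
    x = [{"id": i} for i in range(100)]
    y = [{"id": i} for i in range(90, 200)]
    reduced_y = bloom_filter(y, "id", bloom_build(x, "id"))
    print(len(y), "rows in Y before,", len(reduced_y), "after")
```

The actual join then runs on X and the reduced Y. Rows that survive the filter may still fail to match (false positives), so the join condition has to be checked as usual; rows that do match are never dropped.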
