That makes sense.

In https://issues.apache.org/jira/browse/OPTIQ-369, I have implemented 
EnumerableSemiJoinRel. An example (from misc.oq):

select * from "hr"."emps"
where exists (
  select 1 from "hr"."depts" where "depts"."deptno" = "emps"."deptno");
+-------+--------+-----------+---------+------------+
| empid | deptno | name      | salary  | commission |
+-------+--------+-----------+---------+------------+
|   100 |     10 | Bill      | 10000.0 |       1000 |
|   110 |     10 | Theodore  | 11500.0 |        250 |
|   150 |     10 | Sebastian | 7000.0  |            |
+-------+--------+-----------+---------+------------+
(3 rows)

!ok
EnumerableSemiJoinRel(condition=[=($1, $5)], joinType=[inner])
  EnumerableTableAccessRel(table=[[hr, emps]])
  EnumerableCalcRel(expr#0..3=[{inputs}], expr#4=[true], $f01=[$t0], $f0=[$t4])
    EnumerableJoinRel(condition=[=($0, $1)], joinType=[inner])
      EnumerableAggregateRel(group=[{0}])
        EnumerableCalcRel(expr#0..4=[{inputs}], $f0=[$t1])
          EnumerableTableAccessRel(table=[[hr, emps]])
      EnumerableTableAccessRel(table=[[hr, depts]])

The generated code uses the new Enumerables.semiJoin function, which builds a 
HashSet of the keys the “inner” side (depts.deptno) then probes using a stream 
from the “outer” side (emps).

The other use of “depts” in the query plan is as a “value generator”. As 
https://issues.apache.org/jira/browse/OPTIQ-375 notes, the value generator is 
often not necessary.

Vladimir, 

Can you please log a JIRA case for your proposed “reverse” implementation of 
semi-join where you build the non-semi-join side? It would help if you add a 
cost function that would help us decide which implementation to use.

Julian

 
On Aug 11, 2014, at 1:48 PM, Vladimir Sitnikov <[email protected]> 
wrote:

>> Filter(Scan(emp), gender = ‘F’) is the “build” side, and you can imagine
> implementing it by populating a hash-table with distinct keys before
> starting to read from the “dept” table.
> 
> There is more than one way to skin a hash semi-join.
> 
> Imagine the following physical plan:
> 1) Scan(dept) first and build Map<deptno, List<dept.row>>
> 2) Perform Filter(Scan(emp), gender = ‘F’) and check if the join key is in
> memory. If found, remove the entry from the map and return rows. If not
> found, just ignore the row.
> 
> This can be more efficient in case dept is smaller than Filter(Scan(emp),
> gender = ‘F’).
> 
> I suggest the following:
> 1) Let the right side of the semi join be the one that is semi-joined (as
> in your example above)
> 2) "build" side (in your terms above) is more of a physical plan, thus it
> is irrelevant at the logical level (see below). For example, for enumerable
> convention we could have EnumerableHashJoinSemi (builds table over left
> relation) and EnumerableHashJoinSemiRight (builds table over right
> relation).
> 
> Vladimir

Reply via email to