Vladimir Sitnikov created OPTIQ-379:
---------------------------------------
Summary: Alternative implementation of semi-join: both sides
should be considered for building a map
Key: OPTIQ-379
URL: https://issues.apache.org/jira/browse/OPTIQ-379
Project: Optiq
Issue Type: New Feature
Reporter: Vladimir Sitnikov
Assignee: Julian Hyde
When implementing a semi-join, one can build the map from either of the two inputs (see \[1\]).
In general, it appears more efficient to build the map over the smaller input, which avoids materializing the larger one.
Consider the following query:
{code:sql}
select * from "hr"."emps"
where exists (
  select 1 from "hr"."depts" where "depts"."deptno" = "emps"."deptno");
{code}
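To make the two orientations concrete, here is a minimal Java sketch of a hash-based semi-join for the query above. It is not Optiq code: the {{Emp}} record, the {{SemiJoinSketch}} class and its method names are hypothetical, both inputs are assumed to fit in memory as lists, and {{deptno}} is assumed non-null. {{buildOnDepts}} keeps only the distinct {{depts.deptno}} values in the map and streams {{emps}}; {{buildOnEmps}} keeps whole {{emps}} rows grouped by {{deptno}} and streams {{depts}}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Optiq's API: both semi-join orientations over in-memory lists.
final class SemiJoinSketch {
    // Hypothetical row type standing in for "hr"."emps"; only deptno matters for the join.
    record Emp(int empid, int deptno, String name) {}

    // Orientation 1: BuildMap(Scan(depts)), then stream emps.
    // Memory holds only the distinct deptno keys of depts.
    static List<Emp> buildOnDepts(List<Integer> deptDeptnos, List<Emp> emps) {
        Set<Integer> keys = new HashSet<>(deptDeptnos);
        List<Emp> out = new ArrayList<>();
        for (Emp e : emps) {
            if (keys.contains(e.deptno())) {
                out.add(e); // each emps row is emitted at most once
            }
        }
        return out;
    }

    // Orientation 2: BuildMap(Scan(emps)), then stream depts.
    // Memory holds every emps row, grouped by deptno.
    static List<Emp> buildOnEmps(List<Emp> emps, List<Integer> deptDeptnos) {
        Map<Integer, List<Emp>> byDeptno = new HashMap<>();
        for (Emp e : emps) {
            byDeptno.computeIfAbsent(e.deptno(), k -> new ArrayList<>()).add(e);
        }
        List<Emp> out = new ArrayList<>();
        for (int deptno : deptDeptnos) {
            // remove() so a deptno repeated in depts cannot emit the same emps rows twice
            List<Emp> matched = byDeptno.remove(deptno);
            if (matched != null) {
                out.addAll(matched);
            }
        }
        return out;
    }
}
```

Both methods return the same rows, possibly in a different order. In {{buildOnEmps}}, each group leaves the map as soon as it is emitted, so a {{deptno}} that appears twice in {{depts}} does not duplicate {{emps}} rows, which keeps semi-join semantics.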
There is a trade-off (assuming a semi-join is used and no spill-to-disk happens):
1) If the semi-join is implemented as BuildMap(Scan(depts)) followed by a scan through emps,
the map takes {{count(distinct depts.deptno)\*(map_entry_overhead +
avg_size_of_deptno_column)}} bytes.
2) If the semi-join is implemented as BuildMap(Scan(emps)) followed by a scan through depts,
the map takes {{count(emps.\*)\*(map_entry_overhead +
avg_size_of_emps_row)}} bytes.
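The two size estimates can be compared directly to pick the build side. Below is a minimal Java sketch of that comparison; the class and parameter names are hypothetical, and the numbers in the usage note are made up for illustration, not measurements.

```java
// Hypothetical helper: compares the two map-size estimates 1) and 2) above.
final class SemiJoinOrientation {
    // Estimate 1): the map holds one entry per distinct depts.deptno.
    static long bytesBuildOnDepts(long distinctDeptnos, long mapEntryOverhead, long avgDeptnoSize) {
        return distinctDeptnos * (mapEntryOverhead + avgDeptnoSize);
    }

    // Estimate 2): the map holds every emps row.
    static long bytesBuildOnEmps(long empsRows, long mapEntryOverhead, long avgEmpsRowSize) {
        return empsRows * (mapEntryOverhead + avgEmpsRowSize);
    }

    // True if building over depts is estimated to need no more memory than building over emps.
    static boolean preferBuildOnDepts(long distinctDeptnos, long avgDeptnoSize,
                                      long empsRows, long avgEmpsRowSize,
                                      long mapEntryOverhead) {
        return bytesBuildOnDepts(distinctDeptnos, mapEntryOverhead, avgDeptnoSize)
            <= bytesBuildOnEmps(empsRows, mapEntryOverhead, avgEmpsRowSize);
    }
}
```

For example, with a 32-byte entry overhead, 1,000 distinct 4-byte deptno values and 1,000,000 emps rows of 100 bytes each, option 1 needs 36,000 bytes and option 2 needs 132,000,000 bytes. If instead only 10 emps rows remain (say, after a filter) while depts has 1,000,000 distinct deptno values, option 2 needs 1,320 bytes against 36,000,000 for option 1, and the preferred side flips.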
The same trade-off applies to anti-joins.
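For an anti-join (for example, the same query with {{not exists}}), the two orientations differ slightly. In the minimal Java sketch below (hypothetical names, in-memory inputs, non-null {{deptno}}), building over {{depts}} streams {{emps}} and emits rows whose key is absent; building over {{emps}} removes every group whose key appears in {{depts}} and emits whatever remains once {{depts}} is exhausted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Optiq's API: both anti-join orientations over in-memory lists.
final class AntiJoinSketch {
    // Hypothetical row type standing in for "hr"."emps".
    record Emp(int empid, int deptno, String name) {}

    // Build over depts keys, stream emps, keep rows with no matching dept.
    static List<Emp> antiBuildOnDepts(List<Integer> deptDeptnos, List<Emp> emps) {
        Set<Integer> keys = new HashSet<>(deptDeptnos);
        List<Emp> out = new ArrayList<>();
        for (Emp e : emps) {
            if (!keys.contains(e.deptno())) {
                out.add(e);
            }
        }
        return out;
    }

    // Build over emps rows, stream depts, drop every matched group.
    static List<Emp> antiBuildOnEmps(List<Emp> emps, List<Integer> deptDeptnos) {
        Map<Integer, List<Emp>> byDeptno = new HashMap<>();
        for (Emp e : emps) {
            byDeptno.computeIfAbsent(e.deptno(), k -> new ArrayList<>()).add(e);
        }
        for (int deptno : deptDeptnos) {
            byDeptno.remove(deptno); // this group has a matching dept, so it is not in the result
        }
        // Only after depts is fully scanned are the remaining groups known to be unmatched.
        List<Emp> out = new ArrayList<>();
        for (List<Emp> rest : byDeptno.values()) {
            out.addAll(rest);
        }
        return out;
    }
}
```

One consequence of the second orientation is that it cannot emit any row until {{depts}} has been scanned completely. With nullable keys, an {{emps}} row whose {{deptno}} is NULL belongs in the {{not exists}} result; this sketch does not model that case.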
\[1\]: [Semi-join orientation|http://mail-archives.apache.org/mod_mbox/optiq-dev/201408.mbox/browser]
--
This message was sent by Atlassian JIRA
(v6.2#6252)