Boaz Ben-Zvi created DRILL-5912:
-----------------------------------
Summary: Hash Join Enhancement: Avoid copying probe side values
Key: DRILL-5912
URL: https://issues.apache.org/jira/browse/DRILL-5912
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Boaz Ben-Zvi
Priority: Minor
When the Hash Join Operator (inner, or left outer) performs the "probe and
project" task, it copies each probe-side value to be projected. Example:
{code}
public void projectProbeRecord(int probeIndex, int outIndex)
    throws SchemaChangeException
{
    {
        vv15.copyFromSafe((probeIndex), (outIndex), vv12);
    }
    {
        vv21.copyFromSafe((probeIndex), (outIndex), vv18);
    }
}
{code}
When the build side has no duplicate-key entries and no spilling took place,
each probe-side (outer) row is projected exactly once (for a left outer join)
or at most once (for an inner join).
In such (common) cases, we could avoid the above copy, and just transfer the
value vectors as is (or add a Selection Vector 2 for the inner join, to
eliminate the unmatched entries).
This can be a significant performance enhancement, as copying each set of
values is much more expensive than transferring vectors (e.g., performing the
copy 64K times, plus allocating the vectors and possibly resizing them for
variable-sized types).
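
To make the proposal concrete, here is a minimal sketch of the two strategies
in plain Java. It is only an illustration, not Drill code: the class
ProbeProjectSketch, its Projected holder, and the methods canTransfer,
projectByCopy, transferLeftOuter and transferInner are all hypothetical names.
A plain int[] stands in for a value vector and an int[] of row indexes stands
in for a Selection Vector 2.

```java
import java.util.Arrays;

// Hypothetical stand-in for one probe-side column; this is NOT Drill's ValueVector API.
class ProbeProjectSketch {

    /** A projected column: the value buffer plus an optional list of selected row indexes. */
    static final class Projected {
        final int[] values;     // shared with the incoming batch when transferred
        final int[] selection;  // null means "every row is selected"

        Projected(int[] values, int[] selection) {
            this.values = values;
            this.selection = selection;
        }
    }

    /** The shortcut applies only when each probe row can be emitted at most once. */
    static boolean canTransfer(boolean buildHasDuplicateKeys, boolean spilled) {
        return !buildHasDuplicateKeys && !spilled;
    }

    /** Baseline: copy each matched probe value into a newly allocated output buffer. */
    static Projected projectByCopy(int[] probeValues, boolean[] matched) {
        int[] out = new int[probeValues.length];
        int outIndex = 0;
        for (int probeIndex = 0; probeIndex < probeValues.length; probeIndex++) {
            if (matched[probeIndex]) {
                out[outIndex++] = probeValues[probeIndex];
            }
        }
        return new Projected(Arrays.copyOf(out, outIndex), null);
    }

    /** Left outer join: every probe row is emitted exactly once, so hand the buffer over as is. */
    static Projected transferLeftOuter(int[] probeValues) {
        return new Projected(probeValues, null);
    }

    /** Inner join: hand the buffer over and record only the rows that found a match. */
    static Projected transferInner(int[] probeValues, boolean[] matched) {
        int[] selection = new int[matched.length];
        int count = 0;
        for (int i = 0; i < matched.length; i++) {
            if (matched[i]) {
                selection[count++] = i;
            }
        }
        return new Projected(probeValues, Arrays.copyOf(selection, count));
    }

    public static void main(String[] args) {
        int[] probe = {10, 20, 30, 40};
        boolean[] matched = {true, false, true, true};
        if (canTransfer(false, false)) {
            Projected p = transferInner(probe, matched);
            // Prints the indexes of the probe rows that matched.
            System.out.println(Arrays.toString(p.selection));
        }
    }
}
```

In this sketch the inner-join path still visits every row, but it writes one
row index per match instead of one value per projected column, and the column
buffers themselves are never reallocated or resized.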
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)