Boaz Ben-Zvi created DRILL-6444: ----------------------------------- Summary: Hash Join: Avoid partitioning when memory is sufficient Key: DRILL-6444 URL: https://issues.apache.org/jira/browse/DRILL-6444 Project: Apache Drill Issue Type: Improvement Components: Execution - Relational Operators Reporter: Boaz Ben-Zvi Assignee: Boaz Ben-Zvi
The Hash Join Spilling feature introduced partitioning (of the incoming build side) which adds some overhead (copying the incoming data, row by row). That happens even when no spilling is needed. Suggested optimization: Try reading the incoming build data without partitioning, while checking that enough memory is available. In case the whole build side (plus hash table) fits in memory - then continue like a "single partition". In case not, then need to partition the data read so far and continue as usual (with partitions). (See optimization 8.1 in the Hash Join Spill design document: [https://docs.google.com/document/d/1-c_oGQY4E5d58qJYv_zc7ka834hSaB3wDQwqKcMoSAI/edit] ) This is currently implemented only for the case of num_partitions = 1 (i.e, no spilling, and no memory checking). -- This message was sent by Atlassian JIRA (v7.6.3#76005)