[ https://issues.apache.org/jira/browse/DRILL-7141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967208#comment-16967208 ]
Boaz Ben-Zvi commented on DRILL-7141:
-------------------------------------

This enhancement requires good statistics on the probe side, because spilling decisions (i.e., which partitions to spill) are made during the build phase, before the probe side has been processed.

> Hash-Join (and Agg) should always spill to disk the least used partition
> ------------------------------------------------------------------------
>
>                 Key: DRILL-7141
>                 URL: https://issues.apache.org/jira/browse/DRILL-7141
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.15.0
>            Reporter: Kunal Khatua
>            Assignee: Boaz Ben-Zvi
>            Priority: Major
>
> When the probe-side data for a hash join is skewed, it is preferable to keep
> the corresponding build-side partition in memory.
> Currently, with the spill-to-disk feature, the partition to spill is
> selected at random. This means that highly skewed probe-side data may also
> have to be spilled, because its corresponding hash-table partition is not
> in memory.
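
The selection policy requested in the issue can be illustrated with a minimal Python sketch. It is not Drill code: the `Partition` class, its fields, and both function names are hypothetical, and it assumes a per-partition estimate of probe-side rows (`est_probe_rows`) is already available from statistics, which is exactly the prerequisite noted in the comment above.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical build-side hash partition (not a Drill class)."""
    index: int
    build_bytes: int      # memory currently held by this partition
    est_probe_rows: int   # assumed estimate of probe rows hashing here
    spilled: bool = False


def pick_spill_victim(partitions):
    """Return the in-memory partition expected to be probed least.

    Ties are broken in favor of the larger partition, since spilling it
    frees more memory. Returns None when nothing is left to spill.
    """
    candidates = [p for p in partitions if not p.spilled and p.build_bytes > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.est_probe_rows, -p.build_bytes))


def spill_until_fits(partitions, memory_limit):
    """Mark least-probed partitions as spilled until the rest fit in memory."""
    spilled = []
    while sum(p.build_bytes for p in partitions if not p.spilled) > memory_limit:
        victim = pick_spill_victim(partitions)
        if victim is None:
            break
        victim.spilled = True
        spilled.append(victim.index)
    return spilled
```

With four partitions where partition 0 is expected to receive most of the probe rows, this policy keeps partition 0 in memory and spills the rarely probed ones first, whereas a random choice could evict partition 0 and force the skewed probe rows to disk as well.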