Tim Armstrong has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/14688 )
Change subject: IMPALA-9127: explicit probe state machine in hash join ...................................................................... IMPALA-9127: explicit probe state machine in hash join This refactors the main loop in PartitionedHashJoinNode::GetNext() to use an explicit state machine, rather than the hard-to-follow implicit state machine previously used. A new state variable 'probe_state_' is used to drive the loop, with DCHECKs enforcing invariants of other member variables. I deliberately tried to minimise changes to other functions (including any attempts to factor logic out of GetNext()) to minimise the scope of this patch. The new logic is mostly equivalent to the old logic, although there may be a different number of trips through the loop because of the way the cascading checks in the old version worked. A few notable changes: * DoneProbing() is consistently called when probing is finished, including in cases, like probing a single spilled partition, where it wasn't previously. * The repeated AtCapacity() checks are consolidated into a single check that happens at the end of the loop. Resources attached to batches should still be flushed at the appropriate points, since each previous "if (out_batch->AtCapacity()) break;" corresponds to a new loop iteration in the new code. * OutputNullAwareNullProbe() and OutputNullAwareProbeRows() now explicitly signal when they are done using an output argument, instead of implicitly via AtCapacity(), which is incredibly error-prone. Testing: We have adequate coverage for different join modes, including with spilling. * Ran exhaustive tests. * Ran a single node stress test with TPC-H and TPC-DS * Ran a single node stress test with a debug action to force spilling: --impalad_args="-default_query_options=DEBUG_ACTION=-1:OPEN:SET_DENY_RESERVATION_PROBABILITY@0.5" Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 --- M be/src/exec/partitioned-hash-join-node-ir.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/partitioned-hash-join-node.h 3 files changed, 329 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/14688/5 -- To view, visit http://gerrit.cloudera.org:8080/14688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ebdf0054d2ce4562b851439e300323601fb064 Gerrit-Change-Number: 14688 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>