Tim Armstrong has posted comments on this change. Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through ......................................................................
Patch Set 2: (6 comments) Need to look at fe and tests, but had some comments on the backend. http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/exec/union-node.cc File be/src/exec/union-node.cc: PS2, Line 122: isPassThrough I think we would either pass child_idx_ into isPassThrough() or call it something like currentChildIsPassthrough() so that it's clear it refers to the child. Line 123: DCHECK(child(child_idx_)->row_desc().Equals(row_batch->row_desc())); Move this into the Open() branch so we don't execute it unnecessarily. Line 126: // We will not close any PassThrough children nodes here, they will be closed in We need to think about the implications of this. This could increase resource consumption quite a bit in some cases. E.g. if you had a union of N hash join nodes, each of which had a limit applied, this could increase resource consumption Nx. We need to fix the memory management model to allow early closing in cases like this, but that shouldn't block progress here. The workaround we generally use is to set the MarkNeedsDeepCopy() flag on the last row batch. This forces all operators to either finish processing the batch (in the common case) or copy the data (in the case of nested loop join). It generally works ok except in subplans, where it can cause performance problems by forcing many tiny batches. I think the best course would be to disable passthrough in subplans and set the MarkNeedsDeepCopy() flag on the last batch returned from each child. We do something like this in PartitionedAggregationNode::HandleOutputStrings(). I'm eventually going to get rid of MarkNeedsDeepCopy(), but using it here would flag it as one of the places we need to clean up. http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS2, Line 438: verify Verify (caps) PS2, Line 442: std:: Don't need std::, we automatically import it in common/names.h. PS2, Line 442: Make these references with & to avoid copying the vector. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes