Dan Hecht has posted comments on this change. Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder ......................................................................
Patch Set 14: (25 comments) Here's a first set of comments, mostly focused on phjn.h. I need to go back through phjn.cc, builder.h/cc. http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/analytic-eval-node.cc File be/src/exec/analytic-eval-node.cc: Line 207: state, Substitute(PREPARE_FAILED_ERROR_MSG, "write")); old code and agg uses state_->block_mgr()->MemLimitTooLowError(). Any reason this uses MemLimitExceeded() directly instead? Just wondering the reasoning. http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: Line 79: /// Logically, the algorithm has the following modes:. I think this comment could be a lot more useful if we explain how the phases work together to give the full algorithm. Probably that could replace the state transition graph which I find confusing. PS14, Line 81: partition them into hash_partitions_. The input can this comment is kind of disjoint since it talks about what the phase does, then where the input comes from, and then explains more about what the phase does. Let's format each phase consistently, maybe describe inputs first, then actions, then outputs? PS14, Line 86: PROCESSING why do we use PROCESSING term here rather than PARTITIONING, which would be more consistent? PS14, Line 87: spilled single spilled (or remove single from #3 since having it in one place but not the other makes the reader wonder if there's a difference). Line 89: /// table in memory, we perform the join, otherwise we spill the probe row. this last sentence is confusing since it mostly restates what the first says adds a small amount of new info. Let's combine them. PS14, Line 90: PROBING_SPILLED_PARTITION it would be good to be more upfront about when this phase is used and how, for a spilled partition, we either do this phase or REPARTITIONING_PROBE phase. PS14, Line 90: construct what does this phase construct? isn't the hash table already constructed? PS14, Line 91: walking what does this mean? processing? also, from these comments it's not clear how these stages relate. You could read this to mean that #2 a fallback if we can't do #3, or that #3 is the probing that "perform the join" of #2. PS14, Line 97: * what does this star mean to say? We can go from REPARTITIONING_PROBE back to PROBING_SPILLED_PARTITION, right? PS14, Line 143: buffer write buffer? PS14, Line 264: the its PS14, Line 267: the that PS14, Line 291: Walks What does this mean. Iterates over? PS14, Line 291: hash partitions is this talking about the probe hash partitions or the builder's? PS14, Line 384: builder_->hash_partitions is this referring to the entries in the builder_->hash_partition_ array? PS14, Line 385: This is not used when processing a single partition. what does this mean? PS14, Line 410: we then iterate over all the probe rows I can't tell if this means literally all the probe rows, or all the rows in this stream. clarify this paragraph. PS14, Line 432: and this makes it sound like there are two conditions required to create a probe partition, but I think you mean this is implied by the first part. "because"? PS14, Line 436: preallocated what does this mean? PS14, Line 437: and double and. And is this saying the constructor prepares it or the caller should have already? is this right: ... should be an unpinned stream that has been prepared for writing with ... PS14, Line 443: should will PS14, Line 464: meaning : /// it has to call Close() on it) but transferred to the parent exec node this doesn't seem right. don't we either Close() it or transfer it (not both), in PartitionedHashJoinNode::ProbePartition::Close()? http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: PS14, Line 409: DCHECK DCHECK_EQ Line 412: children_.insert(children_.begin(), child_entry); rather than having two special cases (here and line 395-399, how about making AddChildLocked() take the position iterator? This case passes begin() and then AddChild() case passes either end() or the middle iterator. -- To view, visit http://gerrit.cloudera.org:8080/3873 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes