Dan Hecht has posted comments on this change.

Change subject: IMPALA-3567 Part 2, IMPALA-3899: factor out PHJ builder
......................................................................


Patch Set 14:

(25 comments)

Here's a first set of comments, mostly focused on phjn.h.  I need to go back 
through phjn.cc, builder.h/cc.

http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:

Line 207:         state, Substitute(PREPARE_FAILED_ERROR_MSG, "write"));
old code and agg uses state_->block_mgr()->MemLimitTooLowError().  Any reason 
this uses MemLimitExceeded() directly instead? Just wondering the reasoning.


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

Line 79:   /// Logically, the algorithm has the following modes:.
I think this comment could be a lot more useful if we explain how the phases 
work together to give the full algorithm.  Probably that could replace the 
state transition graph which I find confusing.


PS14, Line 81: partition them into hash_partitions_. The input can 
this comment is kind of disjoint since it talks about what the phase does, then 
where the input comes from, and then explains more about what the phase does.  
Let's format each phase consistently, maybe describe inputs first, then 
actions, then outputs?


PS14, Line 86: PROCESSING
why do we use PROCESSING term here rather than PARTITIONING, which would be 
more consistent?


PS14, Line 87: spilled
single spilled
(or remove single from #3 since having it in one place but not the other makes 
the reader wonder if there's a difference).


Line 89:   ///      table in memory, we perform the join, otherwise we spill 
the probe row.
this last sentence is confusing since it mostly restates what the first says 
adds a small amount of new info. Let's combine them.


PS14, Line 90: PROBING_SPILLED_PARTITION
it would be good to be more upfront about when this phase is used and how, for 
a spilled partition, we either do this phase or REPARTITIONING_PROBE phase.


PS14, Line 90: construct
what does this phase construct? isn't the hash table already constructed?


PS14, Line 91: walking
what does this mean? processing? 

also, from these comments it's not clear how these stages relate. You could 
read this to mean that #2 a fallback if we can't do #3, or that #3 is the 
probing that "perform the join" of #2.


PS14, Line 97: *
what does this star mean to say? We can go from REPARTITIONING_PROBE back to 
PROBING_SPILLED_PARTITION, right?


PS14, Line 143: buffer
write buffer?


PS14, Line 264: the
its


PS14, Line 267: the
that


PS14, Line 291: Walks
What does this mean. Iterates over?


PS14, Line 291: hash partitions
is this talking about the probe hash partitions or the builder's?


PS14, Line 384: builder_->hash_partitions
is this referring to the entries in the builder_->hash_partition_ array?


PS14, Line 385: This is not used when processing a single partition.
what does this mean?


PS14, Line 410:  we then iterate over all the probe rows
I can't tell if this means literally all the probe rows, or all the rows in 
this stream.  clarify this paragraph.


PS14, Line 432: and
this makes it sound like there are two conditions required to create a probe 
partition, but I think you mean this is implied by the first part.  "because"?


PS14, Line 436: preallocated
what does this mean?


PS14, Line 437: and
double and.  And is this saying the constructor prepares it or the caller 
should have already?  is this right:

... should be an unpinned stream that has been prepared for writing with ...


PS14, Line 443: should
will


PS14, Line 464: meaning
              :     /// it has to call Close() on it) but transferred to the 
parent exec node
this doesn't seem right. don't we either Close() it or transfer it (not both), 
in PartitionedHashJoinNode::ProbePartition::Close()?


http://gerrit.cloudera.org:8080/#/c/3873/14/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

PS14, Line 409: DCHECK
DCHECK_EQ


Line 412:   children_.insert(children_.begin(), child_entry);
rather than having two special cases (here and line 395-399, how about making 
AddChildLocked() take the position iterator? This case passes begin() and then 
AddChild() case passes either end() or the middle iterator.


-- 
To view, visit http://gerrit.cloudera.org:8080/3873
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e02ea9c7a7d1a0f373b11aa06c3237e1c7bd4cb
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to