Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

Line 378: 
> We don't even need to do this for many join modes. E.g. for INNER JOIN if t
Done


Line 387:     }
> The log message might be a bit confusing, since it's 
I added this mostly for testability, though obviously its not ideal to expose 
something in the profile just for testing that's not going to be useful to 
users.

I tried playing around with making a "NumHashTablesBuilt" counter or something 
that could be compared to the number of partitions, which we already have a 
counter for, but its tricky because some partitions won't have hash tables 
built (if they're too big and get repartitioned), and some will have their hash 
tables built twice (if they get evicted after building and before being 
processed). I didn't look at how hard it would be to add a 
"NumHashTablesRebuiltAfterBeingEvicted" counter.

Any ideas? Maybe just turning this into a counter like you say would be a 
little better.


Line 649: 
> Can you factor out the two cases into different functions?
Done


PS2, Line 651: // We 
> DCHECK_EQ
Done


Line 656:   }
> Can't we skip this check and just look at eos below? BTS::GetNext() has the
Done


Line 666:   // it is done by the loop in GetNext(), possibly with multiple 
calls to OutputAllBUild()
> Can we just save the batch iterator instead of output_unmatched_pos_?
Done


Line 671:   const int num_conjuncts = conjunct_ctxs_.size();
> This output row creation is kind of subtle and duplicated - factor out into
Done


Line 691:       }
> Why is this true?
I added a comment explaining this above the 
DCHECK_EQ(output_build_partitions_.size(), 1) above here which is enforcing the 
same condition.


http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.h
File be/src/exec/partitioned-hash-join-node.h:

PS2, Line 460: ransfer to output batch
> Normally we single-quote variable names in comments so they stand out a bit
Done


Line 464:   /// (i.e. the resulting joined row passes other_join_conjuncts).
> We should mention that it's a batch from output_build_partitions_.front()
Done


http://gerrit.cloudera.org:8080/#/c/5389/2/testdata/workloads/functional-query/queries/QueryTest/spilling.test
File testdata/workloads/functional-query/queries/QueryTest/spilling.test:

Line 631: ---- QUERY
> Can you comment which join modes these are testing?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to