Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows ......................................................................
Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 378: > We don't even need to do this for many join modes. E.g. for INNER JOIN if t Done Line 387: } > The log message might be a bit confusing, since it's I added this mostly for testability, though obviously its not ideal to expose something in the profile just for testing that's not going to be useful to users. I tried playing around with making a "NumHashTablesBuilt" counter or something that could be compared to the number of partitions, which we already have a counter for, but its tricky because some partitions won't have hash tables built (if they're too big and get repartitioned), and some will have their hash tables built twice (if they get evicted after building and before being processed). I didn't look at how hard it would be to add a "NumHashTablesRebuiltAfterBeingEvicted" counter. Any ideas? Maybe just turning this into a counter like you say would be a little better. Line 649: > Can you factor out the two cases into different functions? Done PS2, Line 651: // We > DCHECK_EQ Done Line 656: } > Can't we skip this check and just look at eos below? BTS::GetNext() has the Done Line 666: // it is done by the loop in GetNext(), possibly with multiple calls to OutputAllBUild() > Can we just save the batch iterator instead of output_unmatched_pos_? Done Line 671: const int num_conjuncts = conjunct_ctxs_.size(); > This output row creation is kind of subtle and duplicated - factor out into Done Line 691: } > Why is this true? I added a comment explaining this above the DCHECK_EQ(output_build_partitions_.size(), 1) above here which is enforcing the same condition. http://gerrit.cloudera.org:8080/#/c/5389/2/be/src/exec/partitioned-hash-join-node.h File be/src/exec/partitioned-hash-join-node.h: PS2, Line 460: ransfer to output batch > Normally we single-quote variable names in comments so they stand out a bit Done Line 464: /// (i.e. the resulting joined row passes other_join_conjuncts). > We should mention that it's a batch from output_build_partitions_.front() Done http://gerrit.cloudera.org:8080/#/c/5389/2/testdata/workloads/functional-query/queries/QueryTest/spilling.test File testdata/workloads/functional-query/queries/QueryTest/spilling.test: Line 631: ---- QUERY > Can you comment which join modes these are testing? Done -- To view, visit http://gerrit.cloudera.org:8080/5389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes