Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/13545 )
Change subject: IMPALA-8630: Hash the full path when calculating consistent remote placement ...................................................................... Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc File be/src/scheduling/scheduler-test.cc: http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc@267 PS6, Line 267: // Set of Impala hosts > Extra space indentation Done http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc@282 PS6, Line 282: /// overlapping by chance is extremely low, so this test should only fail if the > Can we control how the RNG used for the random replica selection is seeded? Good point. I was thinking the probability of overlap should be small enough not to matter (~10^-18 or so), but then I noticed that we do srand(0) in SchedulerTest constructor: https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler-test.cc#L31 The constructor runs for each test, so I guess these are deterministic. Changed this comment to indicate this is deterministic (which is fine, because these tests don't benefit much from actual randomness). http://gerrit.cloudera.org:8080/#/c/13545/6/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/13545/6/common/thrift/PlanNodes.thrift@195 PS6, Line 195: // hash of the partition's path > I think it would be useful to document how the hash should be computed (sin Added a comment about using Java String.hashCode() and that it needs to be consistent across processes/machines. http://gerrit.cloudera.org:8080/#/c/13545/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/13545/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@951 PS6, Line 951: partition.getLocation().hashCode()); > This was surprisingly subtle. I remembered that hashCode() was not guarante Good point, I added a comment in the thrift file. -- To view, visit http://gerrit.cloudera.org:8080/13545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b Gerrit-Change-Number: 13545 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Wed, 19 Jun 2019 18:03:11 +0000 Gerrit-HasComments: Yes