Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13545 )
Change subject: IMPALA-8630: Hash the full path when calculating consistent remote placement ...................................................................... Patch Set 6: (4 comments) Tests look great, thanks for adding them. I had a few minor requests but can +2 after you've addressed those. http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc File be/src/scheduling/scheduler-test.cc: http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc@267 PS6, Line 267: // Set of Impala hosts Extra space indentation http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc@282 PS6, Line 282: /// overlapping by chance is extremely low, so this test should only fail if the Can we control how the RNG used for the random replica selection is seeded? E.g. call srand() with a seed that we control before each run. Otherwise it might be hard to reproduce if we have occasional collisions (e.g. RNG isn't as good as we think and has more collisions than we'd expect). This isn't a blocker if it turns out to be tricky, but it's a nice-to-have. http://gerrit.cloudera.org:8080/#/c/13545/6/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/13545/6/common/thrift/PlanNodes.thrift@195 PS6, Line 195: // hash of the partition's path I think it would be useful to document how the hash should be computed (since we rely on it being computed the same way). I.e. that it's the standard Java String.hashCode() http://gerrit.cloudera.org:8080/#/c/13545/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/13545/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@951 PS6, Line 951: partition.getLocation().hashCode()); This was surprisingly subtle. I remembered that hashCode() was not guaranteed to be consistent across JVMs: https://martin.kleppmann.com/2012/06/18/java-hashcode-unsafe-for-distributed-systems.html. But it appears that the String hashCode is actually documented as being stable: https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#hashCode-- I think it is worth documenting, but might be best to document in the thrift spec instead of here. -- To view, visit http://gerrit.cloudera.org:8080/13545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b Gerrit-Change-Number: 13545 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Wed, 19 Jun 2019 16:48:41 +0000 Gerrit-HasComments: Yes