Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13545 )

Change subject: IMPALA-8630: Hash the full path when calculating consistent 
remote placement
......................................................................


Patch Set 6:

(4 comments)

Tests look great, thanks for adding them. I had a few minor requests but can +2 
after you've addressed those.

http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc@267
PS6, Line 267:     // Set of Impala hosts
Extra space indentation


http://gerrit.cloudera.org:8080/#/c/13545/6/be/src/scheduling/scheduler-test.cc@282
PS6, Line 282: /// overlapping by chance is extremely low, so this test should 
only fail if the
Can we control how the RNG used for the random replica selection is seeded? 
E.g. call srand() with a seed that we control before each run. Otherwise it 
might be hard to reproduce if we have occasional collisions (e.g. RNG isn't as 
good as we think and has more collisions than we'd expect).

This isn't a blocker if it turns out to be tricky, but it's a nice-to-have.


http://gerrit.cloudera.org:8080/#/c/13545/6/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/13545/6/common/thrift/PlanNodes.thrift@195
PS6, Line 195:   // hash of the partition's path
I think it would be useful to document how the hash should be computed (since 
we rely on it being computed the same way). I.e. that it's the standard Java 
String.hashCode()


http://gerrit.cloudera.org:8080/#/c/13545/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/13545/6/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@951
PS6, Line 951:         partition.getLocation().hashCode());
This was surprisingly subtle. I remembered that hashCode() was not guaranteed 
to be consistent across JVMs: 
https://martin.kleppmann.com/2012/06/18/java-hashcode-unsafe-for-distributed-systems.html.
 But it appears that the String hashCode is actually documented as being 
stable: 
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#hashCode--

I think it is worth documenting, but might be best to document in the thrift 
spec instead of here.



--
To view, visit http://gerrit.cloudera.org:8080/13545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Gerrit-Change-Number: 13545
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Wed, 19 Jun 2019 16:48:41 +0000
Gerrit-HasComments: Yes

Reply via email to