----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51312/#review147362 -----------------------------------------------------------
Possible to add a test (or a few)? (I don' see one in the patch attached to the jira). Question: How long does it take for a persistent node to go away - in case of a JVM crash / node crash. Is it possible that a new process starts up within the duration it takes for the old node-slot entry to be removed? (It gets added to the end as a result) In terms of a permanent cluster size reduction - I think we should at least take some steps to handle this scenario, or an equivalent scenario where it takes a long time (minutes) for a node to come back. We have a force locality scheduling mode, this will effectively cause new queries to start in a state where they cannot complete. Check for this and fail the query?, or fallback to a next node for the split. The fallback to the next node is something that is going to come in soon anyway, to control locality if nodes are not available (current is always random as against 'random' being a configurable option :) ) llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java (line 104) <https://reviews.apache.org/r/51312/#comment214497> Move this into a separate path altogether, instead of listing and filtering at the same path each time? llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java (line 110) <https://reviews.apache.org/r/51312/#comment214498> woekersPath is no long defined, so the comment can be deleted. llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java (line 533) <https://reviews.apache.org/r/51312/#comment214487> There's a lot of similar code to read records within a loop, and a method to read records. Move into a single method? Can be done in a separate jiras as well. llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java (line 547) <https://reviews.apache.org/r/51312/#comment214501> The size of this collection is used to determine the #knownWorkers in HostAffinitySplitLocationProvider. The size at the moment represents active nodes. I don't think the intent is to return a fewer number of nodes than what existed if a node goes down? 1) Return n entries, and one entry would indicate it's not active (and this will need to be acted upon by all clients 2) Add a getNumInstances itnerface, which can differ from actual nodes available, along with a getNodeAtLocation interface? llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java (line 554) <https://reviews.apache.org/r/51312/#comment214496> Here, as well as other places, a node is only available when it's slot has been registered. Otherwise it should not be visible to clients. llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java (line 577) <https://reviews.apache.org/r/51312/#comment214495> Don't send back a node until it's slot registration has completed. We don't know what position it will take. This logic puts such nodes at the end. llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java (line 51) <https://reviews.apache.org/r/51312/#comment214489> Are there tests in curator for the said node, which we could borrow parts from? llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java (line 128) <https://reviews.apache.org/r/51312/#comment214472> Nit: else { slots.add( ..) } Makes it a little more readable, and maintanable for future changes. llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java (line 138) <https://reviews.apache.org/r/51312/#comment214477> Potential divide by 0? (new cluster) What's the purpose of the 2.0f/approxWorkerCount ? ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestHostAffinitySplitLocationProvider.java (line 94) <https://reviews.apache.org/r/51312/#comment214490> This isn't part of the patch uploaded to jira? - Siddharth Seth On Aug. 23, 2016, 1:41 a.m., Sergey Shelukhin wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/51312/ > ----------------------------------------------------------- > > (Updated Aug. 23, 2016, 1:41 a.m.) > > > Review request for hive and Prasanth_J. > > > Repository: hive-git > > > Description > ------- > > see jira > > > Diffs > ----- > > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/ServiceInstanceSet.java > 99ead9b > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapFixedRegistryImpl.java > e9456f2 > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java > 64d2617 > > llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/SlotZnode.java > PRE-CREATION > > llap-client/src/java/org/apache/hadoop/hive/llap/security/LlapTokenClient.java > 921e050 > > llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java > 17ce69b > > llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java > efd774d > > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HostAffinitySplitLocationProvider.java > c06499e > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/Utils.java 8a4fc08 > > ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestHostAffinitySplitLocationProvider.java > d98a5ff > > Diff: https://reviews.apache.org/r/51312/diff/ > > > Testing > ------- > > > Thanks, > > Sergey Shelukhin > >