[jira] [Work logged] (HIVE-21912) Implement BlacklistingLlapMetricsListener

ASF GitHub Bot (JIRA) Thu, 04 Jul 2019 02:30:29 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-21912?focusedWorklogId=272075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272075
 ]


ASF GitHub Bot logged work on HIVE-21912:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jul/19 09:29
            Start Date: 04/Jul/19 09:29
    Worklog Time Spent: 10m 
      Work Description: pvary commented on pull request #698: HIVE-21912: 
Implement DisablingDaemonStatisticsHandler
URL: https://github.com/apache/hive/pull/698#discussion_r300313040
 
 

 ##########
 File path: 
llap-client/src/java/org/apache/hadoop/hive/llap/registry/impl/LlapZookeeperRegistryImpl.java
 ##########
 @@ -416,4 +423,65 @@ protected String getZkPathUser(Configuration conf) {
     // rather than relying on RegistryUtils.currentUser().
     return HiveConf.getVar(conf, ConfVars.LLAP_ZK_REGISTRY_USER, RegistryUtils.currentUser());
   }
+
+  /**
+   * Locks the Llap Cluster for configuration change by setting the next possible configuration
+   * change time. Until this time is reached the configuration should not be changed.
+   * @param nextMinConfigChangeTime The next time when the cluster can be reconfigured
+   * @return The result of the change (success if the lock is succeeded, and the next possible
+   * configuration change time
+   */
+  public ConfigChangeLockResult lockForConfigChange(long nextMinConfigChangeTime) {
+    try {
+      if (nextChangeTime == null) {
+        // Create the node with the /llap-sasl/hiveuser/hostname/config-change/next-change path without retry
+        nextChangeTime = new DistributedAtomicLong(zooKeeperClient,
+            String.join("/", workersPath.substring(0, workersPath.lastIndexOf('/')), CONFIG_CHANGE_PATH,
+                CONFIG_CHANGE_NODE), (i, j, sleeper) -> false);
+        nextChangeTime.initialize(0L);
+      }
+      AtomicValue<Long> current = nextChangeTime.get();
+      if (!current.succeeded()) {
+        LOG.debug("Can not get the current configuration lock time");
+        return new ConfigChangeLockResult(false, -1L);
+      }
+      if (current.postValue() >= nextMinConfigChangeTime) {
+        LOG.debug("Can not set {}. Current value is {}.", nextMinConfigChangeTime, current.postValue());
+        return new ConfigChangeLockResult(false, current.postValue());
+      }
+      current = nextChangeTime.compareAndSet(current.postValue(), nextMinConfigChangeTime);
+      if (!current.succeeded()) {
+        LOG.debug("Can not set {}. Current value is changed to {}.", nextMinConfigChangeTime, current.postValue());
+        return new ConfigChangeLockResult(false, current.postValue());
+      }
+      return new ConfigChangeLockResult(true, current.postValue());
+    } catch (Throwable t) {
+      LOG.info("Can not reserve configuration change lock", t);
+      return new ConfigChangeLockResult(false, -1L);
+    }
+  }
+
+  public static class ConfigChangeLockResult {
+    boolean success;
 
 Review comment:
   Yeah. Forgot to finish my refactor.
   Thanks!
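
   For readers following the hunk above, the lock logic can be tried locally with a small sketch. This is only an illustration under assumptions: a plain java.util.concurrent.atomic.AtomicLong stands in for the ZooKeeper-backed DistributedAtomicLong, and the class name ConfigChangeLockSketch and its nested Result type are hypothetical, not part of the patch. It shows the same idea as the diff: the lock is taken only if the stored time is strictly smaller than the requested one and the compare-and-set wins.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, in-memory stand-in for the lock in the patch (not Hive code).
final class ConfigChangeLockSketch {

  /** Outcome of a lock attempt: success flag plus the lock time currently in effect. */
  static final class Result {
    final boolean success;
    final long nextChangeTime;

    Result(boolean success, long nextChangeTime) {
      this.success = success;
      this.nextChangeTime = nextChangeTime;
    }
  }

  // Assumption: a local AtomicLong replaces the distributed counter; it starts at 0,
  // mirroring the initialize(0L) call in the diff.
  private final AtomicLong nextChangeTime = new AtomicLong(0L);

  Result lockForConfigChange(long nextMinConfigChangeTime) {
    long current = nextChangeTime.get();
    if (current >= nextMinConfigChangeTime) {
      // Somebody already reserved this time or a later one.
      return new Result(false, current);
    }
    if (!nextChangeTime.compareAndSet(current, nextMinConfigChangeTime)) {
      // Lost the race: another caller changed the value between get and compareAndSet.
      return new Result(false, nextChangeTime.get());
    }
    return new Result(true, nextMinConfigChangeTime);
  }

  public static void main(String[] args) {
    ConfigChangeLockSketch lock = new ConfigChangeLockSketch();
    Result first = lock.lockForConfigChange(100L);
    Result second = lock.lockForConfigChange(50L);
    System.out.println(first.success + " " + first.nextChangeTime);
    System.out.println(second.success + " " + second.nextChangeTime);
  }
}
```

   Unlike the real implementation, this sketch has no failure mode for the read itself, so the -1L results from the diff have no counterpart here.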
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 272075)
    Time Spent: 4.5h  (was: 4h 20m)

> Implement BlacklistingLlapMetricsListener
> -----------------------------------------
>
>                 Key: HIVE-21912
>                 URL: https://issues.apache.org/jira/browse/HIVE-21912
>             Project: Hive
>          Issue Type: Sub-task
>          Components: llap, Tez
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21912.patch, HIVE-21912.wip-2.patch, 
> HIVE-21912.wip.patch
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We should implement a DaemonStatisticsHandler which:
>  * If a node's average response time is bigger than 150% (configurable) of 
> that of the other nodes
>  * If the other nodes have enough empty executors to handle the requests
> Then disables the limping node.
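
The two conditions in the description can be sketched as a single check. This is a rough illustration, not the handler from the patch: the names LimpingNodeSketch, NodeStats and shouldDisable are hypothetical, "of the other nodes" is assumed to mean the mean of the other nodes' average response times, and "enough empty executors" is assumed to mean that the other nodes' empty executors cover the candidate's busy executors.

```java
import java.util.List;

// Hypothetical sketch of the decision described in the issue (not Hive code).
final class LimpingNodeSketch {

  /** Per-node statistics; the field names are assumptions made for this example. */
  static final class NodeStats {
    final double avgResponseTimeMs;
    final int busyExecutors;
    final int emptyExecutors;

    NodeStats(double avgResponseTimeMs, int busyExecutors, int emptyExecutors) {
      this.avgResponseTimeMs = avgResponseTimeMs;
      this.busyExecutors = busyExecutors;
      this.emptyExecutors = emptyExecutors;
    }
  }

  /**
   * Returns true when the candidate is slower than threshold times the mean response time
   * of the other nodes AND the other nodes have enough empty executors to absorb its load.
   * A threshold of 1.5 corresponds to the 150% mentioned in the issue.
   */
  static boolean shouldDisable(NodeStats candidate, List<NodeStats> others, double threshold) {
    if (others.isEmpty()) {
      return false; // nothing to compare against, and nowhere to move the work
    }
    double othersAvg = 0.0;
    int othersEmpty = 0;
    for (NodeStats node : others) {
      othersAvg += node.avgResponseTimeMs;
      othersEmpty += node.emptyExecutors;
    }
    othersAvg /= others.size();

    boolean limping = candidate.avgResponseTimeMs > threshold * othersAvg;
    boolean capacityAvailable = othersEmpty >= candidate.busyExecutors;
    return limping && capacityAvailable;
  }
}
```

Both conditions must hold, so a slow node stays enabled when the rest of the cluster could not take over its work.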



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
