[jira] [Work logged] (HIVE-26947) Hive compactor.Worker can respawn connections to HMS at extremely high frequency

ASF GitHub Bot (Jira) Mon, 23 Jan 2023 05:22:52 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-26947?focusedWorklogId=841112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-841112
 ]


ASF GitHub Bot logged work on HIVE-26947:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Jan/23 13:19
            Start Date: 23/Jan/23 13:19
    Worklog Time Spent: 10m 
      Work Description: akshat0395 commented on code in PR #3955:
URL: https://github.com/apache/hive/pull/3955#discussion_r1084047469


##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -3215,6 +3215,10 @@ public static enum ConfVars {
         "Time in seconds after which a compaction job will be declared failed and the\n" +
         "compaction re-queued."),
 
+    HIVE_COMPACTOR_WORKER_SLEEP_TIME("hive.compactor.worker.sleep.time", "10000ms",

Review Comment:
   These values need to be configurable so that they can also be tweaked in 
tests. There are test cases such as 
https://github.com/apache/hive/blob/2031af314e70f3b8e07add13cb65416c29956181/ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java#L1190
   that simulate timeout scenarios for the worker. When the worker times out, 
we put its thread to sleep for a long duration. To avoid such long sleeps 
while testing, we set lower values for these in tests.
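
   To illustrate the point about overriding the value in tests, here is a 
minimal standalone sketch. It does not use HiveConf: a plain Map stands in for 
the configuration, and the class name, the parser, and the "50ms" test value 
are assumptions made for illustration. Only the key 
hive.compactor.worker.sleep.time and the default "10000ms" come from the diff 
above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a config lookup; not the HiveConf API.
final class WorkerSleepTimeSketch {
    static final String KEY = "hive.compactor.worker.sleep.time";
    static final String DEFAULT = "10000ms"; // default taken from the diff

    // Returns the sleep time in milliseconds, falling back to the default
    // when the key is not set. Only "ms" and "s" suffixes are handled here.
    static long sleepMillis(Map<String, String> conf) {
        String raw = conf.getOrDefault(KEY, DEFAULT).trim();
        if (raw.endsWith("ms")) {
            return Long.parseLong(raw.substring(0, raw.length() - 2).trim());
        }
        if (raw.endsWith("s")) {
            return Long.parseLong(raw.substring(0, raw.length() - 1).trim()) * 1000L;
        }
        // Assumption: a bare number is interpreted as milliseconds.
        return Long.parseLong(raw);
    }

    public static void main(String[] args) {
        Map<String, String> productionConf = new HashMap<>();
        Map<String, String> testConf = new HashMap<>();
        testConf.put(KEY, "50ms"); // a test lowers the value to avoid long sleeps

        System.out.println("production sleep (ms): " + sleepMillis(productionConf));
        System.out.println("test sleep (ms): " + sleepMillis(testConf));
    }
}
```

   With the value read from configuration, a timeout test can set a short 
sleep without touching the worker code.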





Issue Time Tracking
-------------------

    Worklog Id:     (was: 841112)
    Time Spent: 5h  (was: 4h 50m)

> Hive compactor.Worker can respawn connections to HMS at extremely high frequency
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-26947
>                 URL: https://issues.apache.org/jira/browse/HIVE-26947
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Akshat Mathur
>            Assignee: Akshat Mathur
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> After catching the exception generated by the findNextCompactionAndExecute() 
> task, HS2 appears to immediately rerun the task with no delay or backoff. As 
> a result, there are ~3500 connection attempts from HS2 to HMS over just a 
> 5-second period in the HS2 log.
> The compactor.Worker should wait between failed attempts and maybe do an 
> exponential backoff.
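
The exponential backoff suggested in the description could look roughly like 
the sketch below. This is not the change made in the PR: the class name, method 
name, and the base and maximum delays are all hypothetical, chosen only to show 
the idea of doubling the wait after each consecutive failure up to a cap.

```java
// Hypothetical helper; not part of Hive.
final class BackoffSketch {

    // Delay before the next attempt: 0 when there have been no failures,
    // otherwise baseMillis doubled for each additional consecutive failure,
    // never exceeding maxMillis.
    static long delayMillis(int consecutiveFailures, long baseMillis, long maxMillis) {
        if (consecutiveFailures <= 0) {
            return 0L;
        }
        long delay = Math.min(baseMillis, maxMillis);
        for (int i = 1; i < consecutiveFailures && delay < maxMillis; i++) {
            // Cap before doubling so the multiplication cannot overflow.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return delay;
    }

    public static void main(String[] args) {
        long baseMillis = 100L;   // assumed starting delay
        long maxMillis = 10_000L; // assumed upper bound
        for (int failures = 0; failures <= 9; failures++) {
            System.out.println(failures + " failure(s) -> wait "
                + delayMillis(failures, baseMillis, maxMillis) + " ms");
        }
    }
}
```

A worker loop would reset the failure count to zero after a successful attempt 
and sleep for the computed delay after a failed one, which keeps a persistent 
HMS outage from producing thousands of reconnects within seconds.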



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
