[jira] [Commented] (YARN-11622) ResourceManager asynchronous switch from Standy to Active exception
[ https://issues.apache.org/jira/browse/YARN-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801195#comment-17801195 ]

ASF GitHub Bot commented on YARN-11622:
---------------------------------------

hiwangzhihui commented on code in PR #6352:
URL: https://github.com/apache/hadoop/pull/6352#discussion_r1438289028

##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java:
##
@@ -1118,38 +1124,25 @@ protected void serviceStop() throws Exception {
     }
   }

-/**

Review Comment:
The SpotBugs warning expects the caller to submit RMFatalToStandbyRunner and then wait for its execution result. However, waiting for the result synchronously would cause a "cyclic dependency" issue. In addition, the call method of TransitionToActiveStandbyRunnern already handles both execution results and exceptions uniformly and logs them. RMFatalToStandbyRunner has only two possible outcomes: ① it executes successfully, or ② it fails with an exception and the RM process exits.
In my opinion this warning can be ignored in this case, since adding a thread just to wait for the result would be redundant.
@slfan1989 How do you view this warning, and how should it be handled? I would like to hear your opinion again.

> ResourceManager asynchronous switch from Standy to Active exception
> -------------------------------------------------------------------
>
>                 Key: YARN-11622
>                 URL: https://issues.apache.org/jira/browse/YARN-11622
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.0.0-alpha4, 3.1.1, 3.3.0
>            Reporter: wangzhihui
>            Assignee: wangzhihui
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: rm_ha_solution.png, yuque_diagram (1).jpg, yuque_diagram.jpg
>
>
> h1. Two exception cases:
> h2. The first case:
> *The exception desc:*
> {code:java}
> 14:52:57,426 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.access$1200(ResourceManager.java:610)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.handleTransitionToStandByInNewThread(ResourceManager.java:941)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.access$1100(ResourceManager.java:144)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher.handle(ResourceManager.java:902)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher.handle(ResourceManager.java:892)
>   at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>   at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>
> * ActiveStandbyElector and ZKRMStateStore both triggered a toStandby event at 14:52:57. The two asynchronous events are referred to as Thread_1 and Thread_2.
> * As shown in the following figure, Thread_1 reinitializes activeServices to null during the toStandby process. At this point, Thread_2 uses activeServices while executing the handleTransitionToStandByInNewThread method, which ultimately results in a NullPointerException and the ResourceManager server exits.
> !yuque_diagram.jpg|width=629,height=100!
> h2. The second case:
> *The exception desc:*
> {code:java}
> 06:17:35,913 WARN ha.ActiveStandbyElector (ActiveStandbyElector.java:becomeActive(900)) - Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>   at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>   at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
>   at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:543)
>   at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:558)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll during transition to Active
>   at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
>   at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>   ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation failed
>   at org.apache.hadoop.yarn.server.resourcemanager.AdminService.ref
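
For readers following the first case above, the failure mode (a second standby request dereferencing activeServices after the first one has cleared it) can be reproduced in isolation. The following is only a minimal sketch under stated assumptions: StandbyRaceSketch, ActiveServices, unsafeToStandby, guardedToStandby and transitionLock are hypothetical names invented for illustration. They are not the ResourceManager code, and this is not necessarily how PR #6352 resolves the issue. The guard shown (serialize the transition and treat a repeated request as a no-op) is just one common way to avoid this kind of NullPointerException.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class StandbyRaceSketch {
  /** Hypothetical stand-in for the RM's active services; not the real class. */
  static final class ActiveServices {
    void stop() { /* would stop the services owned by the active RM */ }
  }

  private final Object transitionLock = new Object();
  private ActiveServices activeServices = new ActiveServices();

  /** Unguarded: a second caller dereferences the field after it was cleared. */
  void unsafeToStandby() {
    activeServices.stop();   // throws NullPointerException on the second call
    activeServices = null;
  }

  /** Guarded: transitions are serialized and a repeated request is a no-op. */
  boolean guardedToStandby() {
    synchronized (transitionLock) {
      if (activeServices == null) {
        return false;        // another request already completed the transition
      }
      activeServices.stop();
      activeServices = null;
      return true;
    }
  }

  /** Runs two concurrent standby requests and returns how many did real work. */
  static int concurrentGuardedTransitions() {
    StandbyRaceSketch rm = new StandbyRaceSketch();
    AtomicInteger performed = new AtomicInteger();
    Runnable request = () -> {
      if (rm.guardedToStandby()) {
        performed.incrementAndGet();
      }
    };
    Thread t1 = new Thread(request, "Thread_1");
    Thread t2 = new Thread(request, "Thread_2");
    t1.start();
    t2.start();
    try {
      t1.join();
      t2.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return performed.get();
  }

  public static void main(String[] args) {
    StandbyRaceSketch rm = new StandbyRaceSketch();
    rm.unsafeToStandby();
    try {
      rm.unsafeToStandby();
    } catch (NullPointerException e) {
      System.out.println("unguarded second transition: NullPointerException");
    }
    System.out.println("guarded transitions performed: "
        + concurrentGuardedTransitions());
  }
}
```

In the guarded variant, whichever of Thread_1 or Thread_2 takes the lock first performs the transition and the other returns without touching the cleared field, so exactly one transition is performed regardless of scheduling.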
[jira] [Commented] (YARN-11637) Improve PolicyGenerator readability and Support FairScheduler
[ https://issues.apache.org/jira/browse/YARN-11637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801184#comment-17801184 ]

ASF GitHub Bot commented on YARN-11637:
---------------------------------------

hadoop-yetus commented on PR #6389:
URL: https://github.com/apache/hadoop/pull/6389#issuecomment-1872116166

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 17m 20s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 41m 17s | | trunk passed |
| +1 :green_heart: | compile | 0m 26s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 25s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 25s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 29s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 32s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 25s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 0m 45s | | trunk passed |
| +1 :green_heart: | shadedclient | 32m 8s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 18s | | the patch passed |
| +1 :green_heart: | compile | 0m 18s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 18s | | the patch passed |
| +1 :green_heart: | compile | 0m 17s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 0m 17s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 13s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 20s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 18s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 0m 44s | | the patch passed |
| +1 :green_heart: | shadedclient | 31m 57s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 0m 51s | | hadoop-yarn-server-globalpolicygenerator in the patch passed. |
| +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. |
| | | 134m 28s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6389/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6389 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux b3c1746b8b56 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 162d770296b6f41420dcc805bfc99fdb4b8b46e1 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6389/2/testReport/ |
| Max. process+thread count | 706 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-globalpolicygenerator |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6389/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.