[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876662#comment-16876662 ] Hadoop QA commented on YARN-9655:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 10m 14s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| branch-3.0 Compile Tests ||
| -1 | mvninstall | 8m 45s | root in branch-3.0 failed. |
| +1 | compile | 0m 45s | branch-3.0 passed |
| +1 | checkstyle | 0m 19s | branch-3.0 passed |
| +1 | mvnsite | 0m 33s | branch-3.0 passed |
| -1 | shadedclient | 3m 15s | branch has errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 47s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in branch-3.0 has 2 extant Findbugs warnings. |
| +1 | javadoc | 0m 21s | branch-3.0 passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 32s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvnsite | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 3m 9s | patch has errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 52s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed |
|| Other Tests ||
| +1 | unit | 16m 36s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 48m 46s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:e402791 |
| JIRA Issue | YARN-9655 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973368/YARN-9655.branch-3.0.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ac5568e1b493 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.0 / 9daa45f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/24339/artifact/out/branch-mvninstall-root.txt |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/24339/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html |
| Test Results |
[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876635#comment-16876635 ] hunshenshi commented on YARN-9655: -- I uploaded patches for branch-2.9 and branch-3.0; please review. Thanks [~cheersyang] > AllocateResponse in FederationInterceptor lost applicationPriority > --- > > Key: YARN-9655 > URL: https://issues.apache.org/jira/browse/YARN-9655 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.2.0 >Reporter: hunshenshi >Assignee: hunshenshi >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: YARN-9655.branch-2.9.patch, YARN-9655.branch-3.0.patch > > > In YARN Federation mode using FederationInterceptor, the AM reports an error when submitting an application: > {code:java} > 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. > java.lang.NullPointerException at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280) > at java.lang.Thread.run(Thread.java:748) > {code} > The reason is that applicationPriority is lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
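The failure mode above — the AM calling handleJobPriorityChange on an AllocateResponse whose applicationPriority was dropped during the federation merge — can be sketched with self-contained stub types. This is only an illustration of the fix direction (propagate the home sub-cluster's priority into the merged response), not the actual YARN-9655 patch; the class and method names below are hypothetical stand-ins for the real YARN APIs.

```java
import java.util.Objects;

public class MergeSketch {
  // Hypothetical, heavily simplified AllocateResponse carrying only the field at issue.
  static class Response {
    Integer applicationPriority; // null models "field lost in merge"
    Response(Integer p) { this.applicationPriority = p; }
  }

  // Buggy merge: nothing copies the priority, so the home RM's value is silently dropped.
  static Response mergeBuggy(Response home, Response secondary) {
    return new Response(null);
  }

  // Fixed merge: propagate the home sub-cluster's priority into the merged response,
  // which is the essence of the reported fix (the home RM owns the application).
  static Response mergeFixed(Response home, Response secondary) {
    Response merged = new Response(null);
    if (home.applicationPriority != null) {
      merged.applicationPriority = home.applicationPriority;
    }
    return merged;
  }

  // Mirrors the AM side: it dereferences the priority and NPEs when it is missing,
  // as in the RMContainerAllocator stack trace quoted above.
  static int handleJobPriorityChange(Response r) {
    return Objects.requireNonNull(
        r.applicationPriority, "applicationPriority lost in merge").intValue();
  }

  public static void main(String[] args) {
    Response home = new Response(5);
    Response secondary = new Response(null);
    System.out.println(handleJobPriorityChange(mergeFixed(home, secondary)));
    try {
      handleJobPriorityChange(mergeBuggy(home, secondary));
    } catch (NullPointerException e) {
      System.out.println("NPE, matching the AM log above");
    }
  }
}
```

With the fixed merge the AM sees the priority it asked for; with the buggy merge the same call path throws the NullPointerException from the report.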
[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi updated YARN-9655: - Attachment: YARN-9655.branch-3.0.patch
[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876632#comment-16876632 ] Weiwei Yang commented on YARN-9655: --- Thanks [~hunhun], re-opened the issue to trigger the Jenkins job.
[jira] [Reopened] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang reopened YARN-9655: ---
[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi updated YARN-9655: - Attachment: YARN-9655.branch-2.9.patch
[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876626#comment-16876626 ] hunshenshi commented on YARN-9655: -- OK, I will check.
[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876620#comment-16876620 ] Weiwei Yang commented on YARN-9655: --- I just pushed this to trunk and cherry-picked it to branch-3.2 and branch-3.1. Thanks for the contribution [~hunhun]. FederationInterceptor was added in 2.9; does this issue also exist in branch-2.9 and branch-3.0? If so, we need patches for branch-2.9, branch-2 and branch-3.0. [~hunhun], please let me know, thanks.
[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-9655: -- Fix Version/s: 3.1.3
[jira] [Assigned] (YARN-9601) Potential NPE in ZookeeperFederationStateStore#getPoliciesConfigurations
[ https://issues.apache.org/jira/browse/YARN-9601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi reassigned YARN-9601: Assignee: hunshenshi > Potential NPE in ZookeeperFederationStateStore#getPoliciesConfigurations > > > Key: YARN-9601 > URL: https://issues.apache.org/jira/browse/YARN-9601 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Affects Versions: 3.2.0 >Reporter: hunshenshi >Assignee: hunshenshi >Priority: Major > > Potential NPE in ZookeeperFederationStateStore#getPoliciesConfigurations. > The current code of ZookeeperFederationStateStore#getPoliciesConfigurations: > {code:java} > for (String child : zkManager.getChildren(policiesZNode)) { > SubClusterPolicyConfiguration policy = getPolicy(child); > result.add(policy); > } > {code} > The result of `getPolicy` may be null, so the policy should be checked before it is added. The proposed code: > {code:java} > for (String child : zkManager.getChildren(policiesZNode)) { > SubClusterPolicyConfiguration policy = getPolicy(child); > // policy may be null, so check before adding > if (policy == null) { > LOG.warn("Policy for queue: {} does not exist.", child); > continue; > } > result.add(policy); > } > {code}
[jira] [Assigned] (YARN-9643) Federation: Add subClusterID in nodes page of Router web
[ https://issues.apache.org/jira/browse/YARN-9643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi reassigned YARN-9643: Assignee: hunshenshi > Federation: Add subClusterID in nodes page of Router web > > > Key: YARN-9643 > URL: https://issues.apache.org/jira/browse/YARN-9643 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: hunshenshi >Assignee: hunshenshi >Priority: Major > Attachments: nodes.png > > > The nodes page of the Router web UI shows only node info; there is no cluster ID corresponding to each node. > [http://127.0.0.1:8089/cluster/nodes|http://192.168.169.72:8089/cluster/nodes] > !nodes.png!
[jira] [Updated] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-9655: -- Fix Version/s: 3.2.1
[jira] [Resolved] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang resolved YARN-9655. --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0
[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority
[ https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876596#comment-16876596 ] Hudson commented on YARN-9655: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16849 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16849/]) YARN-9655. AllocateResponse in FederationInterceptor lost (wwei: rev 570eee30e5ab5cf37b1a758934987cbf61140f6a) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/FederationInterceptor.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java
[jira] [Updated] (YARN-9662) Preemption not working on NodeLabels
[ https://issues.apache.org/jira/browse/YARN-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amithsha updated YARN-9662: --- Description: Preemption on node labels is not working when utilization is 100%. Example: the queues adhocp0, adhocp1 and adhocp3 are mapped to the node label label_adhoc_nm, with 60, 30 and 10 as actual capacity and 100 as maximum capacity for all. When jobA on adhocp3 consumes 100% of its maximum capacity and jobB is submitted on adhocp0, no containers running on the adhocp3 queue get preempted. This was already reported by another user: https://issues.apache.org/jira/browse/YARN-7685 Note: jobs using more than the actual capacity but less than the maximum capacity are able to preempt containers.
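For context on what drives the behaviour reported above, cross-queue preemption in the Capacity Scheduler must be enabled explicitly, and the per-queue capacities on a node label bound what can be reclaimed. The fragment below is an illustrative sketch reusing the queue and label names from this report; the property names are real ResourceManager/CapacityScheduler settings, but the values are examples, not the reporter's actual configuration.

```xml
<!-- yarn-site.xml: enable the preemption monitor (illustrative) -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>

<!-- capacity-scheduler.xml: example capacities of two of the queues on the label -->
<property>
  <name>yarn.scheduler.capacity.root.adhocp0.accessible-node-labels.label_adhoc_nm.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhocp3.accessible-node-labels.label_adhoc_nm.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhocp3.accessible-node-labels.label_adhoc_nm.maximum-capacity</name>
  <value>100</value>
</property>
```

In a setup like this, containers on adhocp3 above its 10% guaranteed share are candidates for preemption when adhocp0 is starved; the bug reported here is that this stops happening once adhocp3 reaches its 100% maximum capacity on the label.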
[jira] [Created] (YARN-9662) Preemption not working on NodeLabels
Amithsha created YARN-9662: -- Summary: Preemption not working on NodeLabels Key: YARN-9662 URL: https://issues.apache.org/jira/browse/YARN-9662 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.9.0 Reporter: Amithsha Preemption on node labels is not working at 100% utilisation. Example: adhocp0, adhocp1 and adhocp3 are mapped to the node label label_adhoc_nm, with 60, 30 and 10 as actual capacity and 100 as maximum capacity for all. When jobA on adhocp3 consumes 100% of its maximum capacity and jobB is submitted on adhocp0, no containers running on adhocp3 get preempted. This was already reported by another user: https://issues.apache.org/jira/browse/YARN-7685
[jira] [Resolved] (YARN-9473) [Umbrella] Support Vector Engine ( a new accelerator hardware) based on pluggable device framework
[ https://issues.apache.org/jira/browse/YARN-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved YARN-9473. Resolution: Fixed Fix Version/s: 3.3.0 > [Umbrella] Support Vector Engine (a new accelerator hardware) based on > pluggable device framework > -- > > Key: YARN-9473 > URL: https://issues.apache.org/jira/browse/YARN-9473 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Zhankun Tang >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.3.0 > > > As the heterogeneous computation trend rises, new acceleration hardware such as > GPUs and FPGAs is used to satisfy various requirements. > The Vector Engine (VE), released by NEC, is another example. The VE is similar to a GPU but has different characteristics: it is > suitable for machine learning and HPC due to better memory bandwidth and no > PCIe bottleneck. > See these links for more VE details: > [https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/] > [https://www.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf] > YARN-8851 is a pluggable device framework that provides an easy > way to develop a plugin for such new accelerators. This JIRA proposes to > develop a new VE plugin based on that framework, implemented like the current > GPU "NvidiaGPUPluginForRuntimeV2" plugin.
[jira] [Commented] (YARN-9473) [Umbrella] Support Vector Engine ( a new accelerator hardware) based on pluggable device framework
[ https://issues.apache.org/jira/browse/YARN-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876454#comment-16876454 ] Peter Bacsko commented on YARN-9473: Subtasks have been committed to trunk - closing this ticket.
[jira] [Assigned] (YARN-9660) Enhance documentation of Docker on YARN support
[ https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko reassigned YARN-9660: -- Assignee: Peter Bacsko > Enhance documentation of Docker on YARN support > --- > > Key: YARN-9660 > URL: https://issues.apache.org/jira/browse/YARN-9660 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, nodemanager >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Right now, using Docker on YARN has some hard requirements. If these > requirements are not met, then launching the containers will fail and an > error message will be printed. Depending on how familiar the user is with > Docker, it might or might not be easy for them to understand what went wrong > and how to fix the underlying problem. > It would be important to explicitly document these requirements along with > the error messages. > *#1: CGroups handler cannot be systemd* > If the docker daemon runs with the systemd cgroups handler, we receive the following > error upon launching a container: > {noformat} > Container id: container_1561638268473_0006_01_02 > Exit code: 7 > Exception message: Launch container failed > Shell error output: /usr/bin/docker-current: Error response from daemon: > cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". > See '/usr/bin/docker-current run --help'. > Shell output: main : command provided 4 > main : run as user is johndoe > main : requested yarn user is johndoe > {noformat} > Solution: switch to cgroupfs. Doing so can be OS-specific, but we can > document a {{systemctl}} example. > > *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container* > Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. > This is because all commands under {{/bin}} are linked to {{/bin/busybox}} and > there's only {{/bin/sh}}. 
> If we try to use these kinds of images, we'll see the following error message: > {noformat} > Container id: container_1561638268473_0015_01_02 > Exit code: 7 > Exception message: Launch container failed > Shell error output: /usr/bin/docker-current: Error response from daemon: oci > runtime error: container_linux.go:235: starting container process caused > "exec: \"bash\": executable file not found in $PATH". > Shell output: main : command provided 4 > main : run as user is johndoe > main : requested yarn user is johndoe > {noformat} > > *#3: {{find}} command must be available on the {{$PATH}}* > It seems obvious that we have the {{find}} command, but even very popular > images like {{fedora}} require that we install it separately. > If we don't have {{find}} available, then {{launch_container.sh}} fails > with: > {noformat} > [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. > Error file: prelaunch.err. > Last 4096 bytes of prelaunch.err : > /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: > line 44: find: command not found > Last 4096 bytes of stderr.txt : > [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. > Error file: prelaunch.err. > Last 4096 bytes of prelaunch.err : > /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: > line 44: find: command not found > Last 4096 bytes of stderr.txt : > {noformat} > *#4 Add cmd-line example of how to tag local images* > This is actually documented under "Privileged Container Security > Consideration", but a one-liner would be helpful. I had trouble running a > local docker image and tagging it appropriately. Just an example like > {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative. 
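A sketch of the cgroupfs switch mentioned in #1: the file location {{/etc/docker/daemon.json}} and the {{exec-opts}} key are common Docker daemon conventions, but they vary by OS and packaging, so treat this as an assumption to verify against your distribution rather than as the documented fix:

```json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
```

After editing the file, restart the daemon (on systemd-based systems, {{systemctl restart docker}}). The tagging one-liner from #4, {{docker tag local_ubuntu local/ubuntu:latest}}, could be documented alongside this example.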
[jira] [Commented] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876403#comment-16876403 ] Hudson commented on YARN-9661: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16844 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16844/]) YARN-9661:Fix typo in LocalityMulticastAMRMProxyPolicy.java and (elgoiri: rev b1dafc3506de4bb827138493d5cc25da704f5609) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/AbstractConfigurableFederationPolicy.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/amrmproxy/LocalityMulticastAMRMProxyPolicy.java > Fix typos in LocalityMulticastAMRMProxyPolicy and > AbstractConfigurableFederationPolicy > -- > > Key: YARN-9661 > URL: https://issues.apache.org/jira/browse/YARN-9661 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Affects Versions: 3.2.0 >Reporter: hunshenshi >Assignee: hunshenshi >Priority: Major > Fix For: 3.3.0 > > > There are some typos in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java
[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support
[ https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876401#comment-16876401 ] Eric Yang commented on YARN-9660: - 1. +1 for the systemctl documentation addition. +1 for producing a more user-friendly error message from container-executor in a separate ticket. 2 and 3. bash and find are required when running Docker without ENTRYPOINT support. I think this issue can be resolved if YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE defaults to true. However, I also understand that the current default is meant to make Docker containers behave more like YARN containers, which allows existing big data workloads to run without modification. The documentation can probably explain that bash and find are required for YARN containers, but optional when entrypoint mode is activated. 4. +1 for the current proposal. > Enhance documentation of Docker on YARN support > --- > > Key: YARN-9660 > URL: https://issues.apache.org/jira/browse/YARN-9660 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, nodemanager >Reporter: Peter Bacsko >Priority: Major > > Right now, using Docker on YARN has some hard requirements. If these > requirements are not met, then launching the containers will fail and an > error message will be printed. Depending on how familiar the user is with > Docker, it might or might not be easy for them to understand what went wrong > and how to fix the underlying problem. > It would be important to explicitly document these requirements along with > the error messages. 
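The entrypoint-mode point above can be illustrated with the environment variables an application would set at submission time. The variable names follow the Docker-on-YARN documentation, but whether your Hadoop version honors them, and the image name used, are assumptions to verify:

```shell
# Hypothetical submission-side settings for running a small image (no bash/find)
# in ENTRYPOINT mode, where the launch script is not injected into the container.
export YARN_CONTAINER_RUNTIME_TYPE=docker
export YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=alpine:latest
export YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true
```

In the default (non-entrypoint) mode, YARN generates {{launch_container.sh}} and runs it inside the image, which is why bash and find must exist there.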
[jira] [Commented] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876384#comment-16876384 ] Íñigo Goiri commented on YARN-9661: --- Merged the PR.
[jira] [Resolved] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri resolved YARN-9661. --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0
[jira] [Updated] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-9661: -- Summary: Fix typos in LocalityMulticastAMRMProxyPolicy and AbstractConfigurableFederationPolicy (was: Fix typos in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy)
[jira] [Updated] (YARN-9661) Fix typos in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-9661: -- Summary: Fix typos in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy (was: Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy)
[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-9661: -- Summary: Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy (was: Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java)
[jira] [Commented] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl
[ https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876210#comment-16876210 ] Zhankun Tang commented on YARN-9480: [~yoelee], added [~Yunyao Zhang]. Thanks [~Weiwei Yang]! > createAppDir() in LogAggregationService shouldn't block dispatcher thread of > ContainerManagerImpl > - > > Key: YARN-9480 > URL: https://issues.apache.org/jira/browse/YARN-9480 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: liyakun >Assignee: liyakun >Priority: Major > > At present, during startContainers(), if the NM does not contain the application, > it enters the INIT_APPLICATION step. In the application init step, > createAppDir() is executed, and it is a blocking operation. > createAppDir() needs to interact with an external file > system, so it is subject to the SLA of that file system. > Once the external file system has high latency, the NM dispatcher thread of > ContainerManagerImpl gets stuck. (In fact, I have seen a case where the NM was stuck here for more than an hour.) > I think it would be more reasonable to move createAppDir() to the time the logs are actually uploaded (in another thread). And according to the logRetentionPolicy, > many of the containers may never reach this step, which will save a lot of > interactions with the external file system.
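The proposed change can be sketched as the usual offloading pattern: the dispatcher thread only enqueues the directory creation and a dedicated pool pays for the slow remote-filesystem call. This is a hypothetical illustration (class and method names are invented, and the real Hadoop code differs), not the actual patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: keep the dispatcher thread non-blocking by submitting the possibly
// slow createAppDir() call to a separate executor, so external-FS latency
// cannot stall event dispatching in ContainerManagerImpl.
class AppDirOffload {
    private final ExecutorService fsExecutor = Executors.newFixedThreadPool(4);

    // Stand-in for the remote createAppDir() call that may block on the
    // external file system for a long time.
    static String createAppDir(String appId) {
        return "/logs/" + appId;
    }

    // Called from the dispatcher thread: returns immediately with a Future.
    Future<String> submitCreateAppDir(String appId) {
        return fsExecutor.submit(() -> createAppDir(appId));
    }

    // Blocking join used by the log-upload path, where waiting is acceptable.
    String await(Future<String> f) {
        try {
            return f.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    void shutdown() {
        fsExecutor.shutdown();
    }
}
```

The upload thread, not the dispatcher, then waits for the directory; combined with the log-retention check, containers that never aggregate logs would skip the remote call entirely.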
[jira] [Commented] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876205#comment-16876205 ] Hadoop QA commented on YARN-9661: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 8s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 22s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 36s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 47s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1042/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/1042 | | JIRA Issue | YARN-9661 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 222a3b80dc9c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 1e727cf | | Default Java | 1.8.0_212 | | Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1042/1/testReport/ | | Max. process+thread count | 412 (vs. ulimit of 5500) | | modules | C:
[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL
[ https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876199#comment-16876199 ] Hadoop QA commented on YARN-9629: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 382 unchanged - 1 fixed = 382 total (was 383) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 48s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 0s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}120m 12s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9629 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973335/YARN-9629.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux fbbac63a0bcb 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64
[jira] [Commented] (YARN-9250) hadoop-yarn-server-nodemanager build failed: make failed with error code 2
[ https://issues.apache.org/jira/browse/YARN-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876185#comment-16876185 ] hunshenshi commented on YARN-9250: -- [~linlong] you can use -X to see more error info; maybe it will help you. > hadoop-yarn-server-nodemanager build failed: make failed with error code 2 > -- > > Key: YARN-9250 > URL: https://issues.apache.org/jira/browse/YARN-9250 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.2.0 >Reporter: charlie mao >Priority: Blocker > > When I compiled the hadoop-3.2.0 release, I encountered the following errors: > [ERROR] Failed to execute goal > org.apache.hadoop:hadoop-maven-plugins:3.2.0:cmake-compile (cmake-compile) on > project hadoop-yarn-server-nodemanager: make failed with error code 2 -> > [Help 1] > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.hadoop:hadoop-maven-plugins:3.2.0:cmake-compile > (cmake-compile) on project hadoop-yarn-server-nodemanager: make failed with > error code 2 > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:212) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) > at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) > at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307) > at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193) > at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106) > at 
org.apache.maven.cli.MavenCli.execute(MavenCli.java:863) > at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288) > at org.apache.maven.cli.MavenCli.main(MavenCli.java:199) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) > at > org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) > at > org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) > at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) > Caused by: org.apache.maven.plugin.MojoExecutionException: make failed with > error code 2 > at > org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.runMake(CompileMojo.java:231) > at > org.apache.hadoop.maven.plugin.cmakebuilder.CompileMojo.execute(CompileMojo.java:98) > at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207) > ... 20 more > [ERROR] > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :hadoop-yarn-server-nodemanager > > my compiling environment: > jdk 1.8.0_181 > maven:3.3.9(/3.6.0) > cmake version 3.12.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi reassigned YARN-9661: Assignee: hunshenshi > Fix typo in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java > --- > > Key: YARN-9661 > URL: https://issues.apache.org/jira/browse/YARN-9661 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Affects Versions: 3.2.0 >Reporter: hunshenshi >Assignee: hunshenshi >Priority: Major > > There are some typos in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi updated YARN-9661: - Component/s: federation > Fix typo in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java > --- > > Key: YARN-9661 > URL: https://issues.apache.org/jira/browse/YARN-9661 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > > There are some typos in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi updated YARN-9661: - Component/s: yarn > Fix typo in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java > --- > > Key: YARN-9661 > URL: https://issues.apache.org/jira/browse/YARN-9661 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, yarn >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > > There are some typos in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL
[ https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876171#comment-16876171 ] Szilard Nemeth commented on YARN-9629: -- Hi [~adam.antal]! +1 for the latest patch! > Support configurable MIN_LOG_ROLLING_INTERVAL > - > > Key: YARN-9629 > URL: https://issues.apache.org/jira/browse/YARN-9629 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager, yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Attachments: YARN-9629.001.patch, YARN-9629.002.patch, > YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch > > > One of the log-aggregation parameters, the minimum valid value for > {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is > MIN_LOG_ROLLING_INTERVAL; it has been hardcoded since its addition in > YARN-2583. > It has been empirically set to 1 hour, as lower values would put the > NodeManagers under pressure too frequently. For bigger clusters that is indeed a > valid limitation, but for smaller clusters lower values are a sensible and > valid customer use case, even a not-so-low 30 minutes. At this > point this can only be achieved by setting > {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be > kept for debug purposes only. > I'm suggesting making this minimum configurable, although a warning should be > logged at NodeManager startup when the value is lower than 1 hour. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi updated YARN-9661: - Description: There are some typos in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java (was: There are some typos in ) > Fix typo in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java > --- > > Key: YARN-9661 > URL: https://issues.apache.org/jira/browse/YARN-9661 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > > There are some typos in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java
[ https://issues.apache.org/jira/browse/YARN-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi updated YARN-9661: - Description: There are some typos in > Fix typo in LocalityMulticastAMRMProxyPolicy.java and > AbstractConfigurableFederationPolicy.java > --- > > Key: YARN-9661 > URL: https://issues.apache.org/jira/browse/YARN-9661 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > > There are some typos in -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9661) Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java
hunshenshi created YARN-9661: Summary: Fix typo in LocalityMulticastAMRMProxyPolicy.java and AbstractConfigurableFederationPolicy.java Key: YARN-9661 URL: https://issues.apache.org/jira/browse/YARN-9661 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.2.0 Reporter: hunshenshi -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support
[ https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876165#comment-16876165 ] Szilard Nemeth commented on YARN-9660: -- Hi [~pbacsko]! Thanks for these proposed documentation improvements! I think it's obvious that all of these points about docker image requirements should be documented properly, with some examples of compatible images. 1. As we discussed offline, the active cgroup handler can easily be printed by running "docker info". I guess this is an OS-independent way to detect the active handler. However, if we want to detect it, we need to run docker info before running any container, and we would also rely on the output of docker info. I don't know how likely it is that the output of docker info will change, but in the end, it's a dependency anyway. 2. and 3. I think we could detect these easily by creating some "image-validation" phase where we would check the availability of the shell commands that the container executor script relies on. If we agree on having such a validation phase, point #1 could also be part of the validation process. All in all, I'm voting for updating the doc and having as many validations as possible, as it makes the Docker feature easier and more straightforward to use. > Enhance documentation of Docker on YARN support > --- > > Key: YARN-9660 > URL: https://issues.apache.org/jira/browse/YARN-9660 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, nodemanager >Reporter: Peter Bacsko >Priority: Major > > Right now, using Docker on YARN has some hard requirements. If these > requirements are not met, then launching the containers will fail and an > error message will be printed. Depending on how familiar the user is with > Docker, it might or might not be easy for them to understand what went wrong > and how to fix the underlying problem. > It would be important to explicitly document these requirements along with > the error messages. 
> *#1: CGroups handler cannot be systemd* > If the docker daemon runs with the systemd cgroups handler, we receive the following > error upon launching a container: > {noformat} > Container id: container_1561638268473_0006_01_02 > Exit code: 7 > Exception message: Launch container failed > Shell error output: /usr/bin/docker-current: Error response from daemon: > cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". > See '/usr/bin/docker-current run --help'. > Shell output: main : command provided 4 > main : run as user is johndoe > main : requested yarn user is johndoe > {noformat} > Solution: switch to cgroupfs. Doing so can be OS-specific, but we can > document a {{systemctl}} example. > > *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container* > Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. > It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and > there's only {{/bin/sh}}. > If we try to use these kinds of images, we'll see the following error message: > {noformat} > Container id: container_1561638268473_0015_01_02 > Exit code: 7 > Exception message: Launch container failed > Shell error output: /usr/bin/docker-current: Error response from daemon: oci > runtime error: container_linux.go:235: starting container process caused > "exec: \"bash\": executable file not found in $PATH". > Shell output: main : command provided 4 > main : run as user is johndoe > main : requested yarn user is johndoe > {noformat} > > *#3: {{find}} command must be available on the {{$PATH}}* > It seems obvious that we have the {{find}} command, but even very popular > images like {{fedora}} require that we install it separately. > If we don't have {{find}} available, then {{launcher_container.sh}} fails > with: > {noformat} > [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. > Error file: prelaunch.err. 
> Last 4096 bytes of prelaunch.err : > /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: > line 44: find: command not found > Last 4096 bytes of stderr.txt : > [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. > Error file: prelaunch.err. > Last 4096 bytes of prelaunch.err : > /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: > line 44: find: command not found > Last 4096 bytes of stderr.txt : > {noformat} > *#4 Add cmd-line example of how to tag local images* > This is actually documented under "Privileged Container Security > Consideration", but a one-liner would be helpful. I had trouble running a > local docker image
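Szilard's point #1 above can be sketched as follows. This is illustrative shell, not YARN code: it parses the `Cgroup Driver:` line of `docker info` output (fed from a canned sample here, so no running daemon is assumed) and warns when the handler is systemd. The daemon.json remedy in the comments is one common way to switch drivers, an assumption rather than something prescribed by the issue.

```shell
# Hypothetical helper mirroring the proposed "docker info"-based detection.
cgroup_driver() {
  # Print the value of the "Cgroup Driver:" line from `docker info` output.
  awk -F': ' '/^ *Cgroup Driver/ {print $2}'
}

# Canned sample of `docker info` output; in practice you would run:
#   docker info | cgroup_driver
sample_info=' Server Version: 1.13.1
 Cgroup Driver: systemd
 Plugins:'
driver=$(printf '%s\n' "$sample_info" | cgroup_driver)
echo "active cgroup handler: $driver"
if [ "$driver" = "systemd" ]; then
  # One common remedy (assumption, not from the issue): set
  #   {"exec-opts": ["native.cgroupdriver=cgroupfs"]}
  # in /etc/docker/daemon.json, then: systemctl restart docker
  echo "WARN: systemd cgroup handler detected; switch the daemon to cgroupfs"
fi
```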
[jira] [Updated] (YARN-9660) Enhance documentation of Docker on YARN support
[ https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9660: --- Description: Right now, using Docker on YARN has some hard requirements. If these requirements are not met, then launching the containers will fail and and error message will be printed. Depending on how familiar the user is with Docker, it might or might not be easy for them to understand what went wrong and how to fix the underlying problem. It would be important to explicitly document these requirements along with the error messages. *#1: CGroups handler cannot be systemd* If docker deamon runs with systemd cgroups handler, we receive the following error upon launching a container: {noformat} Container id: container_1561638268473_0006_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". See '/usr/bin/docker-current run --help'. Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document a {{systemcl}} example. *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container* Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and there's only {{/bin/sh}}. If we try to use these kind of images, we'll see the following error message: {noformat} Container id: container_1561638268473_0015_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: \"bash\": executable file not found in $PATH". 
Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} *#3: {{find}} command must be available on the {{$PATH}}* It seems obvious that we have the {{find}} command, but even very popular images like {{fedora}} requires that we install it separately. If we don't have {{find}} available, then {{launcher_container.sh}} fails with: {noformat} [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: line 44: find: command not found Last 4096 bytes of stderr.txt : [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: line 44: find: command not found Last 4096 bytes of stderr.txt : {noformat} *#4 Add cmd-line example of how to tag local images* This is actually documented under "Privileged Container Security Consideration", but an one-liner would be helpful. I had trouble running a local docker image and tagging it appropriately. Just an example like {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative. was: Right now, using Docker on YARN has some hard requirements. If these requirements are not met, then launching the containers will fail and and error message will be printed. Depending on how familiar the user is with Docker, it might or might not be easy for them to understand what went wrong and how to fix the underlying problem. It would be important to explicitly document these requirements along with the error messages. 
*#1: CGroups handler cannot be systemd* If docker deamon runs with systemd cgroups handler, we receive the following error upon launching a container: {noformat} Container id: container_1561638268473_0006_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". See '/usr/bin/docker-current run --help'. Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document a {{systemcl}} example. *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container* Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and there's only {{/bin/sh}}. If we try to use these kind of images, we'll see the following error message: {noformat} Container id:
[jira] [Updated] (YARN-9660) Enhance documentation of Docker on YARN support
[ https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9660: --- Description: Right now, using Docker on YARN has some hard requirements. If these requirements are not met, then launching the containers will fail and and error message will be printed. Depending on how familiar the user is with Docker, it might or might not be easy for them to understand what went wrong and how to fix the underlying problem. It would be important to explicitly document these requirements along with the error messages. *#1: CGroups handler cannot be systemd* If docker deamon runs with systemd cgroups handler, we receive the following error upon launching a container: {noformat} Container id: container_1561638268473_0006_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". See '/usr/bin/docker-current run --help'. Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document a {{systemcl}} example. *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container* Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and there's only {{/bin/sh}}. If we try to use these kind of images, we'll see the following error message: {noformat} Container id: container_1561638268473_0015_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: \"bash\": executable file not found in $PATH". 
Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} *#3: {{find}} command must be available on the {{$PATH}}* It seems obvious that we have the {{find}} command, but even very popular images like {{fedora}} requires that we install it separately. If we don't have {{find}} available, then {{launcher_container.sh}} fails with: {noformat} [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: line 44: find: command not found Last 4096 bytes of stderr.txt : [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: line 44: find: command not found Last 4096 bytes of stderr.txt : {noformat} #4 Add cmd-line example of how to tag local images This is actually documented under "Privileged Container Security Consideration", but an one-liner would be helpful. I had trouble running a local docker image and tagging it appropriately. Just an example like {{docker tag local_ubuntu local/ubuntu:latest}} is already very informative. was: Right now, using Docker on YARN has some hard requirements. If these requirements are not met, then launching the containers will fail and and error message will be printed. Depending on how familiar the user is with Docker, it might or might not be easy for them to understand what went wrong and how to fix the underlying problem. It would be important to explicitly document these requirements along with the error messages. 
*#1: CGroups handler cannot be systemd* If docker deamon runs with systemd cgroups handler, we receive the following error upon launching a container: {noformat} Container id: container_1561638268473_0006_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". See '/usr/bin/docker-current run --help'. Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document a {{systemcl}} example. *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container* Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and there's only {{/bin/sh}}. If we try to use these kind of images, we'll see the following error message: {noformat} Container id:
[jira] [Commented] (YARN-9660) Enhance documentation of Docker on YARN support
[ https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876144#comment-16876144 ] Peter Bacsko commented on YARN-9660: cc [~shaneku...@gmail.com] [~eyang] [~snemeth] - what do you guys think? I believe some of these could be detected and even printed to the user. The hard-coded {{/bin/bash}} could be made overridable in {{UnixShellScriptBuilder}}. We have options here. > Enhance documentation of Docker on YARN support > --- > > Key: YARN-9660 > URL: https://issues.apache.org/jira/browse/YARN-9660 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, nodemanager >Reporter: Peter Bacsko >Priority: Major > > Right now, using Docker on YARN has some hard requirements. If these > requirements are not met, then launching the containers will fail and an > error message will be printed. Depending on how familiar the user is with > Docker, it might or might not be easy for them to understand what went wrong > and how to fix the underlying problem. > It would be important to explicitly document these requirements along with > the error messages. > *#1: CGroups handler cannot be systemd* > If the docker daemon runs with the systemd cgroups handler, we receive the following > error upon launching a container: > {noformat} > Container id: container_1561638268473_0006_01_02 > Exit code: 7 > Exception message: Launch container failed > Shell error output: /usr/bin/docker-current: Error response from daemon: > cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". > See '/usr/bin/docker-current run --help'. > Shell output: main : command provided 4 > main : run as user is johndoe > main : requested yarn user is johndoe > {noformat} > Solution: switch to cgroupfs. Doing so can be OS-specific, but we can > document a {{systemctl}} example. 
> > *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container* > Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. > It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and > there's only {{/bin/sh}}. > If we try to use these kinds of images, we'll see the following error message: > {noformat} > Container id: container_1561638268473_0015_01_02 > Exit code: 7 > Exception message: Launch container failed > Shell error output: /usr/bin/docker-current: Error response from daemon: oci > runtime error: container_linux.go:235: starting container process caused > "exec: \"bash\": executable file not found in $PATH". > Shell output: main : command provided 4 > main : run as user is johndoe > main : requested yarn user is johndoe > {noformat} > > *#3: {{find}} command must be available on the {{$PATH}}* > It seems obvious that we have the {{find}} command, but even very popular > images like {{fedora}} require that we install it separately. > If we don't have {{find}} available, then {{launcher_container.sh}} fails > with: > {noformat} > [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. > Error file: prelaunch.err. > Last 4096 bytes of prelaunch.err : > /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: > line 44: find: command not found > Last 4096 bytes of stderr.txt : > [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. > Error file: prelaunch.err. 
> Last 4096 bytes of prelaunch.err : > /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: > line 44: find: command not found > Last 4096 bytes of stderr.txt : > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
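The "image-validation" phase discussed for points #2 and #3 could look roughly like the sketch below. The helper name `validate_cmds` is made up for illustration; in a real check it would be executed inside the candidate image (e.g. via `docker run --rm <image> sh -c '...'`), whereas here it simply checks the local $PATH.

```shell
# Hypothetical helper: echo whichever of its arguments do not resolve to an
# executable on $PATH (POSIX `command -v` is used for the lookup).
validate_cmds() {
  missing=""
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  echo "$missing"
}

# The launch scripts rely on bash and find (points #2 and #3 above):
missing=$(validate_cmds bash find)
if [ -n "$missing" ]; then
  echo "image validation FAILED: missing$missing"
else
  echo "image validation OK"
fi
```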
[jira] [Updated] (YARN-9660) Enhance documentation of Docker on YARN support
[ https://issues.apache.org/jira/browse/YARN-9660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9660: --- Description: Right now, using Docker on YARN has some hard requirements. If these requirements are not met, then launching the containers will fail and and error message will be printed. Depending on how familiar the user is with Docker, it might or might not be easy for them to understand what went wrong and how to fix the underlying problem. It would be important to explicitly document these requirements along with the error messages. *#1: CGroups handler cannot be systemd* If docker deamon runs with systemd cgroups handler, we receive the following error upon launching a container: {noformat} Container id: container_1561638268473_0006_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". See '/usr/bin/docker-current run --help'. Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document a {{systemcl}} example. *#2: {{/bin/bash}} must be present on the {{$PATH}} inside the container* Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and there's only {{/bin/sh}}. If we try to use these kind of images, we'll see the following error message: {noformat} Container id: container_1561638268473_0015_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: \"bash\": executable file not found in $PATH". 
Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} *#3: {{find}} command must be available on the {{$PATH}}* It seems obvious that we have the {{find}} command, but even very popular images like {{fedora}} requires that we install it separately. If we don't have {{find}} available, then {{launcher_container.sh}} fails with: {noformat} [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: line 44: find: command not found Last 4096 bytes of stderr.txt : [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: line 44: find: command not found Last 4096 bytes of stderr.txt : {noformat} was: Right now, using Docker on YARN has some hard requirements. If these requirements are not met, then launching the containers will fail and and error message will be printed. Depending on how familiar the user is with Docker, it might or might not be easy for them to understand what went wrong and how to fix the underlying problem. It would be important to explicitly document these requirements along with the error messages. #1: CGroups handler cannot be systemd If docker deamon runs with systemd cgroups handler, we receive the following error upon launching a container: {noformat} Container id: container_1561638268473_0006_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". 
See '/usr/bin/docker-current run --help'. Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document a {{systemcl}} example. #2: {{/bin/bash}} must be present on the {{$PATH}} inside the container Some smaller images like "busybox" or "alpine" does not have {{/bin/bash}}. It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and there's only {{/bin/sh}}. If we try to use these kind of images, we'll see the following error message: {noformat} Container id: container_1561638268473_0015_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: \"bash\": executable file not found in $PATH". Shell output: main : command provided 4 main : run as user is johndoe main :
[jira] [Created] (YARN-9660) Enhance documentation of Docker on YARN support
Peter Bacsko created YARN-9660: -- Summary: Enhance documentation of Docker on YARN support Key: YARN-9660 URL: https://issues.apache.org/jira/browse/YARN-9660 Project: Hadoop YARN Issue Type: Bug Components: documentation, nodemanager Reporter: Peter Bacsko Right now, using Docker on YARN has some hard requirements. If these requirements are not met, then launching the containers will fail and an error message will be printed. Depending on how familiar the user is with Docker, it might or might not be easy for them to understand what went wrong and how to fix the underlying problem. It would be important to explicitly document these requirements along with the error messages. #1: CGroups handler cannot be systemd If the docker daemon runs with the systemd cgroups handler, we receive the following error upon launching a container: {noformat} Container id: container_1561638268473_0006_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice". See '/usr/bin/docker-current run --help'. Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} Solution: switch to cgroupfs. Doing so can be OS-specific, but we can document a {{systemctl}} example. #2: {{/bin/bash}} must be present on the {{$PATH}} inside the container Some smaller images like "busybox" or "alpine" do not have {{/bin/bash}}. It's because all commands under {{/bin}} are linked to {{/bin/busybox}} and there's only {{/bin/sh}}. 
If we try to use this kind of image, we'll see the following error message: {noformat} Container id: container_1561638268473_0015_01_02 Exit code: 7 Exception message: Launch container failed Shell error output: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: \"bash\": executable file not found in $PATH". Shell output: main : command provided 4 main : run as user is johndoe main : requested yarn user is johndoe {noformat} #3: {{find}} command must be available on the {{$PATH}} It seems obvious that the {{find}} command is present, but even very popular images like {{fedora}} require installing it separately. If {{find}} is not available, then {{launch_container.sh}} fails with: {noformat} [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: line 44: find: command not found Last 4096 bytes of stderr.txt : [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_02/launch_container.sh: line 44: find: command not found Last 4096 bytes of stderr.txt : {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
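Requirements #2 and #3 above both concern the image contents, so the documentation could also show how to prepare a minimal image. The Dockerfile below is an illustrative sketch only (the base-image tag and package names are assumptions, not taken from the issue):

```dockerfile
# Illustrative only: make a minimal Alpine image usable with Docker on YARN.
# Alpine links everything under /bin to /bin/busybox and ships no bash,
# so install bash and GNU findutils explicitly.
FROM alpine:3
RUN apk add --no-cache bash findutils

# For Fedora-based images, find must likewise be installed separately, e.g.:
#   RUN dnf install -y findutils
```

A snippet like this in the docs would let users fix both error messages without digging through container-executor logs.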
[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL
[ https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876128#comment-16876128 ] Adam Antal commented on YARN-9629: -- Ah indeed, thanks for the suggestion [~snemeth]. I must have missed it. Fixed it in patch v5. > Support configurable MIN_LOG_ROLLING_INTERVAL > - > > Key: YARN-9629 > URL: https://issues.apache.org/jira/browse/YARN-9629 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager, yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Attachments: YARN-9629.001.patch, YARN-9629.002.patch, > YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch > > > One of the log-aggregation parameters, the minimum valid value for > {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is > MIN_LOG_ROLLING_INTERVAL - it has been hardcoded since its addition in > YARN-2583. > It was empirically set to 1 hour, as lower values would put the NodeManagers > under pressure too frequently. For bigger clusters that is indeed a valid > limitation, but for smaller clusters it is a valid customer use case to use > lower values, even moderately lower ones such as 30 minutes. At this point > this can only be achieved by setting > {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be > kept for debug purposes. > I suggest making this minimum configurable, with a warning logged at > NodeManager startup when the value is lower than 1 hour. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
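The proposal above amounts to a small clamp-and-warn step at NodeManager startup. The sketch below is hypothetical — class and method names are mine, not from the attached patches — and only illustrates the intended behavior:

```java
// Hypothetical sketch of a configurable minimum roll-monitoring interval.
// Names are illustrative; the real change lives in YARN's NodeManager code.
class LogRollingInterval {

    // The historical hardcoded minimum (MIN_LOG_ROLLING_INTERVAL): 1 hour.
    static final long DEFAULT_MIN_SECONDS = 3600;

    // Effective interval: the requested value, clamped up to the configured minimum.
    static long effective(long requestedSeconds, long configuredMinSeconds) {
        return Math.max(requestedSeconds, configuredMinSeconds);
    }

    // A warning should be logged at NodeManager startup when the configured
    // minimum is below the historical 1-hour default.
    static boolean shouldWarn(long configuredMinSeconds) {
        return configuredMinSeconds < DEFAULT_MIN_SECONDS;
    }

    public static void main(String[] args) {
        long configuredMin = 1800; // a small cluster opts into 30 minutes
        System.out.println(effective(1200, configuredMin)); // 1800
        System.out.println(shouldWarn(configuredMin));      // true -> warn once
    }
}
```

Keeping the warning (rather than rejecting the value) preserves the small-cluster use case while still nudging big-cluster operators away from aggressive intervals.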
[jira] [Updated] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL
[ https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9629: - Attachment: YARN-9629.005.patch > Support configurable MIN_LOG_ROLLING_INTERVAL > - > > Key: YARN-9629 > URL: https://issues.apache.org/jira/browse/YARN-9629 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager, yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Minor > Attachments: YARN-9629.001.patch, YARN-9629.002.patch, > YARN-9629.003.patch, YARN-9629.004.patch, YARN-9629.005.patch > > > One of the log-aggregation parameters, the minimum valid value for > {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is > MIN_LOG_ROLLING_INTERVAL - it has been hardcoded since its addition in > YARN-2583. > It was empirically set to 1 hour, as lower values would put the NodeManagers > under pressure too frequently. For bigger clusters that is indeed a valid > limitation, but for smaller clusters it is a valid customer use case to use > lower values, even moderately lower ones such as 30 minutes. At this point > this can only be achieved by setting > {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be > kept for debug purposes. > I suggest making this minimum configurable, with a warning logged at > NodeManager startup when the value is lower than 1 hour. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3221) Applications should be able to 're-register'
[ https://issues.apache.org/jira/browse/YARN-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876118#comment-16876118 ] wangxiangchun commented on YARN-3221: - I encountered the same problem in YARN Federation. When I enable AMRMProxy HA and the first app attempt fails, the second app attempt has to register the UAM again, and then this problem occurs. > Applications should be able to 're-register' > - > > Key: YARN-3221 > URL: https://issues.apache.org/jira/browse/YARN-3221 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Sidharta Seethana >Priority: Major > > Today, it is not possible for YARN applications to 're-register' in > failure/restart scenarios. This is especially problematic for Unmanaged > applications - when restarts (normal or otherwise) or other failures > necessitate the re-creation of the AMRMClient (along with a reset of the > internal RPC counter). The YARN RM disallows an attempt to register again > (with the same saved token) with the exception shown below. This > should be fixed. 
> {quote} > rmClient.RegisterApplicationMaster > org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException:Application > Master is already registered : application_1424304845861_0002 > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:264) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90) > at > org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9658) UT failures in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876089#comment-16876089 ] Hadoop QA commented on YARN-9658: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 58s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 84m 10s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}143m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9658 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973311/YARN-9658.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5f274ba9d9f3 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1e727cf | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24336/testReport/ | | Max. process+thread count | 916 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24336/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > UT failures in TestLeafQueue >
[jira] [Commented] (YARN-9521) RM failed to start due to system services
[ https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876065#comment-16876065 ] Hadoop QA commented on YARN-9521: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 47s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 45s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 55s{color} | {color:green} hadoop-yarn-services-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 78m 49s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9521 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12973315/YARN-9521.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f05d9d77fb39 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1e727cf | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_212 | | findbugs | v3.1.0-RC1 | | Test Results |
[jira] [Commented] (YARN-9521) RM failed to start due to system services
[ https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876007#comment-16876007 ] kyungwan nam commented on YARN-9521: I attached a new patch in which ApiServiceClient.actionCleanUp is performed with ugi.doAs(). > RM failed to start due to system services > - > > Key: YARN-9521 > URL: https://issues.apache.org/jira/browse/YARN-9521 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-9521.001.patch, YARN-9521.002.patch > > > When starting the RM, listing the system services directory fails as follows. > {code} > 2019-04-30 17:18:25,441 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory > is configured to /services > 2019-04-30 17:18:25,467 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation > initialized to yarn (auth:SIMPLE) > 2019-04-30 17:18:25,467 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in > state STARTED > org.apache.hadoop.service.ServiceStateException: java.io.IOException: > Filesystem closed > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501) > Caused by: java.io.IOException: Filesystem closed > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1200) > at > org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179) > at > org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126) > at > 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > ... 13 more > {code} > It looks like this is due to the use of the filesystem cache. > This issue does not happen when I add "fs.hdfs.impl.disable.cache=true" to > yarn-site. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9521) RM failed to start due to system services
[ https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876006#comment-16876006 ] kyungwan nam commented on YARN-9521: After some further digging, I think I have identified the cause of this issue more precisely. Normally, when a yarn-service API is requested, a new ugi is created and the request is performed inside ugi.doAs(). Calling FileSystem.get() inside ugi.doAs() always creates a new FileSystem, because the ugi is used as part of the key of FileSystem.CACHE (YARN-3336 is helpful for understanding this). So in this case it does not close a FileSystem that is shared via FileSystem.CACHE: {code} UserGroupInformation ugi = getProxyUser(request); LOG.info("POST: createService = {} user = {}", service, ugi); if (service.getState() == ServiceState.STOPPED) { ugi.doAs(new PrivilegedExceptionAction<Void>() { @Override public Void run() throws YarnException, IOException { ServiceClient sc = getServiceClient(); try { sc.init(YARN_CONFIG); sc.start(); sc.actionBuild(service); } finally { sc.close(); } return null; } }); {code} On the other hand, ApiServiceClient.actionCleanUp, which is called from RMAppImpl.appAdminClientCleanUp, runs as the RM loginUser instead of inside doAs(). In this case, FileSystem.get() can return a cached instance that SystemServiceManagerImpl and FileSystemNodeLabelsStore also refer to: {code} @Override public int actionCleanUp(String appName, String userName) throws IOException, YarnException { ServiceClient sc = new ServiceClient(); sc.init(getConfig()); sc.start(); int result = sc.actionCleanUp(appName, userName); sc.close(); return result; } {code} > RM failed to start due to system services > - > > Key: YARN-9521 > URL: https://issues.apache.org/jira/browse/YARN-9521 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-9521.001.patch, YARN-9521.002.patch > > > When starting the RM, listing the system services directory fails as follows. 
> {code} > 2019-04-30 17:18:25,441 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory > is configured to /services > 2019-04-30 17:18:25,467 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation > initialized to yarn (auth:SIMPLE) > 2019-04-30 17:18:25,467 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in > state STARTED > org.apache.hadoop.service.ServiceStateException: java.io.IOException: > Filesystem closed > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501) > Caused by: java.io.IOException: Filesystem closed > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233) > at >
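The cache interaction described in the comments above can be modeled without Hadoop: FileSystem.CACHE hands out one shared instance per cache key, and the UGI is part of that key. The following is a deliberately simplified model — plain strings stand in for UGIs, and this is not Hadoop's implementation — showing why a close() performed as the shared RM login user breaks other callers, while a close() under a fresh doAs() UGI does not:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of FileSystem.CACHE: one shared instance per cache key,
// where the UGI (modeled here as a string) is part of the key.
class FsCacheModel {

    static class Fs {
        boolean closed = false;
        void close() { closed = true; }
    }

    static final Map<String, Fs> CACHE = new HashMap<>();

    // Analogue of FileSystem.get(conf) executing under the given UGI.
    static Fs get(String ugi) {
        return CACHE.computeIfAbsent(ugi, k -> new Fs());
    }

    public static void main(String[] args) {
        // actionCleanUp running as the RM login user shares the cached
        // instance with SystemServiceManagerImpl and other RM services...
        Fs rmSide = get("yarn-login-user");
        Fs cleanupSide = get("yarn-login-user");
        cleanupSide.close();               // ...so closing it here
        System.out.println(rmSide.closed); // leaves the RM a closed handle: true

        // A request performed inside ugi.doAs() with a fresh proxy UGI has a
        // distinct cache key, so it closes its own private instance.
        Fs proxySide = get("proxy-ugi");
        proxySide.close();
        System.out.println(rmSide == proxySide); // false
    }
}
```

This is exactly why wrapping actionCleanUp in ugi.doAs() (as the v2 patch does) avoids the "Filesystem closed" failure without disabling the cache globally.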
[jira] [Updated] (YARN-9521) RM failed to start due to system services
[ https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kyungwan nam updated YARN-9521: --- Attachment: YARN-9521.002.patch > RM failed to start due to system services > - > > Key: YARN-9521 > URL: https://issues.apache.org/jira/browse/YARN-9521 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.2 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-9521.001.patch, YARN-9521.002.patch > > > when starting RM, listing system services directory has failed as follows. > {code} > 2019-04-30 17:18:25,441 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory > is configured to /services > 2019-04-30 17:18:25,467 INFO client.SystemServiceManagerImpl > (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation > initialized to yarn (auth:SIMPLE) > 2019-04-30 17:18:25,467 INFO service.AbstractService > (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in > state STARTED > org.apache.hadoop.service.ServiceStateException: java.io.IOException: > Filesystem closed > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:203) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265) > at java.security.AccessController.doPrivileged(Native 
Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501) > Caused by: java.io.IOException: Filesystem closed > at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1200) > at > org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179) > at > org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282) > at > org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) > ... 13 more > {code} > it looks like due to the usage of filesystem cache. 
> This issue does not happen when I add "fs.hdfs.impl.disable.cache=true" to > yarn-site. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered
[ https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875997#comment-16875997 ] Tao Yang commented on YARN-9623: [~cheersyang], I have created YARN-9658 to fix these UT failures. > Auto adjust max queue length of app activities to make sure activities on all > nodes can be covered > -- > > Key: YARN-9623 > URL: https://issues.apache.org/jira/browse/YARN-9623 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9623.001.patch, YARN-9623.002.patch > > > Currently we can use the configuration entry > "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to > control the max queue length of app activities, but in some scenarios this > configuration may need to be updated as the cluster grows. Moreover, it's > better for users not to have to care about this conf, therefore it should be > auto-adjusted internally. > There are some differences among the scheduling modes: > * multi-node placement disabled > ** Heartbeat-driven scheduling: the max queue length of app activities should > not be less than the number of nodes; considering that nodes cannot always be > in order, we should leave some room for out-of-order heartbeats, for example > by guaranteeing that the max queue length is not less than 1.2 * numNodes > ** Async scheduling: every async scheduling thread goes through all nodes in > order, so in this mode we should guarantee that the max queue length is > numThreads * numNodes. > * multi-node placement enabled: activities on all nodes can be involved in a > single app allocation, therefore there's no need to adjust for this mode. > To sum up, we can adjust the max queue length of app activities like this: > {code} > int configuredMaxQueueLength; > int maxQueueLength; > serviceInit(){ > ... 
>   configuredMaxQueueLength = ...; // read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; // take configured value as default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
>     if (asyncSchedulingEnabled) {
>       maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * numNodes);
>     } else {
>       maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
>     }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
>     maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}
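The adjustment rule quoted above can be sketched as a small self-contained Java method. This is only an illustration of the logic described in the issue; the class and method names (AppActivitiesQueueLength, computeMaxQueueLength) are hypothetical, not the actual patch.

```java
// Hypothetical sketch of the auto-adjustment rule from YARN-9623.
public class AppActivitiesQueueLength {

    /**
     * Returns the effective max queue length for app activities.
     * Multi-node placement enabled: keep the configured value, since a single
     * allocation already covers all nodes. Otherwise grow the queue so that
     * activities from every node fit.
     */
    public static int computeMaxQueueLength(int configuredMaxQueueLength,
            boolean multiNodePlacementEnabled, boolean asyncSchedulingEnabled,
            int numSchedulingThreads, int numNodes) {
        if (multiNodePlacementEnabled) {
            return configuredMaxQueueLength;
        }
        if (asyncSchedulingEnabled) {
            // Each async scheduling thread walks all nodes in order.
            return Math.max(configuredMaxQueueLength, numSchedulingThreads * numNodes);
        }
        // Heartbeat-driven scheduling: leave ~20% headroom for out-of-order nodes.
        return Math.max(configuredMaxQueueLength, (int) (1.2 * numNodes));
    }
}
```

With async scheduling and 4 threads on 500 nodes this yields 2000; heartbeat-driven scheduling on the same cluster yields max(configured, 600).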
[jira] [Updated] (YARN-9658) UT failures in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9658: --- Affects Version/s: 3.3.0 > UT failures in TestLeafQueue > > > Key: YARN-9658 > URL: https://issues.apache.org/jira/browse/YARN-9658 > Project: Hadoop YARN > Issue Type: Bug > Affects Versions: 3.3.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Minor > Attachments: YARN-9658.001.patch > > > In ActivitiesManager, if there is no YARN configuration in the mock RMContext, > the cleanup interval cannot be initialized to its 5-second default, so the > cleanup thread keeps running repeatedly without any interval. This can cause > problems for the Mockito framework; in this case it caused an OOM, because many > throwable objects were generated internally by the incomplete mock. > Add a configuration to the mock RMContext to fix the failures in TestLeafQueue.
[jira] [Updated] (YARN-9658) UT failures in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9658: --- Attachment: YARN-9658.001.patch > UT failures in TestLeafQueue > > > Key: YARN-9658 > URL: https://issues.apache.org/jira/browse/YARN-9658 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Minor > Attachments: YARN-9658.001.patch > > > In ActivitiesManager, if there is no YARN configuration in the mock RMContext, > the cleanup interval cannot be initialized to its 5-second default, so the > cleanup thread keeps running repeatedly without any interval. This can cause > problems for the Mockito framework; in this case it caused an OOM, because many > throwable objects were generated internally by the incomplete mock. > Add a configuration to the mock RMContext to fix the failures in TestLeafQueue.
[jira] [Updated] (YARN-9658) UT failures in TestLeafQueue
[ https://issues.apache.org/jira/browse/YARN-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9658: --- Description: In ActivitiesManager, if there is no YARN configuration in the mock RMContext, the cleanup interval cannot be initialized to its 5-second default, so the cleanup thread keeps running repeatedly without any interval. This can cause problems for the Mockito framework; in this case it caused an OOM, because many throwable objects were generated internally by the incomplete mock. Add a configuration to the mock RMContext to fix the failures in TestLeafQueue. was: In ActivitiesManager, if there is no YARN configuration in the mock RMContext, the cleanup interval cannot be initialized to its 5-second default, so the cleanup thread keeps running repeatedly without any interval. This can cause problems for the Mockito framework; in this case it caused an OOM, because many throwable objects were generated internally by the incomplete mock. Add a default value for ActivitiesManager#activitiesCleanupIntervalMs to avoid the cleanup thread running repeatedly without an interval. > UT failures in TestLeafQueue > > > Key: YARN-9658 > URL: https://issues.apache.org/jira/browse/YARN-9658 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Minor > Attachments: YARN-9658.001.patch > > > In ActivitiesManager, if there is no YARN configuration in the mock RMContext, > the cleanup interval cannot be initialized to its 5-second default, so the > cleanup thread keeps running repeatedly without any interval. This can cause > problems for the Mockito framework; in this case it caused an OOM, because many > throwable objects were generated internally by the incomplete mock. > Add a configuration to the mock RMContext to fix the failures in TestLeafQueue.
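The fix idea described above (falling back to a 5-second default when no configuration is present, so the cleanup thread always sleeps between passes) can be sketched as follows. The class and method names are illustrative, not the actual YARN-9658 patch.

```java
// Minimal sketch, assuming the fix falls back to a sane default interval
// whenever the configured value is absent or invalid.
public class CleanupIntervalConfig {
    static final long DEFAULT_CLEANUP_INTERVAL_MS = 5000L; // 5-second default

    /** Returns the configured cleanup interval, or the default if unset or non-positive. */
    static long resolveCleanupIntervalMs(Long configuredMs) {
        if (configuredMs == null || configuredMs <= 0) {
            // Without this fallback the cleanup loop would spin with no sleep,
            // which is what overwhelmed the Mockito-based test.
            return DEFAULT_CLEANUP_INTERVAL_MS;
        }
        return configuredMs;
    }
}
```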
[jira] [Updated] (YARN-9659) yarn application cannot be killed after updating info for attempt failed
[ https://issues.apache.org/jira/browse/YARN-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangqw updated YARN-9659: -- Description: Affected by HDFS: {code:java} Not enough replicas was chosen. Reason:{NOT_ENOUGH_STORAGE_SPACE=2} {code} Updating info for the attempt failed: {code:java} 2019-06-28 10:36:57,917 INFO recovery.FileSystemRMStateStore (FileSystemRMStateStore.java:updateApplicationAttemptStateInternal(464)) - Updating info for attempt: appattempt_1561517363839_0013_01 at: /tmp/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1561 517363839_0013/appattempt_1561517363839_0013_01 2019-06-28 10:36:57,931 INFO hdfs.DataStreamer (DataStreamer.java:createBlockOutputStream(1789)) - Exception in createBlockOutputStream blk_1088382064_14942741 java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 10.0.96.36:50010 at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110) at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1778) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716) {code} In the RM log: {code:java} 2019-06-28 10:36:57,953 INFO recovery.FileSystemRMStateStore (FileSystemRMStateStore.java:runWithRetries(743)) - Maxed out FS retries. Giving up! ... ... 2019-06-28 10:49:28,746 INFO util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(148)) - Expired:appattempt_1561517363839_0013_01 Timed out after 600 secs {code} Now the application cannot be killed: {code:java} 19/07/01 15:22:55 INFO impl.YarnClientImpl: Waiting for application application_1561517363839_0013 to be killed. {code} And when accessing the container info page in the RM web UI, an error 500 is returned. 
RM log: {code:java} 2019-06-28 10:24:00,176 ERROR webapp.Dispatcher (Dispatcher.java:service(171)) - error handling URI: /cluster/appattempt/appattempt_1561517363839_0011_04 java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor253.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ... ... Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.FINAL_SAVING at java.lang.Enum.valueOf(Enum.java:238) at org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createAttemptHeadRoomTable(RMAppAttemptBlock.java:197) at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:151) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:243) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58){code} Notice related issue has been patched: YARN-8183 was: Affected by HDFS: {code:java} Not enough replicas was chosen. 
Reason:{NOT_ENOUGH_STORAGE_SPACE=2} {code} updating info for attempt failed: {code:java} 2019-06-28 10:36:57,917 INFO recovery.FileSystemRMStateStore (FileSystemRMStateStore.java:updateApplicationAttemptStateInternal(464)) - Updating info for attempt: appattempt_1561517363839_0013_01 at: /tmp/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1561 517363839_0013/appattempt_1561517363839_0013_01 2019-06-28 10:36:57,931 INFO hdfs.DataStreamer (DataStreamer.java:createBlockOutputStream(1789)) - Exception in createBlockOutputStream blk_1088382064_14942741 java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 10.0.96.36:50010 at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134) at
[jira] [Created] (YARN-9659) yarn application cannot be killed after updating info for attempt failed
zhangqw created YARN-9659: - Summary: yarn application cannot be killed after updating info for attempt failed Key: YARN-9659 URL: https://issues.apache.org/jira/browse/YARN-9659 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.1.1 Environment: Hadoop 3.1.1 release Centos 7.1 Reporter: zhangqw Affected by HDFS: {code:java} Not enough replicas was chosen. Reason:{NOT_ENOUGH_STORAGE_SPACE=2} {code} updating info for attempt failed: {code:java} 2019-06-28 10:36:57,917 INFO recovery.FileSystemRMStateStore (FileSystemRMStateStore.java:updateApplicationAttemptStateInternal(464)) - Updating info for attempt: appattempt_1561517363839_0013_01 at: /tmp/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1561 517363839_0013/appattempt_1561517363839_0013_01 2019-06-28 10:36:57,931 INFO hdfs.DataStreamer (DataStreamer.java:createBlockOutputStream(1789)) - Exception in createBlockOutputStream blk_1088382064_14942741 java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 10.0.96.36:50010 at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110) at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1778) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716) {code} in RM log: {code:java} 2019-06-28 10:36:57,953 INFO recovery.FileSystemRMStateStore (FileSystemRMStateStore.java:runWithRetries(743)) - Maxed out FS retries. Giving up! ... ... 
2019-06-28 10:49:28,746 INFO util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(148)) - Expired:appattempt_1561517363839_0013_01 Timed out after 600 secs {code} Now the application cannot be killed: {code:java} 19/07/01 15:22:55 INFO impl.YarnClientImpl: Waiting for application application_1561517363839_0013 to be killed. {code} And when accessing the container info page in the RM web UI, an error 500 is returned. RM log: {code:java} 2019-06-28 10:24:00,176 ERROR webapp.Dispatcher (Dispatcher.java:service(171)) - error handling URI: /cluster/appattempt/appattempt_1561517363839_0011_04 java.lang.reflect.InvocationTargetException at sun.reflect.GeneratedMethodAccessor253.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ... ... Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.FINAL_SAVING at java.lang.Enum.valueOf(Enum.java:238) at org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createAttemptHeadRoomTable(RMAppAttemptBlock.java:197) at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:151) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:243) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848) at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216) at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58){code} Note that a related issue has been patched: YARN-8183
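The 500 error in the stack trace above comes from `YarnApplicationAttemptState.valueOf` throwing `IllegalArgumentException` for the internal-only state FINAL_SAVING. The sketch below illustrates the general defensive pattern; it is not the actual YARN-8183 fix, and the reduced enum here is a stand-in for the real public enum.

```java
// Illustrative sketch: tolerate internal attempt states (e.g. FINAL_SAVING)
// that have no counterpart in the public enum, instead of letting valueOf()
// propagate an IllegalArgumentException into the web UI renderer.
public class AttemptStateMapper {
    // Reduced stand-in for the public-facing enum; the real one lives in
    // org.apache.hadoop.yarn.api.records.
    enum YarnApplicationAttemptState { NEW, SUBMITTED, SCHEDULED, RUNNING, FINISHED, FAILED, KILLED }

    static YarnApplicationAttemptState parseOrNull(String internalState) {
        try {
            return YarnApplicationAttemptState.valueOf(internalState);
        } catch (IllegalArgumentException e) {
            // Internal-only states fall through here; the caller can render
            // a placeholder instead of failing the whole page with a 500.
            return null;
        }
    }
}
```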
[jira] [Created] (YARN-9658) UT failures in TestLeafQueue
Tao Yang created YARN-9658: -- Summary: UT failures in TestLeafQueue Key: YARN-9658 URL: https://issues.apache.org/jira/browse/YARN-9658 Project: Hadoop YARN Issue Type: Bug Reporter: Tao Yang Assignee: Tao Yang In ActivitiesManager, if there is no YARN configuration in the mock RMContext, the cleanup interval cannot be initialized to its 5-second default, so the cleanup thread keeps running repeatedly without any interval. This can cause problems for the Mockito framework; in this case it caused an OOM, because many throwable objects were generated internally by the incomplete mock. Add a default value for ActivitiesManager#activitiesCleanupIntervalMs to avoid the cleanup thread running repeatedly without an interval.
[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered
[ https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875986#comment-16875986 ] Weiwei Yang commented on YARN-9623: --- Hi [~Tao Yang], please create a new issue to fix this failure. Thanks > Auto adjust max queue length of app activities to make sure activities on all > nodes can be covered > -- > > Key: YARN-9623 > URL: https://issues.apache.org/jira/browse/YARN-9623 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9623.001.patch, YARN-9623.002.patch > > > Currently we can use the configuration entry > "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to > control the max queue length of app activities, but in some scenarios this > configuration may need to be updated as a cluster grows. Moreover, it is > better for users to be able to ignore that conf, therefore it should be auto-adjusted > internally. > There are some differences among the scheduling modes: > * multi-node placement disabled > ** Heartbeat-driven scheduling: the max queue length of app activities should > not be less than the number of nodes; considering that nodes cannot always be processed in > order, we should leave some room for out-of-order nodes, for example by guaranteeing > that the max queue length is not less than 1.2 * numNodes > ** Async scheduling: every async scheduling thread goes through all nodes in > order; in this mode, we should guarantee that the max queue length is > numThreads * numNodes. > * multi-node placement enabled: activities on all nodes can be involved in a > single app allocation, therefore there is no need to adjust for this mode. > To sum up, we can adjust the max queue length of app activities like this: > {code} > int configuredMaxQueueLength; > int maxQueueLength; > serviceInit(){ > ... 
>   configuredMaxQueueLength = ...; // read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; // take configured value as default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
>     if (asyncSchedulingEnabled) {
>       maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * numNodes);
>     } else {
>       maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
>     }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
>     maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}
[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL
[ https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875976#comment-16875976 ] Szilard Nemeth commented on YARN-9629: -- Hi [~adam.antal]! Thanks for the update! The code changes you made for patch004 regarding documentation in yarn-default.xml look good to me. One minor thing left: Could you please remove any "suggested" references (variable names, log messages) in method org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService#calculateRollingMonitorInterval? Thanks! > Support configurable MIN_LOG_ROLLING_INTERVAL > - > > Key: YARN-9629 > URL: https://issues.apache.org/jira/browse/YARN-9629 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager, yarn > Affects Versions: 3.2.0 > Reporter: Adam Antal > Assignee: Adam Antal > Priority: Minor > Attachments: YARN-9629.001.patch, YARN-9629.002.patch, > YARN-9629.003.patch, YARN-9629.004.patch > > > The minimum valid value for one of the log-aggregation parameters, > {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is > MIN_LOG_ROLLING_INTERVAL; it has been hardcoded since its addition in > YARN-2583. > It has been empirically set to 1 hour, as lower values would put the NodeManagers > under pressure too frequently. For bigger clusters that is indeed a > valid limitation, but for smaller clusters it makes sense, and is a valid > customer use case, to use lower values, even moderately lower ones such as 30 minutes. At this > point this can only be achieved by setting > {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be > kept for debug purposes. > I suggest making this minimum configurable, although a warning should be > logged at NodeManager startup when this value is lower than 1 hour. 
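The proposal above (a configurable minimum with a startup warning below 1 hour) can be sketched as follows. This is a hedged illustration, not the actual patch: the method and constant names are hypothetical, and in the real NodeManager the warning would go through its logger rather than stdout.

```java
// Sketch of a configurable minimum rolling interval, assuming the policy
// described in YARN-9629: warn when the admin lowers the minimum below the
// historical 1-hour floor, then clamp the configured interval to that minimum.
public class RollingIntervalPolicy {
    static final long ONE_HOUR_SECONDS = 3600; // historical hardcoded floor (YARN-2583)

    /** Returns the effective roll-monitoring interval in seconds. */
    static long effectiveIntervalSeconds(long configuredSeconds, long configurableMinSeconds) {
        if (configurableMinSeconds < ONE_HOUR_SECONDS) {
            // Stand-in for a NodeManager startup WARN log entry.
            System.out.println("WARN: minimum rolling interval below 1 hour may pressure NodeManagers");
        }
        // Values below the minimum are clamped up to it.
        return Math.max(configuredSeconds, configurableMinSeconds);
    }
}
```

For example, with a 30-minute minimum configured, a 10-minute interval is clamped to 30 minutes and a warning is emitted once at startup.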
[jira] [Created] (YARN-9657) AbstractLivelinessMonitor add serviceName to PingChecker thread
Bibin A Chundatt created YARN-9657: -- Summary: AbstractLivelinessMonitor add serviceName to PingChecker thread Key: YARN-9657 URL: https://issues.apache.org/jira/browse/YARN-9657 Project: Hadoop YARN Issue Type: Improvement Reporter: Bibin A Chundatt
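The improvement named in this issue title, giving the monitor's checker thread a service-specific name so it is identifiable in thread dumps, could look roughly like the sketch below. The helper name and name format are purely hypothetical, not the eventual patch.

```java
// Hypothetical sketch: derive the PingChecker thread name from the owning
// service's name instead of using one generic name for every monitor.
public class LivelinessMonitorNaming {
    static String pingCheckerThreadName(String serviceName) {
        // Example output format is an assumption, e.g. "NMLivelinessMonitor #Ping Checker".
        return serviceName + " #Ping Checker";
    }
}
```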