[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339830#comment-14339830 ] Hadoop QA commented on YARN-3122: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701280/YARN-3122.005.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6776//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6776//console This message is automatically generated. > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.prelim.patch, YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
[ https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339811#comment-14339811 ] Chengbing Liu commented on YARN-3204: - {code} -this.reservedAppSchedulable = (FSAppAttempt) application; + if(application instanceof FSAppAttempt){ + this.reservedAppSchedulable = (FSAppAttempt) application; +} {code} Would it be better to throw an exception if the condition is not met? {code} Set<String> planQueues = new HashSet<String>(); for (FSQueue fsQueue : queueMgr.getQueues()) { String queueName = fsQueue.getName(); - if (allocConf.isReservable(queueName)) { + boolean isReservable = false; + synchronized(this){ + isReservable = allocConf.isReservable(queueName); + } + if (isReservable) { planQueues.add(queueName); } } {code} I think we should synchronize the whole function, since {{allocConf}} may be reloaded during this loop. To me, a dedicated lock is better than {{FairScheduler.this}}. > Fix new findbug warnings in > hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair) > -- > > Key: YARN-3204 > URL: https://issues.apache.org/jira/browse/YARN-3204 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: YARN-3204-001.patch, YARN-3204-002.patch > > > Please check the following findbugs report: > https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
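For reference, a minimal sketch of the dedicated-lock approach suggested in the YARN-3204 comment above. The field and method names used here (allocConfLock, getPlanQueues, reloadAllocations) are illustrative assumptions, not the actual FairScheduler internals:

{code}
// Hypothetical sketch: guard every read and reload of allocConf with one dedicated
// lock instead of synchronizing on FairScheduler.this.
private final Object allocConfLock = new Object();

private Set<String> getPlanQueues() {
  Set<String> planQueues = new HashSet<String>();
  // Hold the lock across the whole loop so allocConf cannot be swapped mid-iteration.
  synchronized (allocConfLock) {
    for (FSQueue fsQueue : queueMgr.getQueues()) {
      String queueName = fsQueue.getName();
      if (allocConf.isReservable(queueName)) {
        planQueues.add(queueName);
      }
    }
  }
  return planQueues;
}

private void reloadAllocations(AllocationConfiguration newConf) {
  // Reload under the same lock so readers never observe a half-applied configuration.
  synchronized (allocConfLock) {
    allocConf = newConf;
  }
}
{code}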
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339799#comment-14339799 ] Rohith commented on YARN-3222: -- bq. NODE_USABLE event is sent regardless the reconnected node is healthy or not healthy, which is incorrect, right ? Yes, I think it was assumed that if a new node is reconnecting then the NM is healthy. It is better to retain the old state, i.e. UNHEALTHY, and on the next heartbeat the NodeStatus can be moved from Unhealthy to Running. I see another potential issue: if the old node is retained, then the RMNode's {{totalCapability}} has to be updated with the new RMNode's resource. But in the current flow, {{totalCapability}} is not updated. As a result, the scheduler has the updated resource value but the RMNode has a stale one. Any client getting the node capability from the RMNode would end up with a wrong node resource value. {code} if (noRunningApps) { // some code rmNode.context.getDispatcher().getEventHandler().handle( new NodeRemovedSchedulerEvent(rmNode)); if (rmNode.getHttpPort() == newNode.getHttpPort()) { if (rmNode.getState() != NodeState.UNHEALTHY) { // Only add new node if old state is not UNHEALTHY rmNode.context.getDispatcher().getEventHandler().handle( new NodeAddedSchedulerEvent(newNode)); // NEW NODE CAPABILITY SHOULD BE UPDATED TO OLD NODE } } else { // Reconnected node differs, so replace old node and start new node rmNode.context.getDispatcher().getEventHandler().handle( new RMNodeStartedEvent(newNode.getNodeID(), null, null)); // No need to update totalCapability since old node is replaced with new node. } } {code} > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler with the events node_added, node_removed or node_resource_update. These > events should be notified in a sequential order, i.e. the node_added event and > then the node_resource_update event. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which causes the scheduler to not find the node, throw an NPE, and the RM to exit. > The Node_Resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
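Regarding the missing {{totalCapability}} update called out in the comment above, a rough sketch of what that could look like inside ReconnectNodeTransition; the exact fields and accessors are assumptions for illustration, not the actual RMNodeImpl code:

{code}
// Hypothetical sketch: when the old RMNode is retained on reconnect, refresh its
// total capability from the newly reported node so clients do not read a stale value.
if (rmNode.getHttpPort() == newNode.getHttpPort()) {
  if (rmNode.getState() != NodeState.UNHEALTHY) {
    rmNode.context.getDispatcher().getEventHandler().handle(
        new NodeAddedSchedulerEvent(newNode));
  }
  // Keep the retained node's resource in sync with what the NM just registered.
  rmNode.totalCapability = newNode.getTotalCapability();
}
{code}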
[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode
[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339796#comment-14339796 ] Varun Saxena commented on YARN-2962: [~kasha] / [~ka...@cloudera.com], for this, can I assume that the state store will be formatted before making the config change? Backward compatibility for running apps after the config change (on RM restart) will be difficult, as we may have to try all the possible appid formats. > ZKRMStateStore: Limit the number of znodes under a znode > > > Key: YARN-2962 > URL: https://issues.apache.org/jira/browse/YARN-2962 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Varun Saxena >Priority: Critical > > We ran into this issue where we were hitting the default ZK server message > size configs, primarily because the message had too many znodes even though > individually they were all small. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
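As background for the formatting question above, one common way to bound the number of children per znode is to split the application ID into a parent/child hierarchy. The helper and split depth below are illustrative assumptions, not the actual ZKRMStateStore change:

{code}
// Hypothetical sketch: store application_1409135750325_109118 under
// .../RMAppRoot/application_1409135750325_10/9118, so that each parent znode
// only ever holds a bounded number of children.
private String getHierarchicalAppPath(String appId, int splitIndex) {
  // splitIndex = number of trailing digits that become the leaf znode name
  int split = appId.length() - splitIndex;
  return appId.substring(0, split) + "/" + appId.substring(split);
}
{code}

This also illustrates why backward compatibility is awkward: after the config change, reads would have to probe both the flat and the hierarchical layouts unless the store is formatted first.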
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3122: --- Attachment: YARN-3122.005.patch The updated patch looks mostly good to me. I like that we are mimicking top; users will find it easier to reason about this. I had a few nitpicks that I have put into the v5 patch - a rename of CpuTimeTracker#getCpuUsagePercent and changes to comments. [~adhoot] - can you please review and verify the changes? One last concern - we use 0 when we cannot calculate the percentage. Shouldn't we use UNAVAILABLE instead? > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, > YARN-3122.prelim.patch, YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
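On the UNAVAILABLE question in the update above, a sketch of what the sentinel could look like; the constant and the internal fields shown (UNSET, the sample timestamps) are illustrative assumptions about CpuTimeTracker, not the committed patch:

{code}
// Hypothetical sketch: return a sentinel instead of 0 when usage cannot be computed,
// so callers can tell "no sample yet" apart from "container is idle".
public static final float UNAVAILABLE = -1.0f;

public float getCpuUsagePercent() {
  if (lastSampleTime == UNSET) {
    // Not enough samples to compute a delta yet.
    return UNAVAILABLE;
  }
  // Percentage of one core used since the last sample, as top reports it.
  return 100f * (cumulativeCpuTimeMs - lastCumulativeCpuTimeMs)
      / (float) (sampleTime - lastSampleTime);
}
{code}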
[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339780#comment-14339780 ] Gururaj Shetty commented on YARN-3168: -- Hi [~aw] All your comments are incorporated. Kindly review the latest patch attached. > Convert site documentation from apt to markdown > --- > > Key: YARN-3168 > URL: https://issues.apache.org/jira/browse/YARN-3168 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.0 >Reporter: Allen Wittenauer >Assignee: Gururaj Shetty > Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, > YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch > > > YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339774#comment-14339774 ] Hadoop QA commented on YARN-2820: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701267/YARN-2820.006.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6775//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6775//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6775//console This message is automatically generated. > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > -- > > Key: YARN-2820 > URL: https://issues.apache.org/jira/browse/YARN-2820 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0, 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2820.000.patch, YARN-2820.001.patch, > YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, > YARN-2820.005.patch, YARN-2820.006.patch > > > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We > saw the following IOexception cause the RM shutdown. > {code} > 2014-10-29 23:49:12,202 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Updating info for attempt: appattempt_1409135750325_109118_01 at: > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01 > 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... 
> 2014-10-29 23:49:46,283 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Error updating info for attempt: appattempt_1409135750325_109118_01 > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > 2014-10-29 23:49:46,284 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: > Error storing/updating appAttempt: appattempt_1409135750325_109118_01 > 2014-10-29 23:49:46,916 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: > Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) > > at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) > > at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) > at > org.apache.hadoop.yarn.s
[jira] [Updated] (YARN-3168) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gururaj Shetty updated YARN-3168: - Attachment: YARN-3168.20150227.3.patch > Convert site documentation from apt to markdown > --- > > Key: YARN-3168 > URL: https://issues.apache.org/jira/browse/YARN-3168 > Project: Hadoop YARN > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.0 >Reporter: Allen Wittenauer >Assignee: Gururaj Shetty > Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, > YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch > > > YARN analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3262) Surface application outstanding resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339753#comment-14339753 ] Hadoop QA commented on YARN-3262: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701265/YARN-3262.4.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6773//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6773//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6773//console This message is automatically generated. > Surface application outstanding resource requests table > --- > > Key: YARN-3262 > URL: https://issues.apache.org/jira/browse/YARN-3262 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, > YARN-3262.4.patch, resource requests.png > > > It would be useful to surface the outstanding resource requests table on the > application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339744#comment-14339744 ] Abin Shahab commented on YARN-2981: --- [~raviprak] [~vinodkv] [~vvasudev] [~ywskycn] please review > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
[ https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339730#comment-14339730 ] Hadoop QA commented on YARN-3269: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701264/YARN-3269.2.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.TestPBLocalizerRPC org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6774//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6774//console This message is automatically generated. > Yarn.nodemanager.remote-app-log-dir could not be configured to fully > qualified path > --- > > Key: YARN-3269 > URL: https://issues.apache.org/jira/browse/YARN-3269 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3269.1.patch, YARN-3269.2.patch > > > Log aggregation currently is always relative to the default file system, not > an arbitrary file system identified by URI. So we can't put an arbitrary > fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339716#comment-14339716 ] zhihai xu commented on YARN-2820: - [~ozawa], thanks for your thorough review, I really appreciate it. I uploaded a new patch, YARN-2820.005.patch, which addressed all your comments. It also puts fsIn.close in try-with-resources at loadRMDTSecretManagerState, similar to fsOut.close at storeRMDTMasterKeyState. Please review it. Thanks, zhihai > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > -- > > Key: YARN-2820 > URL: https://issues.apache.org/jira/browse/YARN-2820 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0, 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2820.000.patch, YARN-2820.001.patch, > YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, > YARN-2820.005.patch, YARN-2820.006.patch > > > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We > saw the following IOexception cause the RM shutdown. > {code} > 2014-10-29 23:49:12,202 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Updating info for attempt: appattempt_1409135750325_109118_01 at: > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01 > 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:46,283 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Error updating info for attempt: appattempt_1409135750325_109118_01 > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > 2014-10-29 23:49:46,284 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: > Error storing/updating appAttempt: appattempt_1409135750325_109118_01 > 2014-10-29 23:49:46,916 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: > Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. 
> at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) > > at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) > > at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:744) > {code} > As discussed at YARN-1778, TestFSRMStateStore failure is also due to > IOException in storeApplicationStateInternal. > Stack trace from TestFSRMStateStore failure: > {code} > 2015-02-03 00:09:19,092 INFO [Thre
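The try-with-resources change for fsIn mentioned in the comment above would look roughly like this sketch (the surrounding method shape is assumed for illustration, not copied from the patch):

{code}
// Hypothetical sketch: let try-with-resources close the input stream even when the
// read throws, instead of closing it manually in a finally block.
private byte[] readFile(Path inputPath, long len) throws Exception {
  try (FSDataInputStream fsIn = fs.open(inputPath)) {
    byte[] data = new byte[(int) len];
    fsIn.readFully(data);
    return data;
  }
}
{code}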
[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339715#comment-14339715 ] Zhijie Shen commented on YARN-3125: --- Li, thanks for working on the patch. Looking at the test code of TestDistributedShell, MiniYarnCluster will be started for each individual test case. Therefore, we can potentially avoid conflicts via configuration, and we don't need to hard-code the service address. For the test cases about the v1 timeline service, you set enableAHS = true, while for the test cases about v2, you add the aux service configuration. In this way, either the v1 or the v2 timeline service will be set up, but not both. To distinguish the different test cases, you can try the following: {code} @Rule public TestName name = new TestName(); {code} Use the test name to switch the setup of MiniYarnCluster in setup(). Another minor issue: instead of using {code} private static final String TIMELINE_AUX_SERVICE_CLASS = "org.apache.hadoop.yarn.server.timelineservice.aggregator" + ".PerNodeAggregatorServer"; {code} you can use {{PerNodeAggregatorServer.class.getName()}} directly. > [Event producers] Change distributed shell to use new timeline service > -- > > Key: YARN-3125 > URL: https://issues.apache.org/jira/browse/YARN-3125 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Junping Du > Attachments: YARN-3125.patch, YARN-3125_UT-022615.patch, > YARN-3125v2.patch, YARN-3125v3.patch > > > We can start with changing distributed shell to use new timeline service once > the framework is completed, in which way we can quickly verify the next gen > is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
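A small sketch of how the suggested TestName rule could drive per-test setup; the method-name check and the aux-service name used here are assumptions for illustration only:

{code}
@Rule
public TestName name = new TestName();

@Before
public void setup() throws Exception {
  conf = new YarnConfiguration();
  if (name.getMethodName().toLowerCase().contains("v2")) {
    // v2 test cases: register the per-node aggregator as an NM aux service.
    conf.set(YarnConfiguration.NM_AUX_SERVICES, "timeline_aggregator");
    conf.set("yarn.nodemanager.aux-services.timeline_aggregator.class",
        PerNodeAggregatorServer.class.getName());
  } else {
    // v1 test cases: bring up the ApplicationHistoryServer instead.
    enableAHS = true;
  }
  // ... then start the MiniYARNCluster with this conf ...
}
{code}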
[jira] [Updated] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2820: Attachment: YARN-2820.006.patch > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > -- > > Key: YARN-2820 > URL: https://issues.apache.org/jira/browse/YARN-2820 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0, 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2820.000.patch, YARN-2820.001.patch, > YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, > YARN-2820.005.patch, YARN-2820.006.patch > > > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We > saw the following IOexception cause the RM shutdown. > {code} > 2014-10-29 23:49:12,202 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Updating info for attempt: appattempt_1409135750325_109118_01 at: > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01 > 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:46,283 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Error updating info for attempt: appattempt_1409135750325_109118_01 > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > 2014-10-29 23:49:46,284 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: > Error storing/updating appAttempt: appattempt_1409135750325_109118_01 > 2014-10-29 23:49:46,916 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: > Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. 
> at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) > > at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) > > at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:744) > {code} > As discussed at YARN-1778, TestFSRMStateStore failure is also due to > IOException in storeApplicationStateInternal. > Stack trace from TestFSRMStateStore failure: > {code} > 2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore > (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception > org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still > not started >at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876) >at
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339703#comment-14339703 ] Tsuyoshi Ozawa commented on YARN-2820: -- Good catch! Yes, we should retry there also. > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > -- > > Key: YARN-2820 > URL: https://issues.apache.org/jira/browse/YARN-2820 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0, 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2820.000.patch, YARN-2820.001.patch, > YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, > YARN-2820.005.patch > > > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We > saw the following IOexception cause the RM shutdown. > {code} > 2014-10-29 23:49:12,202 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Updating info for attempt: appattempt_1409135750325_109118_01 at: > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01 > 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:46,283 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Error updating info for attempt: appattempt_1409135750325_109118_01 > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > 2014-10-29 23:49:46,284 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: > Error storing/updating appAttempt: appattempt_1409135750325_109118_01 > 2014-10-29 23:49:46,916 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: > Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. 
> at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) > > at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) > > at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:744) > {code} > As discussed at YARN-1778, TestFSRMStateStore failure is also due to > IOException in storeApplicationStateInternal. > Stack trace from TestFSRMStateStore failure: > {code} > 2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore > (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception > org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still > not started >at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.c
[jira] [Updated] (YARN-3262) Surface application outstanding resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3262: -- Attachment: YARN-3262.4.patch fixed the test failures > Surface application outstanding resource requests table > --- > > Key: YARN-3262 > URL: https://issues.apache.org/jira/browse/YARN-3262 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, > YARN-3262.4.patch, resource requests.png > > > It would be useful to surface the outstanding resource requests table on the > application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
[ https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3269: Attachment: YARN-3269.2.patch modify one of logaggregationService testcases to use the fully qualified path > Yarn.nodemanager.remote-app-log-dir could not be configured to fully > qualified path > --- > > Key: YARN-3269 > URL: https://issues.apache.org/jira/browse/YARN-3269 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3269.1.patch, YARN-3269.2.patch > > > Log aggregation currently is always relative to the default file system, not > an arbitrary file system identified by URI. So we can't put an arbitrary > fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
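For context on the fix being tested above, the goal is that a fully qualified URI can be used for the remote log directory. A minimal sketch of setting it programmatically, with an example namenode address that is purely illustrative:

{code}
// Hypothetical example: point log aggregation at a file system other than fs.defaultFS.
Configuration conf = new YarnConfiguration();
conf.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
    "hdfs://other-namenode:8020/app-logs");  // example URI, not taken from the patch
{code}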
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339657#comment-14339657 ] Chengbing Liu commented on YARN-3266: - The findbugs warnings are unrelated, caused by YARN-3181 and handled by YARN-3204. > RMContext inactiveNodes should have NodeId as map key > - > > Key: YARN-3266 > URL: https://issues.apache.org/jira/browse/YARN-3266 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Attachments: YARN-3266.01.patch, YARN-3266.02.patch > > > Under the default NM port configuration, which is 0, we have observed in the > current version that the "lost nodes" count is greater than the length of the lost > node list. This will happen when we consecutively restart the same NM twice: > * NM started at port 10001 > * NM restarted at port 10002 > * NM restarted at port 10003 > * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} has 1 element > * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} still has 1 element > Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), > {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If > this will break the current API, then the key string should include the NM's > port as well. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
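The keying change proposed in YARN-3266 above amounts to roughly the following sketch (illustrative only, not the attached patch):

{code}
// Hypothetical sketch: key inactive nodes by NodeId (host + port) instead of host only,
// so two NMs restarted on the same host do not overwrite each other's entry.
private final ConcurrentMap<NodeId, RMNode> inactiveNodes =
    new ConcurrentHashMap<NodeId, RMNode>();

// On node expiry:
rmNode.context.getInactiveRMNodes().put(rmNode.getNodeID(), rmNode);
{code}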
[jira] [Commented] (YARN-3262) Surface application outstanding resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339656#comment-14339656 ] Hadoop QA commented on YARN-3262: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701240/YARN-3262.3.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6772//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6772//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6772//console This message is automatically generated. > Surface application outstanding resource requests table > --- > > Key: YARN-3262 > URL: https://issues.apache.org/jira/browse/YARN-3262 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, > resource requests.png > > > It would be useful to surface the outstanding resource requests table on the > application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339625#comment-14339625 ] Hadoop QA commented on YARN-1809: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701229/YARN-1809.13.patch against trunk revision bfbf076. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6770//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6770//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6770//console This message is automatically generated. > Synchronize RM and Generic History Service Web-UIs > -- > > Key: YARN-1809 > URL: https://issues.apache.org/jira/browse/YARN-1809 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-1809.1.patch, YARN-1809.10.patch, > YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, > YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, > YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, > YARN-1809.9.patch > > > After YARN-953, the web-UI of generic history service is provide more > information than that of RM, the details about app attempt and container. > It's good to provide similar web-UIs, but retrieve the data from separate > source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-3273: Assignee: Rohith > Improve web UI to facilitate scheduling analysis and debugging > -- > > Key: YARN-3273 > URL: https://issues.apache.org/jira/browse/YARN-3273 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Rohith > > Job may be stuck for reasons such as: > - hitting queue capacity > - hitting user-limit, > - hitting AM-resource-percentage > The first queueCapacity is already shown on the UI. > We may surface things like: > - what is user's current usage and user-limit; > - what is the AM resource usage and limit; > - what is the application's current HeadRoom; > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339613#comment-14339613 ] Hadoop QA commented on YARN-2981: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701239/YARN-2981.patch against trunk revision 8ca0d95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6771//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6771//console This message is automatically generated. > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.
[ https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339592#comment-14339592 ] zhihai xu commented on YARN-2820: - That is a good finding. I double-checked all the FS operations in FileSystemRMStateStore; with your above finding, there is one more missing, which is in closeInternal: {code} fs.close(); {code} I will upload a new patch shortly to include retries for all these missing cases. > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > -- > > Key: YARN-2820 > URL: https://issues.apache.org/jira/browse/YARN-2820 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0, 2.6.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2820.000.patch, YARN-2820.001.patch, > YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, > YARN-2820.005.patch > > > Do retry in FileSystemRMStateStore for better error recovery when > update/store failure due to IOException. > When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, We > saw the following IOexception cause the RM shutdown. > {code} > 2014-10-29 23:49:12,202 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Updating info for attempt: appattempt_1409135750325_109118_01 at: > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01 > 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not > complete > /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/ > appattempt_1409135750325_109118_01.new.tmp retrying... > 2014-10-29 23:49:46,283 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: > Error updating info for attempt: appattempt_1409135750325_109118_01 > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. > 2014-10-29 23:49:46,284 ERROR > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: > Error storing/updating appAttempt: appattempt_1409135750325_109118_01 > 2014-10-29 23:49:46,916 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: > Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > java.io.IOException: Unable to close file because the last block does not > have enough number of replicas. 
> at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132) > > at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70) > > at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) > > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:744) > {code} > As discussed at YARN-1778, TestFSRMStateStore failure is also due to > IOException in storeApplicationStateInternal. > Stack trace from TestFSRMStateStore failure: > {code} > 2015-02-03 00:09:19,092 INFO [Thread-110] recovery.TestFSRMStateStore > (TestFSRMStateStore.ja
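A rough sketch of adding the missing retry around the close call noted in the comment above; the retry helper shown (FSAction/runWithRetries) is an assumption about what the next patch might do, not its actual contents:

{code}
// Hypothetical sketch: route fs.close() through the same retry wrapper used for the
// other FS operations so a transient failure does not escape closeInternal().
@Override
protected synchronized void closeInternal() throws Exception {
  new FSAction<Void>() {
    @Override
    public Void run() throws Exception {
      fs.close();
      return null;
    }
  }.runWithRetries();
}
{code}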
[jira] [Created] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
Jian He created YARN-3273: - Summary: Improve web UI to facilitate scheduling analysis and debugging Key: YARN-3273 URL: https://issues.apache.org/jira/browse/YARN-3273 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Job may be stuck for reasons such as: - hitting queue capacity - hitting user-limit, - hitting AM-resource-percentage The first queueCapacity is already shown on the UI. We may surface things like: - what is user's current usage and user-limit; - what is the AM resource usage and limit; - what is the application's current HeadRoom; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3272) Surface container locality info
[ https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3272: -- Issue Type: Improvement (was: Bug) > Surface container locality info > > > Key: YARN-3272 > URL: https://issues.apache.org/jira/browse/YARN-3272 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > > We can surface the container locality info on the web UI. This is useful to > debug "why my applications are progressing slow", especially when locality is > bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3125: Attachment: YARN-3125_UT-022615.patch Based on [~djp]'s v3 patch, I wrote a simple unit test for distributed shell that helps us verify the timeline v2 integration. I added this test into TestDistributedShell as a test for timeline v2. On my machine this single new test passed, and I can see the successful info in the test logs. So in general our prototype works. However, we do have some (potentially quick to fix, but maybe important) problems with running the v1 and v2 timeline servers together. On the server side, in this UT, both the v1 and v2 servers are launched, with the v2 server bound to a predefined port. On the client side, I've disabled the v1 URL in the timeline client for now. Probably we'd like a switch in our client to set the timeline version? I believe we now need to take care of the compatibility issues... > [Event producers] Change distributed shell to use new timeline service > -- > > Key: YARN-3125 > URL: https://issues.apache.org/jira/browse/YARN-3125 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Junping Du > Attachments: YARN-3125.patch, YARN-3125_UT-022615.patch, > YARN-3125v2.patch, YARN-3125v3.patch > > > We can start with changing distributed shell to use new timeline service once > the framework is completed, in which way we can quickly verify the next gen > is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
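If a client-side switch for the timeline version is added as suggested above, it might look like the sketch below; the property name is hypothetical here, not an existing configuration key:

{code}
// Hypothetical sketch: let the client choose which timeline service generation to talk to.
Configuration conf = new YarnConfiguration();
conf.setFloat("yarn.timeline-service.version", 2.0f);  // hypothetical property name
TimelineClient client = TimelineClient.createTimelineClient();
client.init(conf);
client.start();
{code}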
[jira] [Created] (YARN-3272) Surface container locality info
Jian He created YARN-3272: - Summary: Surface container locality info Key: YARN-3272 URL: https://issues.apache.org/jira/browse/YARN-3272 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He We can surface the container locality info on the web UI. This is useful to debug "why my applications are progressing slow", especially when locality is bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3262) Surface application outstanding resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3262: -- Summary: Surface application outstanding resource requests table (was: Suface application outstanding resource requests table) > Surface application outstanding resource requests table > --- > > Key: YARN-3262 > URL: https://issues.apache.org/jira/browse/YARN-3262 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, > resource requests.png > > > It would be useful to surface the outstanding resource requests table on the > application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3262) Suface application outstanding resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3262: -- Summary: Suface application outstanding resource requests table (was: Suface application resource requests table) > Suface application outstanding resource requests table > -- > > Key: YARN-3262 > URL: https://issues.apache.org/jira/browse/YARN-3262 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, > resource requests.png > > > It would be useful to surface the outstanding resource requests table on the > application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3262) Suface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3262: -- Attachment: YARN-3262.3.patch thanks for reviewing the patch, Wangda ! Addressed all the comments > Suface application resource requests table > -- > > Key: YARN-3262 > URL: https://issues.apache.org/jira/browse/YARN-3262 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, > resource requests.png > > > It would be useful to surface the outstanding resource requests table on the > application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image
[ https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-2981: -- Attachment: YARN-2981.patch This introduces a cluster-wide default Docker image, and limits memory, CPU, and the user for the container. > DockerContainerExecutor must support a Cluster-wide default Docker image > > > Key: YARN-2981 > URL: https://issues.apache.org/jira/browse/YARN-2981 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abin Shahab >Assignee: Abin Shahab > Attachments: YARN-2981.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
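The fallback behaviour described here is roughly the following. This is a sketch only; the property name below is hypothetical (the actual key is defined by DockerContainerExecutor's configuration and may differ), and the memory/CPU/user limits are omitted.

{code}
import org.apache.hadoop.conf.Configuration;

public final class DockerImageDefaults {

  /** Hypothetical key for the cluster-wide default image. */
  static final String DEFAULT_IMAGE_KEY =
      "yarn.nodemanager.docker-container-executor.default-image";

  /** Use the per-container image if given, otherwise the cluster default. */
  static String resolveImage(Configuration conf, String containerImage) {
    if (containerImage != null && !containerImage.trim().isEmpty()) {
      return containerImage.trim();
    }
    String clusterDefault = conf.getTrimmed(DEFAULT_IMAGE_KEY);
    if (clusterDefault == null || clusterDefault.isEmpty()) {
      throw new IllegalStateException(
          "No image requested and no cluster-wide default configured");
    }
    return clusterDefault;
  }
}
{code}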
[jira] [Commented] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339547#comment-14339547 ] Hudson commented on YARN-3255: -- FAILURE: Integrated in Hadoop-trunk-Commit #7215 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7215/]) YARN-3255. RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options. Contributed by Konstantin Shvachko. (shv: rev 8ca0d957c4b1076e801e1cdce5b44aa805de889c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServer.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java > RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support > generic options > --- > > Key: YARN-3255 > URL: https://issues.apache.org/jira/browse/YARN-3255 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.5.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Fix For: 2.7.0 > > Attachments: YARN-3255-01.patch, YARN-3255-02.patch, > YARN-3255-branch-2.patch > > > Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore > generic options, like {{-conf}} and {{-fs}}. It would be good to have the > ability to pass generic options in order to specify configuration files or > the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
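For context, "supporting generic options in main()" essentially means running the arguments through GenericOptionsParser before the daemon starts, so -conf, -fs and -D overrides land in the Configuration. A minimal sketch; the startDaemon call stands in for the real service bootstrap:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class GenericOptionsMain {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // conf now reflects -conf/-fs/-D overrides; the rest go to the daemon.
    String[] remaining = parser.getRemainingArgs();
    startDaemon(conf, remaining);
  }

  private static void startDaemon(Configuration conf, String[] args) {
    System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
  }
}
{code}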
[jira] [Resolved] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved YARN-3255. --- Resolution: Fixed Fix Version/s: 2.7.0 I just committed this. Thank you guys for prompt reviews. > RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support > generic options > --- > > Key: YARN-3255 > URL: https://issues.apache.org/jira/browse/YARN-3255 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.5.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Fix For: 2.7.0 > > Attachments: YARN-3255-01.patch, YARN-3255-02.patch, > YARN-3255-branch-2.patch > > > Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore > generic options, like {{-conf}} and {{-fs}}. It would be good to have the > ability to pass generic options in order to specify configuration files or > the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated YARN-3255: -- Attachment: YARN-3255-branch-2.patch Patch for branch-2. Minor difference in import section with trunk. > RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support > generic options > --- > > Key: YARN-3255 > URL: https://issues.apache.org/jira/browse/YARN-3255 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.5.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Attachments: YARN-3255-01.patch, YARN-3255-02.patch, > YARN-3255-branch-2.patch > > > Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore > generic options, like {{-conf}} and {{-fs}}. It would be good to have the > ability to pass generic options in order to specify configuration files or > the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3251) Fix CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-3251. -- Resolution: Fixed Fix Version/s: 2.6.1 Hadoop Flags: Reviewed Just compiled and ran all tests in CapacityScheduler, committed to branch-2.6. Thanks [~cwelch] and also reviews from [~jlowe], [~sunilg] and [~vinodkv]. > Fix CapacityScheduler deadlock when computing absolute max avail capacity > (short term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Fix For: 2.6.1 > > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) Fix CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3251: - Summary: Fix CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1) (was: CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)) > Fix CapacityScheduler deadlock when computing absolute max avail capacity > (short term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1809: Attachment: YARN-1809.13.patch Upload a new patch to address zhijie's comment. Test the patch in both secure and un-secure cluster. > Synchronize RM and Generic History Service Web-UIs > -- > > Key: YARN-1809 > URL: https://issues.apache.org/jira/browse/YARN-1809 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhijie Shen >Assignee: Xuan Gong > Attachments: YARN-1809.1.patch, YARN-1809.10.patch, > YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, > YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, > YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, > YARN-1809.9.patch > > > After YARN-953, the web-UI of generic history service is provide more > information than that of RM, the details about app attempt and container. > It's good to provide similar web-UIs, but retrieve the data from separate > source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile
[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339483#comment-14339483 ] Hadoop QA commented on YARN-3080: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701213/YARN-3080.patch against trunk revision bfbf076. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6769//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6769//console This message is automatically generated. > The DockerContainerExecutor could not write the right pid to container pidFile > -- > > Key: YARN-3080 > URL: https://issues.apache.org/jira/browse/YARN-3080 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Beckham007 >Assignee: Abin Shahab > Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, > YARN-3080.patch > > > The docker_container_executor_session.sh is like this: > {quote} > #!/usr/bin/env bash > echo `/usr/bin/docker inspect --format {{.State.Pid}} > container_1421723685222_0008_01_02` > > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp > /bin/mv -f > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp > > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid > /usr/bin/docker run --rm --name container_1421723685222_0008_01_02 -e > GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e > GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e > GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e > GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M > --cpu-shares=1024 -v > /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02 > -v > /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02 > -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash > "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh" > {quote} > The DockerContainerExecutor use docker inspect before docker run, so the > docker inspect couldn't get the right pid for the docker, signalContainer() > and nm restart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339479#comment-14339479 ] Wangda Tan commented on YARN-3251: -- Checking this into branch-2.6 > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3262) Suface application resource requests table
[ https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339471#comment-14339471 ] Wangda Tan commented on YARN-3262: -- Hi [~jianhe], Thanks for working on this, it will be very helpful! Took a look at your patch, overall it looks good to me; 2 minor comments: 1) getAllResourceRequests could be a method in AbstractYarnScheduler; I feel other places will use it, and we would not have to duplicate the implementation everywhere. 2) You could add a "total-outstanding-resource" field on the app page as well; it should be the sum of capability over all ANY resource requests. > Suface application resource requests table > -- > > Key: YARN-3262 > URL: https://issues.apache.org/jira/browse/YARN-3262 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-3262.1.patch, YARN-3262.2.patch, resource > requests.png > > > It would be useful to surface the outstanding resource requests table on the > application web page to facilitate scheduling analysis and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
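The "total-outstanding-resource" figure suggested in comment 2) is, in effect, the sum of capability times numContainers over the off-switch (ANY) requests. A small sketch under that assumption; where the request list comes from (for example a scheduler-side getAllResourceRequests) is left open:

{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class OutstandingResources {

  /** Sum capability * numContainers over all ANY (off-switch) requests. */
  public static Resource totalOutstanding(List<ResourceRequest> requests) {
    Resource total = Resource.newInstance(0, 0);
    for (ResourceRequest rr : requests) {
      if (ResourceRequest.ANY.equals(rr.getResourceName())) {
        Resources.addTo(total,
            Resources.multiply(rr.getCapability(), rr.getNumContainers()));
      }
    }
    return total;
  }
}
{code}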
[jira] [Updated] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile
[ https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-3080: -- Attachment: YARN-3080.patch Updated callable to runnable. > The DockerContainerExecutor could not write the right pid to container pidFile > -- > > Key: YARN-3080 > URL: https://issues.apache.org/jira/browse/YARN-3080 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Beckham007 >Assignee: Abin Shahab > Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, > YARN-3080.patch > > > The docker_container_executor_session.sh is like this: > {quote} > #!/usr/bin/env bash > echo `/usr/bin/docker inspect --format {{.State.Pid}} > container_1421723685222_0008_01_02` > > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp > /bin/mv -f > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp > > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid > /usr/bin/docker run --rm --name container_1421723685222_0008_01_02 -e > GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e > GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e > GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e > GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M > --cpu-shares=1024 -v > /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02 > -v > /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02 > -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash > "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh" > {quote} > The DockerContainerExecutor use docker inspect before docker run, so the > docker inspect couldn't get the right pid for the docker, signalContainer() > and nm restart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
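To make the ordering problem in the description concrete: the pid can only be inspected after the container exists, so the generated session script has to start the container before writing the pid file. The sketch below illustrates that ordering only; it is not the actual YARN-3080 patch, it omits the cgroup/volume flags, and it leaves out the step where the executor afterwards waits on the container (for example via docker wait).

{code}
public final class DockerSessionScript {

  /** Emit a simplified session script: run first, then inspect for the pid. */
  static String write(String containerId, String image,
      String launchScript, String pidFile) {
    return String.join("\n",
        "#!/usr/bin/env bash",
        // Start the container detached so the script gets control back...
        "/usr/bin/docker run -d --name " + containerId + " " + image
            + " bash \"" + launchScript + "\"",
        // ...and only then ask Docker for the pid of the container's init.
        "/usr/bin/docker inspect --format {{.State.Pid}} " + containerId
            + " > " + pidFile + ".tmp",
        "/bin/mv -f " + pidFile + ".tmp " + pidFile,
        "");
  }

  public static void main(String[] args) {
    System.out.print(write("container_example_000001",
        "library/centos7", "/tmp/launch_container.sh", "/tmp/container.pid"));
  }
}
{code}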
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339418#comment-14339418 ] Hadoop QA commented on YARN-3231: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701186/YARN-3231.v4.patch against trunk revision c6d5b37. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6767//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6767//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6767//console This message is automatically generated. > FairScheduler changing queueMaxRunningApps on the fly will cause all pending > job stuck > -- > > Key: YARN-3231 > URL: https://issues.apache.org/jira/browse/YARN-3231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, > YARN-3231.v3.patch, YARN-3231.v4.patch > > > When a queue is piling up with a lot of pending jobs due to the > maxRunningApps limit. We want to increase this property on the fly to make > some of the pending job active. However, once we increase the limit, all > pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339394#comment-14339394 ] Tsuyoshi Ozawa commented on YARN-3255: -- The warnings by findbugs are not related to the modification. Checking this in. > RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support > generic options > --- > > Key: YARN-3255 > URL: https://issues.apache.org/jira/browse/YARN-3255 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.5.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Attachments: YARN-3255-01.patch, YARN-3255-02.patch > > > Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore > generic options, like {{-conf}} and {{-fs}}. It would be good to have the > ability to pass generic options in order to specify configuration files or > the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li reassigned YARN-3267: -- Assignee: Chang Li > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
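The fix being asked for amounts to reordering the two steps: check the ACL while scanning candidate entities and stop once the requested limit of visible entities has been collected, instead of truncating to the limit first and filtering afterwards. A self-contained sketch of that ordering (the Predicate stands in for the timeline ACL check):

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public final class FilterThenLimit {

  public static <T> List<T> fetch(Iterator<T> candidates, long limit,
      Predicate<T> callerCanView) {
    List<T> result = new ArrayList<>();
    while (candidates.hasNext() && result.size() < limit) {
      T entity = candidates.next();
      if (callerCanView.test(entity)) {   // ACL first...
        result.add(entity);               // ...then count towards the limit.
      }
    }
    return result;
  }
}
{code}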
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339386#comment-14339386 ] Hadoop QA commented on YARN-3122: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701190/YARN-3122.004.patch against trunk revision 1047c88. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6768//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6768//console This message is automatically generated. > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.prelim.patch, > YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339382#comment-14339382 ] Zhijie Shen commented on YARN-3087: --- +1. the last patch looks good to me. Will commit > [Aggregator implementation] the REST server (web server) for per-node > aggregator does not work if it runs inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, > YARN-3087-022615.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3267: --- Assignee: (was: Chang Li) > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.004.patch Modified CPU usage to be percent per core, and the corresponding metric likewise. Thus two fully used cores should report as 200%. Added doc comments. > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.prelim.patch, > YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
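The "percent per core" semantics reduce to a small calculation over two samples of cumulative CPU time. A self-contained sketch of just that arithmetic; the real patch works inside the containers monitor via ResourceCalculatorProcessTree, so this only shows the formula:

{code}
public final class CpuPercentPerCore {

  /** 100 * cpu-time-used / wall-time; sums across cores, so 2 busy cores = 200. */
  static float percentPerCore(long prevCpuTimeMs, long curCpuTimeMs,
      long elapsedWallMs) {
    if (elapsedWallMs <= 0) {
      return 0f;
    }
    return 100f * (curCpuTimeMs - prevCpuTimeMs) / elapsedWallMs;
  }

  public static void main(String[] args) {
    // 2000 ms of CPU consumed over a 1000 ms window -> 200%.
    System.out.println(percentPerCore(0, 2000, 1000));
  }
}
{code}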
[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3231: -- Attachment: YARN-3231.v4.patch > FairScheduler changing queueMaxRunningApps on the fly will cause all pending > job stuck > -- > > Key: YARN-3231 > URL: https://issues.apache.org/jira/browse/YARN-3231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, > YARN-3231.v3.patch, YARN-3231.v4.patch > > > When a queue is piling up with a lot of pending jobs due to the > maxRunningApps limit. We want to increase this property on the fly to make > some of the pending job active. However, once we increase the limit, all > pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order
[ https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339270#comment-14339270 ] Jian He commented on YARN-3222: --- Looks good to me. While looking at this, I may have found another bug: the NODE_USABLE event is sent regardless of whether the reconnected node is healthy, which is incorrect, right? {code} rmNode.context.getDispatcher().getEventHandler().handle( new NodesListManagerEvent( NodesListManagerEventType.NODE_USABLE, rmNode)); {code} > RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential > order > --- > > Key: YARN-3222 > URL: https://issues.apache.org/jira/browse/YARN-3222 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Rohith >Assignee: Rohith >Priority: Critical > Attachments: 0001-YARN-3222.patch > > > When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the > scheduler with the events node_added, node_removed or node_resource_update. These > events should be sent in sequential order, i.e. the node_added event first and > then the node_resource_update event. > But if the node is reconnected with a different http port, the order of > scheduler events is node_removed --> node_resource_update --> node_added, > which means the scheduler cannot find the node, throws an NPE, and the RM exits. > The node_resource_update event should always be triggered via > RMNodeEventType.RESOURCE_UPDATE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
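The guard being suggested is simply to choose the NodesListManager event from the reconnecting node's health report instead of always sending NODE_USABLE. A self-contained sketch with simplified stand-in types (the real code would dispatch through rmNode.context as in the snippet above):

{code}
public final class ReconnectEventChoice {

  enum NodesListManagerEventType { NODE_USABLE, NODE_UNUSABLE }

  /** Only a healthy reconnecting node should be advertised as usable. */
  static NodesListManagerEventType eventFor(boolean nodeIsHealthy) {
    return nodeIsHealthy
        ? NodesListManagerEventType.NODE_USABLE
        : NodesListManagerEventType.NODE_UNUSABLE;
  }

  public static void main(String[] args) {
    System.out.println(eventFor(true));   // NODE_USABLE
    System.out.println(eventFor(false));  // NODE_UNUSABLE
  }
}
{code}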
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339264#comment-14339264 ] Karthik Kambatla commented on YARN-3231: Filed YARN-3271 to move these tests. I am okay with moving these too as part of that. I will be glad to review that JIRA too, should anyone want to pick it up. bq. For 6.3, I don't think there is a problem with "maxRunnableApps for a user or queue is decreased". Would be nice to add the tests even if there is no problem. Seems like a logical extension of what the latest patch is doing here. > FairScheduler changing queueMaxRunningApps on the fly will cause all pending > job stuck > -- > > Key: YARN-3231 > URL: https://issues.apache.org/jira/browse/YARN-3231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, > YARN-3231.v3.patch > > > When a queue is piling up with a lot of pending jobs due to the > maxRunningApps limit. We want to increase this property on the fly to make > some of the pending job active. However, once we increase the limit, all > pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability
Karthik Kambatla created YARN-3271: -- Summary: FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability Key: YARN-3271 URL: https://issues.apache.org/jira/browse/YARN-3271 Project: Hadoop YARN Issue Type: Improvement Reporter: Karthik Kambatla -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339260#comment-14339260 ] Hadoop QA commented on YARN-2777: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701151/YARN-2777.002.patch against trunk revision dce8b9c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapred.TestMiniMRClientCluster org.apache.hadoop.mapred.TestMRTimelineEventHandling org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler org.apache.hadoop.mapred.TestJobCleanup The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapred.TestMRIntermediateDataEncryption org.apache.hadoop.mapred.lib.Tests org.apache.hadoop.mapred.TestCombineOutputCollector org.apache.hadoop.mapred.lib.TestMultipleInTests org.apache.hadoop.mapreduce.Tests org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6761//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6761//console This message is automatically generated. > Mark the end of individual log in aggregated log > > > Key: YARN-2777 > URL: https://issues.apache.org/jira/browse/YARN-2777 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Varun Saxena > Labels: log-aggregation > Attachments: YARN-2777.001.patch, YARN-2777.002.patch > > > Below is snippet of aggregated log showing hbase master log: > {code} > LogType: hbase-hbase-master-ip-172-31-34-167.log > LogUploadTime: 29-Oct-2014 22:31:55 > LogLength: 24103045 > Log Contents: > Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 > ... > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) > at org.apache.hadoop.hbase.Chore.run(Chore.java:80) > at java.lang.Thread.run(Thread.java:745) > LogType: hbase-hbase-master-ip-172-31-34-167.out > {code} > Since logs from various daemons are aggregated in one log file, it would be > desirable to mark the end of one log before starting with the next. > e.g. with such a line: > {code} > End of LogType: hbase-hbase-master-ip-172-31-34-167.log > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
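The requested marker is just an explicit terminator written after each log file's bytes in the aggregated stream. A minimal sketch of such a footer writer, assuming a simplified stream (the real change would live in AggregatedLogFormat's value writer and the corresponding readers):

{code}
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public final class LogEndMarker {

  /** Append an end-of-log marker so readers can tell where this file ends. */
  static void writeLogFooter(DataOutputStream out, String logType)
      throws IOException {
    String footer = "\nEnd of LogType: " + logType + "\n\n";
    out.write(footer.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    DataOutputStream out = new DataOutputStream(System.out);
    out.writeBytes("LogType: hbase-hbase-master.log\nLog Contents:\n...\n");
    writeLogFooter(out, "hbase-hbase-master.log");
    out.flush();
  }
}
{code}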
[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339246#comment-14339246 ] Zhijie Shen commented on YARN-3125: --- Thanks for the patch, Junping! It looks good to me. Per offline discussion, we should add an integration test in TestDistributedShell. > [Event producers] Change distributed shell to use new timeline service > -- > > Key: YARN-3125 > URL: https://issues.apache.org/jira/browse/YARN-3125 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Junping Du > Attachments: YARN-3125.patch, YARN-3125v2.patch, YARN-3125v3.patch > > > We can start with changing distributed shell to use new timeline service once > the framework is completed, in which way we can quickly verify the next gen > is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339248#comment-14339248 ] Hadoop QA commented on YARN-3087: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701178/YARN-3087-022615.patch against trunk revision c6d5b37. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6766//console This message is automatically generated. > [Aggregator implementation] the REST server (web server) for per-node > aggregator does not work if it runs inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, > YARN-3087-022615.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339237#comment-14339237 ] Hadoop QA commented on YARN-3122: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701167/YARN-3122.003.patch against trunk revision 2214dab. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6765//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6765//console This message is automatically generated. > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339233#comment-14339233 ] Li Lu commented on YARN-3087: - Hi [~djp], thanks for the comments! I agree that we may want to use generic types to solve the problem. Similar code also appear in v1 timeline object model, so maybe we'd like to fix both together? If that's the case we may open a separate JIRA to trace this. > [Aggregator implementation] the REST server (web server) for per-node > aggregator does not work if it runs inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, > YARN-3087-022615.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3087: Attachment: YARN-3087-022615.patch Updated my patch according to [~zjshen]'s comments. Addressed points 1-3. Point 4 is caused by a limitation of HttpServer2 for now. We may want to decide if we want to fix that on our side, or add support to this use case on the HttpServer2 side. For now, I think we can temporarily use our current way to make the prototype work. > [Aggregator implementation] the REST server (web server) for per-node > aggregator does not work if it runs inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, > YARN-3087-022615.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
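One direction consistent with the discussion above is to keep the per-node aggregator's REST resources on their own HttpServer2 instance and path spec, so they never collide with the NM web app's servlet mappings. A sketch under that assumption; the Jersey resource package name and the port below are made up for illustration:

{code}
import java.net.URI;

import org.apache.hadoop.http.HttpServer2;

public class PerNodeAggregatorWebServer {
  public static void main(String[] args) throws Exception {
    HttpServer2 server = new HttpServer2.Builder()
        .setName("timeline-aggregator")
        .addEndpoint(URI.create("http://localhost:8188"))
        .setFindPort(true)   // fall forward to a free port if 8188 is taken
        .build();
    // Hypothetical package holding the v2 timeline REST resources.
    server.addJerseyResourcePackage(
        "org.apache.hadoop.yarn.server.timelineservice.aggregator",
        "/ws/v2/timeline/*");
    server.start();
  }
}
{code}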
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339214#comment-14339214 ] Siqi Li commented on YARN-3231: --- Hi [~ka...@cloudera.com], thanks for your feedback. I have uploaded a new patch which addresses all your comments except 6.1 and 6.3. For 6.1, it seems that there are other test cases that might also qualify for moving to TestAppRunnability; it would be good to do a larger refactoring of TestFairScheduler into TestAppRunnability. For 6.3, I don't think there is a problem with "maxRunnableApps for a user or queue is decreased". > FairScheduler changing queueMaxRunningApps on the fly will cause all pending > job stuck > -- > > Key: YARN-3231 > URL: https://issues.apache.org/jira/browse/YARN-3231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, > YARN-3231.v3.patch > > > When a queue is piling up with a lot of pending jobs due to the > maxRunningApps limit. We want to increase this property on the fly to make > some of the pending job active. However, once we increase the limit, all > pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
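At its core, the behaviour the fix needs is: when the limit is raised at runtime, walk the queue's pending (non-runnable) apps again and promote as many as the new limit allows. A self-contained sketch of that loop with generic stand-ins; the real logic lives in the FairScheduler's MaxRunningAppsEnforcer-style checks, whose exact method names are not shown here:

{code}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.Predicate;

public final class ReactivatePendingApps {

  /** Promote pending apps from the head of the queue while the new limit has room. */
  static <A> int activateAfterLimitIncrease(Queue<A> nonRunnable,
      Predicate<A> canBeRunnable, Consumer<A> makeRunnable) {
    int activated = 0;
    while (!nonRunnable.isEmpty() && canBeRunnable.test(nonRunnable.peek())) {
      makeRunnable.accept(nonRunnable.poll());
      activated++;
    }
    return activated;
  }

  public static void main(String[] args) {
    Queue<String> pending = new ArrayDeque<>(Arrays.asList("app1", "app2", "app3"));
    final int[] running = {1};
    final int newLimit = 3;   // queueMaxRunningApps raised from 1 on the fly
    int n = activateAfterLimitIncrease(pending,
        a -> running[0] < newLimit, a -> running[0]++);
    System.out.println("activated " + n + " apps");   // activated 2 apps
  }
}
{code}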
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339201#comment-14339201 ] Hadoop QA commented on YARN-3231: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701160/YARN-3231.v3.patch against trunk revision f0c980a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6763//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6763//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6763//console This message is automatically generated. > FairScheduler changing queueMaxRunningApps on the fly will cause all pending > job stuck > -- > > Key: YARN-3231 > URL: https://issues.apache.org/jira/browse/YARN-3231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, > YARN-3231.v3.patch > > > When a queue is piling up with a lot of pending jobs due to the > maxRunningApps limit. We want to increase this property on the fly to make > some of the pending job active. However, once we increase the limit, all > pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339196#comment-14339196 ] Hadoop QA commented on YARN-3270: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701163/YARN-3270.patch against trunk revision 2214dab. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6764//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6764//console This message is automatically generated. > node label expression not getting set in ApplicationSubmissionContext > - > > Key: YARN-3270 > URL: https://issues.apache.org/jira/browse/YARN-3270 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Priority: Minor > Attachments: YARN-3270.patch > > > One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not > setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
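For illustration, the symptom can be worked around on the caller side by setting the label on the context explicitly rather than relying on the affected newInstance(...) overload. A small sketch, assuming the caller fills in the remaining context fields elsewhere:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public final class NodeLabelWorkaround {

  static ApplicationSubmissionContext withLabel(String labelExpression) {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    // Set explicitly instead of passing it through the affected newInstance(...).
    ctx.setNodeLabelExpression(labelExpression);
    return ctx;
  }
}
{code}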
[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
[ https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339170#comment-14339170 ] Vinod Kumar Vavilapalli commented on YARN-3269: --- Can you modify one of the tests to use a fully qualified path, in order to 'prove' that this patch works? > Yarn.nodemanager.remote-app-log-dir could not be configured to fully > qualified path > --- > > Key: YARN-3269 > URL: https://issues.apache.org/jira/browse/YARN-3269 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3269.1.patch > > > Log aggregation currently is always relative to the default file system, not > an arbitrary file system identified by URI. So we can't put an arbitrary > fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
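The behavioural change being asked for boils down to deriving the FileSystem from the configured remote log dir itself, so a fully qualified URI (for example hdfs://other-cluster:8020/logs) is honoured instead of always resolving against the default file system. A minimal sketch of that resolution step only:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class RemoteLogDirResolver {

  static FileSystem remoteLogFs(Configuration conf, String remoteLogDir)
      throws IOException {
    Path root = new Path(remoteLogDir);
    // Path.getFileSystem honours the scheme/authority in the path and falls
    // back to fs.defaultFS only when the path is not fully qualified.
    return root.getFileSystem(conf);
  }
}
{code}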
[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage
[ https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3122: Attachment: YARN-3122.003.patch Addressed feedback > Metrics for container's actual CPU usage > > > Key: YARN-3122 > URL: https://issues.apache.org/jira/browse/YARN-3122 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3122.001.patch, YARN-3122.002.patch, > YARN-3122.003.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch > > > It would be nice to capture resource usage per container, for a variety of > reasons. This JIRA is to track CPU usage. > YARN-2965 tracks the resource usage on the node, and the two implementations > should reuse code as much as possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339140#comment-14339140 ] Hadoop QA commented on YARN-3251: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701150/YARN-3251.2.patch against trunk revision dce8b9c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6760//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6760//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6760//console This message is automatically generated. > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
Rohit Agarwal created YARN-3270: --- Summary: node label expression not getting set in ApplicationSubmissionContext Key: YARN-3270 URL: https://issues.apache.org/jira/browse/YARN-3270 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Rohit Agarwal Priority: Minor One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Agarwal updated YARN-3270: Attachment: YARN-3270.patch Attached the patch. > node label expression not getting set in ApplicationSubmissionContext > - > > Key: YARN-3270 > URL: https://issues.apache.org/jira/browse/YARN-3270 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Priority: Minor > Attachments: YARN-3270.patch > > > One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not > setting the {{appLabelExpression}} passed to it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
[ https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339084#comment-14339084 ] Hadoop QA commented on YARN-3269: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701154/YARN-3269.1.patch against trunk revision dce8b9c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6762//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6762//console This message is automatically generated. > Yarn.nodemanager.remote-app-log-dir could not be configured to fully > qualified path > --- > > Key: YARN-3269 > URL: https://issues.apache.org/jira/browse/YARN-3269 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3269.1.patch > > > Log aggregation currently is always relative to the default file system, not > an arbitrary file system identified by URI. So we can't put an arbitrary > fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.
[ https://issues.apache.org/jira/browse/YARN-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-3268: --- Assignee: (was: Chang Li) > timelineserver rest api returns html page for 404 when a bad endpoint is used. > -- > > Key: YARN-3268 > URL: https://issues.apache.org/jira/browse/YARN-3268 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran > > the timelineserver returns a 404 page instead of giving a REST response. this > interferes with the end user pages which try to retrieve data using REST api. > this could be due to lack of a 404 handler > ex. > http://timelineserver:8188/badnamespace/v1/timeline/someentity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3231: -- Attachment: YARN-3231.v3.patch > FairScheduler changing queueMaxRunningApps on the fly will cause all pending > job stuck > -- > > Key: YARN-3231 > URL: https://issues.apache.org/jira/browse/YARN-3231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, > YARN-3231.v3.patch > > > When a queue is piling up with a lot of pending jobs due to the > maxRunningApps limit. We want to increase this property on the fly to make > some of the pending job active. However, once we increase the limit, all > pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li reassigned YARN-3267: -- Assignee: Chang Li > Timelineserver applies the ACL rules after applying the limit on the number > of records > -- > > Key: YARN-3267 > URL: https://issues.apache.org/jira/browse/YARN-3267 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Prakash Ramachandran >Assignee: Chang Li > > While fetching the entities from timelineserver, the limit is applied on the > entities to be fetched from leveldb, the ACL filters are applied after this > (TimelineDataManager.java::getEntities). > this could mean that even if there are entities available which match the > query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.
[ https://issues.apache.org/jira/browse/YARN-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li reassigned YARN-3268: -- Assignee: Chang Li > timelineserver rest api returns html page for 404 when a bad endpoint is used. > -- > > Key: YARN-3268 > URL: https://issues.apache.org/jira/browse/YARN-3268 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Chang Li > > the timelineserver returns a 404 page instead of giving a REST response. this > interferes with the end user pages which try to retrieve data using REST api. > this could be due to lack of a 404 handler > ex. > http://timelineserver:8188/badnamespace/v1/timeline/someentity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339060#comment-14339060 ] Zhijie Shen commented on YARN-3087: --- Thanks for the patch, Li! Some detailed comments about the patch: 1. HierarchicalTimelineEntity is abstract, maybe not necessary. {code} // required by JAXB HierarchicalTimelineEntity() { super(); } {code} 2. Can we mark JAXB methods \@Private? 3. I think rootUnwrapping should be true to be consistent with YarnJacksonJaxbJsonProvider. It seems JAXBContextResolver is never used (I think the reason is that we are using YarnJacksonJaxbJsonProvider), maybe we want to remove the class. {code} this.context = new JSONJAXBContext(JSONConfiguration.natural().rootUnwrapping(false) .build(), cTypes) {code} 4. Does it mean if we want to add a filter, we need to hard code here? So "hadoop.http.filter.initializers" no longer work? Is it possible to provide some similar mechanism to replace what "hadoop.http.filter.initializers" does if it doesn't work. {code} 121 // TODO: replace this by an authentification filter in future. 122 HashMap options = new HashMap(); 123 String username = conf.get(HADOOP_HTTP_STATIC_USER, 124 DEFAULT_HADOOP_HTTP_STATIC_USER); 125 options.put(HADOOP_HTTP_STATIC_USER, username); 126 HttpServer2.defineFilter(timelineRestServer.getWebAppContext(), 127 "static_user_filter_timeline", 128 StaticUserWebFilter.StaticUserFilter.class.getName(), 129 options, new String[] {"/*"}); {code} > [Aggregator implementation] the REST server (web server) for per-node > aggregator does not work if it runs inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
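If {{JAXBContextResolver}} is kept rather than removed, the change suggested in point 3 amounts to flipping the flag in the quoted snippet, roughly as below ({{cTypes}} is assumed to be the same array of classes the resolver already registers):
{code}
// keep natural notation, but unwrap the root element to stay consistent
// with YarnJacksonJaxbJsonProvider, as suggested in point 3 above
this.context = new JSONJAXBContext(
    JSONConfiguration.natural().rootUnwrapping(true).build(), cTypes);
{code}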
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339015#comment-14339015 ] Junping Du commented on YARN-3087: -- Agree with Vinod that if this is required from the JAXB API then we don't have to cast it. Thanks [~gtCarrera9] for the explanation on this! Patch looks good to me overall. One comment is: we have a lot of similar logic casting a Map to a HashMap, like below: {code} -this.relatedEntities = relatedEntities; +if (relatedEntities != null && !(relatedEntities instanceof HashMap)) { + this.relatedEntities = new HashMap<String, Set<String>>(relatedEntities); +} else { + this.relatedEntities = (HashMap<String, Set<String>>) relatedEntities; +} {code} Maybe we can use generics to consolidate them. > [Aggregator implementation] the REST server (web server) for per-node > aggregator does not work if it runs inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
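One way the suggested consolidation could look is a small generic helper; the class and method names below are hypothetical, not from any attached patch.
{code}
import java.util.HashMap;
import java.util.Map;

final class MapUtils {
  // Convert an arbitrary Map into the concrete HashMap type JAXB needs,
  // copying only when the incoming map is not already a HashMap (or null).
  static <K, V> HashMap<K, V> toHashMap(Map<K, V> map) {
    if (map == null || map instanceof HashMap) {
      return (HashMap<K, V>) map;
    }
    return new HashMap<K, V>(map);
  }
}
{code}
Each setter would then reduce to a single line, e.g. {{this.relatedEntities = MapUtils.toHashMap(relatedEntities);}}.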
[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339012#comment-14339012 ] Vinod Kumar Vavilapalli commented on YARN-3025: --- Coming in very late, apologies. Some comments: - Echoing Bikas's first comment: Today the AMs are expected to maintain their own scheduling state. With this you are changing that - part of the scheduling state will be remembered but the remaining isn't. We should clearly draw a line somewhere; what is it? - [~zjshen] did a very good job of dividing the persistence concerns, but what is the guarantee that is given to the app writers? "I'll return the list of blacklisted nodes whenever I can, but shoot I died, so I can't help you much" is not going to cut it. If we want reliable notifications, we should build a protocol between AM and RM about the persistence of the blacklisted node list - too much complexity if you ask me. Why not leave it to the apps? - The blacklist information is per application-attempt, and the scheduler will forget previous application-attempts today. So as I understand it, the patch doesn't work. > Provide API for retrieving blacklisted nodes > > > Key: YARN-3025 > URL: https://issues.apache.org/jira/browse/YARN-3025 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt > > > We have the following method which updates blacklist: > {code} > public synchronized void updateBlacklist(List<String> blacklistAdditions, > List<String> blacklistRemovals) { > {code} > Upon AM failover, there should be an API which returns the blacklisted nodes > so that the new AM can make consistent decisions. > The new API can be: > {code} > public synchronized List<String> getBlacklistedNodes() > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
[ https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3269: Attachment: YARN-3269.1.patch > Yarn.nodemanager.remote-app-log-dir could not be configured to fully > qualified path > --- > > Key: YARN-3269 > URL: https://issues.apache.org/jira/browse/YARN-3269 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-3269.1.patch > > > Log aggregation currently is always relative to the default file system, not > an arbitrary file system identified by URI. So we can't put an arbitrary > fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path
Xuan Gong created YARN-3269: --- Summary: Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path Key: YARN-3269 URL: https://issues.apache.org/jira/browse/YARN-3269 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Log aggregation currently is always relative to the default file system, not an arbitrary file system identified by URI. So we can't put an arbitrary fully-qualified URI into yarn.nodemanager.remote-app-log-dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338975#comment-14338975 ] Ted Yu commented on YARN-2777: -- lgtm > Mark the end of individual log in aggregated log > > > Key: YARN-2777 > URL: https://issues.apache.org/jira/browse/YARN-2777 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Varun Saxena > Labels: log-aggregation > Attachments: YARN-2777.001.patch, YARN-2777.002.patch > > > Below is snippet of aggregated log showing hbase master log: > {code} > LogType: hbase-hbase-master-ip-172-31-34-167.log > LogUploadTime: 29-Oct-2014 22:31:55 > LogLength: 24103045 > Log Contents: > Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 > ... > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) > at org.apache.hadoop.hbase.Chore.run(Chore.java:80) > at java.lang.Thread.run(Thread.java:745) > LogType: hbase-hbase-master-ip-172-31-34-167.out > {code} > Since logs from various daemons are aggregated in one log file, it would be > desirable to mark the end of one log before starting with the next. > e.g. with such a line: > {code} > End of LogType: hbase-hbase-master-ip-172-31-34-167.log > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338968#comment-14338968 ] Varun Saxena commented on YARN-2777: [~tedyu], made the change. Kindly review > Mark the end of individual log in aggregated log > > > Key: YARN-2777 > URL: https://issues.apache.org/jira/browse/YARN-2777 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Varun Saxena > Labels: log-aggregation > Attachments: YARN-2777.001.patch, YARN-2777.002.patch > > > Below is snippet of aggregated log showing hbase master log: > {code} > LogType: hbase-hbase-master-ip-172-31-34-167.log > LogUploadTime: 29-Oct-2014 22:31:55 > LogLength: 24103045 > Log Contents: > Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 > ... > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) > at org.apache.hadoop.hbase.Chore.run(Chore.java:80) > at java.lang.Thread.run(Thread.java:745) > LogType: hbase-hbase-master-ip-172-31-34-167.out > {code} > Since logs from various daemons are aggregated in one log file, it would be > desirable to mark the end of one log before starting with the next. > e.g. with such a line: > {code} > End of LogType: hbase-hbase-master-ip-172-31-34-167.log > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338967#comment-14338967 ] Varun Saxena commented on YARN-2777: [~tedyu], made the change. Kindly review > Mark the end of individual log in aggregated log > > > Key: YARN-2777 > URL: https://issues.apache.org/jira/browse/YARN-2777 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Varun Saxena > Labels: log-aggregation > Attachments: YARN-2777.001.patch, YARN-2777.002.patch > > > Below is snippet of aggregated log showing hbase master log: > {code} > LogType: hbase-hbase-master-ip-172-31-34-167.log > LogUploadTime: 29-Oct-2014 22:31:55 > LogLength: 24103045 > Log Contents: > Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 > ... > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) > at org.apache.hadoop.hbase.Chore.run(Chore.java:80) > at java.lang.Thread.run(Thread.java:745) > LogType: hbase-hbase-master-ip-172-31-34-167.out > {code} > Since logs from various daemons are aggregated in one log file, it would be > desirable to mark the end of one log before starting with the next. > e.g. with such a line: > {code} > End of LogType: hbase-hbase-master-ip-172-31-34-167.log > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2777: --- Attachment: YARN-2777.002.patch > Mark the end of individual log in aggregated log > > > Key: YARN-2777 > URL: https://issues.apache.org/jira/browse/YARN-2777 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Varun Saxena > Labels: log-aggregation > Attachments: YARN-2777.001.patch, YARN-2777.002.patch > > > Below is snippet of aggregated log showing hbase master log: > {code} > LogType: hbase-hbase-master-ip-172-31-34-167.log > LogUploadTime: 29-Oct-2014 22:31:55 > LogLength: 24103045 > Log Contents: > Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 > ... > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) > at org.apache.hadoop.hbase.Chore.run(Chore.java:80) > at java.lang.Thread.run(Thread.java:745) > LogType: hbase-hbase-master-ip-172-31-34-167.out > {code} > Since logs from various daemons are aggregated in one log file, it would be > desirable to mark the end of one log before starting with the next. > e.g. with such a line: > {code} > End of LogType: hbase-hbase-master-ip-172-31-34-167.log > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI
[ https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338950#comment-14338950 ] Vinod Kumar Vavilapalli commented on YARN-3248: --- YARN-3025 is related to my first comment above. > Display count of nodes blacklisted by apps in the web UI > > > Key: YARN-3248 > URL: https://issues.apache.org/jira/browse/YARN-3248 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screenshot.jpg, apache-yarn-3248.0.patch > > > It would be really useful when debugging app performance and failure issues > to get a count of the nodes blacklisted by individual apps displayed in the > web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2.patch Attaching an analogue of the most recent patch against trunk. I do not believe that we will be committing this at this point as [~leftnoteasy] is working on a more significant change which will remove the need for it, but I wanted to make it available just in case. For clarity, patch against trunk is YARN-3251.2.patch and the patch to commit against 2.6 is YARN-3251.2-6-0.4.patch. > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338945#comment-14338945 ] Vinod Kumar Vavilapalli commented on YARN-3087: --- bq. The current solution is a workaround for JAXB resolver, which cannot return an interface (Map) type. This work around is consistent with the v1 version of our ATS object model (in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/, such as TimelineEvent.java, TimelineEntity.java), fixed in YARN-2804. If it's needed, maybe we'd like to keep the declarations to be Map, and do the cast in the jaxb getter? Casting it everytime will be expensive. Let's keep it as the patch currently does - we are not exposing the fact that it is a HashMap to external world, only to Jersey. > [Aggregator implementation] the REST server (web server) for per-node > aggregator does not work if it runs inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338914#comment-14338914 ] Hadoop QA commented on YARN-2693: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701122/0006-YARN-2693.patch against trunk revision dce8b9c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1152 javac compiler warnings (more than the trunk's current 1151 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 7 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6758//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6758//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6758//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6758//console This message is automatically generated. > Priority Label Manager in RM to manage application priority based on > configuration > -- > > Key: YARN-2693 > URL: https://issues.apache.org/jira/browse/YARN-2693 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, > 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, > 0006-YARN-2693.patch > > > Focus of this JIRA is to have a centralized service to handle priority labels. > Support operations such as > * Add/Delete priority label to a specified queue > * Manage integer mapping associated with each priority label > * Support managing default priority label of a given queue > * Expose interface to RM to validate priority label > TO have simplified interface, Priority Manager will support only > configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2-6-0.4.patch Minor, switch to "Internal", seems to be more common in the codebase > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338867#comment-14338867 ] Li Lu commented on YARN-3087: - Hi [~djp], thanks for the feedback! I totally understand your concern here. The current solution is a workaround for JAXB resolver, which cannot return an interface (Map) type. This work around is consistent with the v1 version of our ATS object model (in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/, such as TimelineEvent.java, TimelineEntity.java), fixed in YARN-2804. If it's needed, maybe we'd like to keep the declarations to be Map, and do the cast in the jaxb getter? > [Aggregator implementation] the REST server (web server) for per-node > aggregator does not work if it runs inside node manager > - > > Key: YARN-3087 > URL: https://issues.apache.org/jira/browse/YARN-3087 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Li Lu > Fix For: YARN-2928 > > Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch > > > This is related to YARN-3030. YARN-3030 sets up a per-node timeline > aggregator and the associated REST server. It runs fine as a standalone > process, but does not work if it runs inside the node manager due to possible > collisions of servlet mapping. > Exception: > {noformat} > org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for > v2 not found > at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) > at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
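A minimal sketch of the alternative being floated here, assuming a field shaped like {{relatedEntities}}: the field keeps the {{Map}} interface type and only the JAXB-facing getter hands out a concrete {{HashMap}}. The class and method names are illustrative only.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class EntitySketch {
  // public contract stays on the interface type
  private Map<String, Set<String>> relatedEntities =
      new HashMap<String, Set<String>>();

  // JAXB-facing getter returns the concrete type the resolver can marshal,
  // copying only when the backing map is not already a HashMap
  public HashMap<String, Set<String>> getRelatedEntitiesJAXB() {
    if (relatedEntities instanceof HashMap) {
      return (HashMap<String, Set<String>>) relatedEntities;
    }
    return new HashMap<String, Set<String>>(relatedEntities);
  }

  public Map<String, Set<String>> getRelatedEntities() {
    return relatedEntities;
  }
}
{code}
The cost of doing a potential copy on every call is exactly the concern raised in the earlier comment on this thread.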
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338860#comment-14338860 ] Wangda Tan commented on YARN-3251: -- if opposite opinions -> if no opposite opinions > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338858#comment-14338858 ] Wangda Tan commented on YARN-3251: -- LGTM +1, I will commit the patch to branch-2.6 this afternoon if opposite opinions. Thanks! > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338848#comment-14338848 ] Craig Welch commented on YARN-3251: --- Sorry if that wasn't clear; to reduce risk, I removed the minor changes in CSQueueUtils. > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.
Prakash Ramachandran created YARN-3268: -- Summary: timelineserver rest api returns html page for 404 when a bad endpoint is used. Key: YARN-3268 URL: https://issues.apache.org/jira/browse/YARN-3268 Project: Hadoop YARN Issue Type: Bug Reporter: Prakash Ramachandran the timelineserver returns a 404 page instead of giving a REST response. this interferes with the end user pages which try to retrieve data using REST api. this could be due to lack of a 404 handler ex. http://timelineserver:8188/badnamespace/v1/timeline/someentity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3251: -- Attachment: YARN-3251.2-6-0.3.patch Removing the csqueueutils > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, > YARN-3251.2-6-0.3.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck
[ https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338830#comment-14338830 ] Karthik Kambatla commented on YARN-3231: Thanks for reporting and working on this, [~l201514]. The approach looks generally good. Few comments (some nits): # Rename {{updateRunnabilityonRefreshQueues}} to {{updateRunnabilityOnReload}}? And, add a javadoc for when it should be called and what it does. # javadoc for the newly added private method and the significance of the new integer param. # Call the above method from AllocationReloadListner#onReload after all the other queue configs are updated. # The comment here no longer applies. Remove it? {code} // No more than one app per list will be able to be made runnable, so // we can stop looking after we've found that many if (noLongerPendingApps.size() >= maxRunnableApps) { break; } {code} # Indentation: {code} updateAppsRunnability(appsNowMaybeRunnable, appsNowMaybeRunnable.size()); {code} # Newly added tests: ## If it is not too much trouble, can we move them to a new test class (TestAppRunnability?) mostly because TestFairScheduler has so many tests already. ## Is it possible to reuse the code between these tests? ## Should we add tests for when the maxRunnableApps for a user or queue is decreased? If you think this might need additional work in the logic as well, I am open to filing a follow up JIRA and addressing it there. > FairScheduler changing queueMaxRunningApps on the fly will cause all pending > job stuck > -- > > Key: YARN-3231 > URL: https://issues.apache.org/jira/browse/YARN-3231 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch > > > When a queue is piling up with a lot of pending jobs due to the > maxRunningApps limit. We want to increase this property on the fly to make > some of the pending job active. However, once we increase the limit, all > pending jobs were not assigned any resource, and were stuck forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
Prakash Ramachandran created YARN-3267: -- Summary: Timelineserver applies the ACL rules after applying the limit on the number of records Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3260) NPE if AM attempts to register before RM processes launch event
[ https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338813#comment-14338813 ] Naganarasimha G R commented on YARN-3260: - Hi [~jlowe], I had a look at the code, and some approaches I can think of are: * ApplicationMasterService.registerAppAttempt(ApplicationAttemptId) to be called in RMAppAttemptImpl.AMLaunchedTransition instead of RMAppAttemptImpl.AttemptStartedTransition, ensuring that ClientToAMToken generation and registering with ApplicationMasterService happen in the same block. By doing this we can throw InvalidApplicationMasterRequestException if the AM tries to register with the AMS before RMAppAttemptImpl processes the RMAppAttempt LAUNCHED event. * Was thinking of having a MultiThreadedDispatcher for processing App and AppAttempt events, similar to the one in SystemMetricsPublisher.MultiThreadedDispatcher, with the additional modification that instead of having {{ "(event.hashCode() & Integer.MAX_VALUE) % dispatchers.size();"}} we can think of doing it based on the applicationId (a sketch of this selection follows below). This can speed up the processing of App events ... I was not able to see any other cleaner direct fix for this issue, so I was wondering whether we need to start looking at the reason why the cluster was running behind on processing AsyncDispatcher events. Were these events getting delayed for any particular reason? > NPE if AM attempts to register before RM processes launch event > --- > > Key: YARN-3260 > URL: https://issues.apache.org/jira/browse/YARN-3260 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Naganarasimha G R > > The RM on one of our clusters was running behind on processing > AsyncDispatcher events, and this caused AMs to fail to register due to an > NPE. The AM was launched and attempting to register before the > RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token > had not been generated yet. The NPE occurred because the > ApplicationMasterService tried to encode the missing token. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
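A minimal sketch of the dispatcher selection described in the second bullet above, assuming the dispatchers are held in a list; the class and method names are illustrative, not from any patch.
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationId;

final class DispatcherRouter {
  // Route all events of one application to the same dispatcher so per-app
  // ordering is preserved while different apps spread across dispatchers.
  static int pickDispatcher(ApplicationId appId, List<?> dispatchers) {
    return (appId.hashCode() & Integer.MAX_VALUE) % dispatchers.size();
  }
}
{code}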
[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)
[ https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338782#comment-14338782 ] Hadoop QA commented on YARN-3251: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700977/YARN-3251.2-6-0.2.patch against trunk revision dce8b9c. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6759//console This message is automatically generated. > CapacityScheduler deadlock when computing absolute max avail capacity (short > term fix for 2.6.1) > > > Key: YARN-3251 > URL: https://issues.apache.org/jira/browse/YARN-3251 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Craig Welch >Priority: Blocker > Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch > > > The ResourceManager can deadlock in the CapacityScheduler when computing the > absolute max available capacity for user limits and headroom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options
[ https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338760#comment-14338760 ] Hadoop QA commented on YARN-3255: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700931/YARN-3255-02.patch against trunk revision 773b651. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6757//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6757//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6757//console This message is automatically generated. > RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support > generic options > --- > > Key: YARN-3255 > URL: https://issues.apache.org/jira/browse/YARN-3255 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.5.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Attachments: YARN-3255-01.patch, YARN-3255-02.patch > > > Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore > generic options, like {{-conf}} and {{-fs}}. It would be good to have the > ability to pass generic options in order to specify configuration files or > the NameNode location, when the services start through {{main()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log
[ https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338746#comment-14338746 ] Ted Yu commented on YARN-2777: -- @Varun: {code} 713 out.println("End of LogType:"); 714 out.println(fileType); {code} Can you put the above two onto the same line ? Thanks > Mark the end of individual log in aggregated log > > > Key: YARN-2777 > URL: https://issues.apache.org/jira/browse/YARN-2777 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Varun Saxena > Labels: log-aggregation > Attachments: YARN-2777.001.patch > > > Below is snippet of aggregated log showing hbase master log: > {code} > LogType: hbase-hbase-master-ip-172-31-34-167.log > LogUploadTime: 29-Oct-2014 22:31:55 > LogLength: 24103045 > Log Contents: > Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167 > ... > at > org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124) > at org.apache.hadoop.hbase.Chore.run(Chore.java:80) > at java.lang.Thread.run(Thread.java:745) > LogType: hbase-hbase-master-ip-172-31-34-167.out > {code} > Since logs from various daemons are aggregated in one log file, it would be > desirable to mark the end of one log before starting with the next. > e.g. with such a line: > {code} > End of LogType: hbase-hbase-master-ip-172-31-34-167.log > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
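The consolidation being requested would look roughly like the single call below, assuming the same {{out}} and {{fileType}} as in the quoted snippet.
{code}
// one call instead of two, so the marker and the file name share a line
out.println("End of LogType:" + fileType);
{code}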
[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes
[ https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338736#comment-14338736 ] Ted Yu commented on YARN-3025: -- Ping [~zjshen] > Provide API for retrieving blacklisted nodes > > > Key: YARN-3025 > URL: https://issues.apache.org/jira/browse/YARN-3025 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt > > > We have the following method which updates blacklist: > {code} > public synchronized void updateBlacklist(List<String> blacklistAdditions, > List<String> blacklistRemovals) { > {code} > Upon AM failover, there should be an API which returns the blacklisted nodes > so that the new AM can make consistent decisions. > The new API can be: > {code} > public synchronized List<String> getBlacklistedNodes() > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration
[ https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-2693: -- Attachment: 0006-YARN-2693.patch Attaching a minimal version of the Application Priority Manager where only configuration support is present. YARN-3250 will, in the longer run, handle admin CLI and REST support. > Priority Label Manager in RM to manage application priority based on > configuration > -- > > Key: YARN-2693 > URL: https://issues.apache.org/jira/browse/YARN-2693 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, > 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, > 0006-YARN-2693.patch > > > Focus of this JIRA is to have a centralized service to handle priority labels. > Support operations such as > * Add/Delete priority label to a specified queue > * Manage integer mapping associated with each priority label > * Support managing default priority label of a given queue > * Expose interface to RM to validate priority label > TO have simplified interface, Priority Manager will support only > configuration file in contrast with admin cli and REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338687#comment-14338687 ] Naganarasimha G R commented on YARN-3039: - Hi [~djp] bq. Another idea (from Vinod in offline discussion) is to add a blocking call in AMRMClient to get aggregator address directly from RM +1 for this approach. Also, if the NM uses this new blocking call in AMRMClient to get the aggregator address, then there might not be any race conditions when the NM posts the AM container's life-cycle events immediately after creation of the appAggregator through the aux service. bq. In addition, if adding a new API in AMRMClient can be accepted, NM will use TimelineClient too so can handle service discovery automatically. Are we just adding a method to get the aggregator address, or what other APIs are planned? bq. NM will notify RM that this new appAggregator is ready for use in next heartbeat to RM (missing in this patch). bq. RM verify the out of service for this app aggregator first and kick off rebind appAggregator to another NM's perNodeAggregatorService in next heartbeat comes. I believe the idea of using the aux service was to decouple the NM and the Timeline service. If the NM notifies the RM about new appAggregator creation (based on the aux service), then basically the NM has to be aware that PerNodeAggregatorServer is configured as an aux service, and if it supports rebinding the appAggregator on failure it has to be able to communicate with this aux service too; would this be a clean approach? I also feel we need to support starting a per-app aggregator only if the app requests it (Zhijie also mentioned this). If not, we can make use of one default aggregator for all such apps launched on the NM, which is just used to post container entities from different NMs for these apps. Have there been any discussions about the RM having its own aggregator? I feel it would be better for the RM to have one, as it then need not depend on any NM to post entities. > [Aggregator wireup] Implement ATS app-appgregator service discovery > --- > > Key: YARN-3039 > URL: https://issues.apache.org/jira/browse/YARN-3039 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Attachments: Service Binding for applicationaggregator of ATS > (draft).pdf, YARN-3039-no-test.patch > > > Per design in YARN-2928, implement ATS writer service discovery. This is > essential for off-node clients to send writes to the right ATS writer. This > should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
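Purely as an illustration of the blocking call discussed in the first quote above, such an addition might look roughly like the sketch below; the interface, method name, and exception list are hypothetical, and no such API exists in {{AMRMClient}} today.
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Hypothetical sketch only: a blocking lookup of the per-app aggregator
// address via the RM; not part of the current AMRMClient API.
interface AggregatorAddressLookup {
  /** Blocks until the RM can report the aggregator address for the app. */
  String getAggregatorAddress(ApplicationId appId)
      throws YarnException, IOException;
}
{code}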