[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339830#comment-14339830
 ] 

Hadoop QA commented on YARN-3122:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701280/YARN-3122.005.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6776//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6776//console

This message is automatically generated.

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.prelim.patch, YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)

2015-02-26 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339811#comment-14339811
 ] 

Chengbing Liu commented on YARN-3204:
-

{code}
-this.reservedAppSchedulable = (FSAppAttempt) application;
+ if(application instanceof FSAppAttempt){
+   this.reservedAppSchedulable = (FSAppAttempt) application;
+}
{code}
Would it be better if we throw an exception if the condition is not met?
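
For illustration, failing fast instead of silently skipping could look roughly like this (a sketch only, not part of the attached patch):
{code}
if (application instanceof FSAppAttempt) {
  this.reservedAppSchedulable = (FSAppAttempt) application;
} else {
  // Surface the unexpected type instead of leaving the field unset silently.
  throw new IllegalArgumentException(
      "Expected an FSAppAttempt but got " + application.getClass().getName());
}
{code}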

{code}
 Set planQueues = new HashSet();
 for (FSQueue fsQueue : queueMgr.getQueues()) {
   String queueName = fsQueue.getName();
-  if (allocConf.isReservable(queueName)) {
+  boolean isReservable = false;
+  synchronized(this){
+ isReservable = allocConf.isReservable(queueName);
+  }
+  if (isReservable) {
 planQueues.add(queueName);
   }
 }
{code}
I think we should synchronize the whole function, since {{allocConf}} may be 
reloaded during this loop. A dedicated lock is better than 
{{FairScheduler.this}} to me.
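
As a sketch of the dedicated-lock idea, applied to the loop quoted above (the lock field name here is an assumption, not from the patch):
{code}
// Hypothetical dedicated lock guarding allocConf reloads and readers:
private final Object allocConfLock = new Object();

public Set<String> getPlanQueues() {
  Set<String> planQueues = new HashSet<String>();
  synchronized (allocConfLock) {
    // allocConf cannot be swapped by a reload while we iterate.
    for (FSQueue fsQueue : queueMgr.getQueues()) {
      String queueName = fsQueue.getName();
      if (allocConf.isReservable(queueName)) {
        planQueues.add(queueName);
      }
    }
  }
  return planQueues;
}
{code}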

> Fix new findbug warnings in 
> hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
> --
>
> Key: YARN-3204
> URL: https://issues.apache.org/jira/browse/YARN-3204
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3204-001.patch, YARN-3204-002.patch
>
>
> Please check following findbug report..
> https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-26 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339799#comment-14339799
 ] 

Rohith commented on YARN-3222:
--

bq. NODE_USABLE event is sent regardless the reconnected node is healthy or not 
healthy, which is incorrect, right ?
Yes, I think the assumption was that if a node is reconnecting then the NM is 
healthy. It would be better to retain the old state, i.e. UNHEALTHY, and let the 
next heartbeat move the NodeStatus from Unhealthy to Running.

I see another potential issue: if the old node is retained, then the RMNode's 
{{totalCapability}} has to be updated with the new node's resource. But in the 
current flow, {{totalCapability}} is not updated. As a result, the scheduler has 
the updated resource value while the RMNode keeps the stale one, so any client 
reading the capability from the RMNode would get the wrong node resource value.
{code}
if (noRunningApps) {
  // some code
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeRemovedSchedulerEvent(rmNode));

  if (rmNode.getHttpPort() == newNode.getHttpPort()) {
    if (rmNode.getState() != NodeState.UNHEALTHY) {
      // Only add new node if old state is not UNHEALTHY
      rmNode.context.getDispatcher().getEventHandler().handle(
          new NodeAddedSchedulerEvent(newNode));
      // NEW NODE CAPABILITY SHOULD BE UPDATED TO OLD NODE
    }
  } else {
    // Reconnected node differs, so replace old node and start new node
    rmNode.context.getDispatcher().getEventHandler().handle(
        new RMNodeStartedEvent(newNode.getNodeID(), null, null));
    // No need to update totalCapability since old node is replaced with new node.
  }
}
{code}
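
For illustration, the kind of fix being described would refresh the retained node's capability before sending the scheduler event, roughly as follows (a sketch, not the attached patch):
{code}
// Sketch only: keep the old RMNode but update its capability from the
// reconnecting node before notifying the scheduler.
rmNode.totalCapability = newNode.getTotalCapability();
rmNode.context.getDispatcher().getEventHandler().handle(
    new NodeAddedSchedulerEvent(newNode));
{code}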

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler via the events node_added, node_removed or node_resource_update. 
> These events should be sent in sequential order, i.e. the node_added event 
> followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> which causes the scheduler to not find the node, throw an NPE, and the RM to 
> exit.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-02-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339796#comment-14339796
 ] 

Varun Saxena commented on YARN-2962:


[~kasha] / [~ka...@cloudera.com], for this, can I assume that the state store 
will be formatted before making the config change?
Backward compatibility for running apps after the config change (on RM restart) 
will be difficult, as we may have to try all the possible appid formats.


> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3122:
---
Attachment: YARN-3122.005.patch

The updated patch looks mostly good to me. I like that we are mimicking top; 
users will find it easier to reason about this.

I had a few nitpicks that I have put into the v5 patch - renaming 
CpuTimeTracker#getCpuUsagePercent and changes to the comments. [~adhoot] - can 
you please review and verify the changes?

One last concern - we use 0 when we cannot calculate the percentage. 
Shouldn't we use UNAVAILABLE instead?
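
For illustration only, the kind of change being suggested (the field names below are assumptions, not the actual patch):
{code}
// Sketch: return a sentinel instead of 0 when usage cannot be computed yet,
// so callers can tell "unknown" apart from "idle".
public float getCpuUsagePercent() {
  if (!sampled) {            // e.g. before the second CPU-time sample
    return UNAVAILABLE;      // hypothetical sentinel, e.g. -1
  }
  return 100F * cpuTimeDeltaMs / (elapsedTimeMs * numProcessors);
}
{code}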

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
> YARN-3122.prelim.patch, YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown

2015-02-26 Thread Gururaj Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339780#comment-14339780
 ] 

Gururaj Shetty commented on YARN-3168:
--

Hi [~aw]

All your comments are incorporated. Kindly review the latest patch attached.

> Convert site documentation from apt to markdown
> ---
>
> Key: YARN-3168
> URL: https://issues.apache.org/jira/browse/YARN-3168
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Allen Wittenauer
>Assignee: Gururaj Shetty
> Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, 
> YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch
>
>
> YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339774#comment-14339774
 ] 

Hadoop QA commented on YARN-2820:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701267/YARN-2820.006.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6775//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6775//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6775//console

This message is automatically generated.

> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
> YARN-2820.005.patch, YARN-2820.006.patch
>
>
> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
>  
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>  
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
> at 
> org.apache.hadoop.yarn.s

[jira] [Updated] (YARN-3168) Convert site documentation from apt to markdown

2015-02-26 Thread Gururaj Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gururaj Shetty updated YARN-3168:
-
Attachment: YARN-3168.20150227.3.patch

> Convert site documentation from apt to markdown
> ---
>
> Key: YARN-3168
> URL: https://issues.apache.org/jira/browse/YARN-3168
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Allen Wittenauer
>Assignee: Gururaj Shetty
> Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, 
> YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch
>
>
> YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3262) Surface application outstanding resource requests table

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339753#comment-14339753
 ] 

Hadoop QA commented on YARN-3262:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701265/YARN-3262.4.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6773//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6773//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6773//console

This message is automatically generated.

> Surface application outstanding resource requests table
> ---
>
> Key: YARN-3262
> URL: https://issues.apache.org/jira/browse/YARN-3262
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, 
> YARN-3262.4.patch, resource requests.png
>
>
> It would be useful to surface the outstanding resource requests table on the 
> application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-02-26 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339744#comment-14339744
 ] 

Abin Shahab commented on YARN-2981:
---

[~raviprak] [~vinodkv] [~vvasudev] [~ywskycn] please review

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339730#comment-14339730
 ] 

Hadoop QA commented on YARN-3269:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701264/YARN-3269.2.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.TestPBLocalizerRPC
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6774//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6774//console

This message is automatically generated.

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339716#comment-14339716
 ] 

zhihai xu commented on YARN-2820:
-

[~ozawa], thanks for your thorough review, I really appreciate it.
I uploaded a new patch, YARN-2820.005.patch, which addresses all your comments. 
It also puts fsIn.close in a try-with-resources block in 
loadRMDTSecretManagerState, similar to fsOut.close in storeRMDTMasterKeyState.
Please review it. Thanks, zhihai
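
For reference, a minimal sketch of the try-with-resources change described above (simplified, not the actual patch code; masterKeyPath is a placeholder):
{code}
// The stream is closed automatically even if readFields() throws,
// replacing a manual fsIn.close() in a finally block.
try (FSDataInputStream fsIn = fs.open(masterKeyPath)) {
  DelegationKey key = new DelegationKey();
  key.readFields(fsIn);
  // ... add the key to the recovered secret manager state ...
}
{code}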

> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
> YARN-2820.005.patch, YARN-2820.006.patch
>
>
> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
>  
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>  
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:744) 
> {code}
> As discussed at YARN-1778, TestFSRMStateStore failure is also due to  
> IOException in storeApplicationStateInternal.
> Stack trace from TestFSRMStateStore failure:
> {code}
>  2015-02-03 00:09:19,092 INFO  [Thre

[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339715#comment-14339715
 ] 

Zhijie Shen commented on YARN-3125:
---

Li, thanks for working on the patch. Looking at the test code of 
TestDistributedShell, MiniYarnCluster is started for each individual test 
case. Therefore, we can potentially avoid the conflict by configuration and 
don't need to hard-code the service address. For the test cases about the v1 
timeline service, set enableAHS = true, while for the test cases about v2, add 
the aux service configuration. In this way, either the v1 or the v2 timeline 
service will be set up, but not both. To distinguish the different test cases, 
you can try the following:
{code}
@Rule public TestName name = new TestName();
{code}
Use the test name to switch the setup of MiniYarnCluster in setup().
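
A rough sketch of that approach (the naming convention and aux service name below are placeholders, not the real patch):
{code}
@Rule
public TestName name = new TestName();

@Before
public void setup() throws Exception {
  YarnConfiguration conf = new YarnConfiguration();
  if (name.getMethodName().contains("V2")) {
    // v2 test cases: register the per-node aggregator as an aux service
    conf.set(YarnConfiguration.NM_AUX_SERVICES, TIMELINE_AUX_SERVICE_NAME);
  } else {
    // v1 test cases: start the application history server instead
    enableAHS = true;
  }
  // ... start MiniYARNCluster with conf ...
}
{code}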

Another minor issue: instead of using
{code}
  private static final String TIMELINE_AUX_SERVICE_CLASS =
  "org.apache.hadoop.yarn.server.timelineservice.aggregator"
  + ".PerNodeAggregatorServer";
{code}
you can use {{PerNodeAggregatorServer.class.getName()}} directly.

> [Event producers] Change distributed shell to use new timeline service
> --
>
> Key: YARN-3125
> URL: https://issues.apache.org/jira/browse/YARN-3125
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Junping Du
> Attachments: YARN-3125.patch, YARN-3125_UT-022615.patch, 
> YARN-3125v2.patch, YARN-3125v3.patch
>
>
> We can start with changing distributed shell to use new timeline service once 
> the framework is completed, in which way we can quickly verify the next gen 
> is working fine end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2820:

Attachment: YARN-2820.006.patch

> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
> YARN-2820.005.patch, YARN-2820.006.patch
>
>
> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
>  
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>  
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:744) 
> {code}
> As discussed at YARN-1778, TestFSRMStateStore failure is also due to  
> IOException in storeApplicationStateInternal.
> Stack trace from TestFSRMStateStore failure:
> {code}
>  2015-02-03 00:09:19,092 INFO  [Thread-110] recovery.TestFSRMStateStore 
> (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception
>  org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still 
> not started
>at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876)
>at 

[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339703#comment-14339703
 ] 

Tsuyoshi Ozawa commented on YARN-2820:
--

Good catch! Yes, we should retry there also.

> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
> YARN-2820.005.patch
>
>
> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
>  
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>  
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:744) 
> {code}
> As discussed at YARN-1778, TestFSRMStateStore failure is also due to  
> IOException in storeApplicationStateInternal.
> Stack trace from TestFSRMStateStore failure:
> {code}
>  2015-02-03 00:09:19,092 INFO  [Thread-110] recovery.TestFSRMStateStore 
> (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception
>  org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still 
> not started
>at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.c

[jira] [Updated] (YARN-3262) Surface application outstanding resource requests table

2015-02-26 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3262:
--
Attachment: YARN-3262.4.patch

fixed the test failures

> Surface application outstanding resource requests table
> ---
>
> Key: YARN-3262
> URL: https://issues.apache.org/jira/browse/YARN-3262
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, 
> YARN-3262.4.patch, resource requests.png
>
>
> It would be useful to surface the outstanding resource requests table on the 
> application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3269:

Attachment: YARN-3269.2.patch

Modified one of the LogAggregationService test cases to use the fully qualified path.

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339657#comment-14339657
 ] 

Chengbing Liu commented on YARN-3266:
-

The findbugs warnings are unrelated, caused by YARN-3181 and handled by 
YARN-3204.

> RMContext inactiveNodes should have NodeId as map key
> -
>
> Key: YARN-3266
> URL: https://issues.apache.org/jira/browse/YARN-3266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: YARN-3266.01.patch, YARN-3266.02.patch
>
>
> Under the default NM port configuration, which is 0, we have observed in the 
> current version that the "lost nodes" count is greater than the length of the 
> lost-node list. This will happen when we consecutively restart the same NM twice:
> * NM started at port 10001
> * NM restarted at port 10002
> * NM restarted at port 10003
> * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} has 1 element
> * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} still has 1 element
> Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
> {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
> this will break the current API, then the key string should include the NM's 
> port as well.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3262) Surface application outstanding resource requests table

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339656#comment-14339656
 ] 

Hadoop QA commented on YARN-3262:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701240/YARN-3262.3.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6772//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6772//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6772//console

This message is automatically generated.

> Surface application outstanding resource requests table
> ---
>
> Key: YARN-3262
> URL: https://issues.apache.org/jira/browse/YARN-3262
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, 
> resource requests.png
>
>
> It would be useful to surface the outstanding resource requests table on the 
> application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339625#comment-14339625
 ] 

Hadoop QA commented on YARN-1809:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701229/YARN-1809.13.patch
  against trunk revision bfbf076.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6770//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6770//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6770//console

This message is automatically generated.

> Synchronize RM and Generic History Service Web-UIs
> --
>
> Key: YARN-1809
> URL: https://issues.apache.org/jira/browse/YARN-1809
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
> YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, 
> YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, 
> YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, 
> YARN-1809.9.patch
>
>
> After YARN-953, the web UI of the generic history service provides more 
> information than that of the RM, namely the details about app attempts and 
> containers. It's good to provide similar web UIs, but retrieve the data from 
> separate sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-02-26 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-3273:


Assignee: Rohith

> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
>
> A job may be stuck for reasons such as:
> - hitting queue capacity
> - hitting the user limit
> - hitting the AM-resource-percentage
> The first one, queue capacity, is already shown on the UI.
> We may surface things like:
> - what the user's current usage and user-limit are;
> - what the AM resource usage and limit are;
> - what the application's current headroom is;
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339613#comment-14339613
 ] 

Hadoop QA commented on YARN-2981:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701239/YARN-2981.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6771//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6771//console

This message is automatically generated.

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339592#comment-14339592
 ] 

zhihai xu commented on YARN-2820:
-

That is a good find. I double-checked all the FS operations in 
FileSystemRMStateStore: besides the one you found, there is one more missing 
case, in closeInternal:
{code}
fs.close();
{code}
I will upload a new patch shortly to include retries for all these missing 
cases.
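
For illustration, the general shape of such a retry wrapper might look like this (a sketch; the retry-count and interval fields are assumptions, not from the patch):
{code}
// Sketch only: retry fs.close() a bounded number of times before giving up.
private void closeWithRetries() throws Exception {
  for (int retry = 0; ; retry++) {
    try {
      fs.close();
      return;
    } catch (IOException e) {
      if (retry >= fsNumRetries) {
        throw e;
      }
      Thread.sleep(fsRetryIntervalMs);
    }
  }
}
{code}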

> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> --
>
> Key: YARN-2820
> URL: https://issues.apache.org/jira/browse/YARN-2820
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0, 2.6.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
> YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
> YARN-2820.005.patch
>
>
> Do retry in FileSystemRMStateStore for better error recovery when 
> update/store failure due to IOException.
> When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
> saw the following IOException cause the RM to shut down.
> {code}
> 2014-10-29 23:49:12,202 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Updating info for attempt: appattempt_1409135750325_109118_01 at: 
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01
> 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
> complete
> /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
> appattempt_1409135750325_109118_01.new.tmp retrying...
> 2014-10-29 23:49:46,283 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
> Error updating info for attempt: appattempt_1409135750325_109118_01
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas.
> 2014-10-29 23:49:46,284 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
> Error storing/updating appAttempt: appattempt_1409135750325_109118_01
> 2014-10-29 23:49:46,916 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
> Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause: 
> java.io.IOException: Unable to close file because the last block does not 
> have enough number of replicas. 
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
>  
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
>  
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:744) 
> {code}
> As discussed at YARN-1778, TestFSRMStateStore failure is also due to  
> IOException in storeApplicationStateInternal.
> Stack trace from TestFSRMStateStore failure:
> {code}
>  2015-02-03 00:09:19,092 INFO  [Thread-110] recovery.TestFSRMStateStore 
> (TestFSRMStateStore.ja

[jira] [Created] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging

2015-02-26 Thread Jian He (JIRA)
Jian He created YARN-3273:
-

 Summary: Improve web UI to facilitate scheduling analysis and 
debugging
 Key: YARN-3273
 URL: https://issues.apache.org/jira/browse/YARN-3273
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He


A job may be stuck for reasons such as:
- hitting queue capacity
- hitting the user limit
- hitting the AM-resource-percentage

The first one, queue capacity, is already shown on the UI.
We may surface things like:
- what the user's current usage and user-limit are;
- what the AM resource usage and limit are;
- what the application's current headroom is;
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3272) Surface container locality info

2015-02-26 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3272:
--
Issue Type: Improvement  (was: Bug)

> Surface container locality info 
> 
>
> Key: YARN-3272
> URL: https://issues.apache.org/jira/browse/YARN-3272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
>
> We can surface the container locality info on the web UI. This is useful to 
> debug "why are my applications progressing slowly", especially when locality 
> is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-26 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3125:

Attachment: YARN-3125_UT-022615.patch

Based on [~djp]'s v3 patch, I wrote a simple unit test for distributed shell 
that helps us verify the timeline v2 integration. I added this test to 
TestDistributedShell as a test for timeline v2. On my machine this single new 
test passes, and I can see the success messages in the test logs, so in 
general our prototype works. 

However, we do have some (potentially quick to fix, but possibly important) 
problems with running the v1 and v2 timeline servers together. On the server 
side, in this UT, both the v1 and v2 servers are launched, with the v2 server 
bound to a predefined port. On the client side, I have disabled the v1 URL in 
the timeline client for now. Perhaps we'd like a switch in our client to 
select the timeline version, as sketched below? I believe we now need to take 
care of the compatibility issues... 
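
For the record, a purely hypothetical sketch of the kind of version switch I 
mean -- the property name below does not exist yet and is only for 
illustration:
{code}
// Hypothetical sketch only: choose the timeline publishing path from a config
// flag. The key "yarn.timeline-service.version" is illustrative, not real.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineVersionSwitchSketch {
  public static TimelineClient createClient() {
    Configuration conf = new YarnConfiguration();
    float version = conf.getFloat("yarn.timeline-service.version", 1.0f);
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    if (version >= 2.0f) {
      // publish entities through the new per-node aggregator (v2) path
    } else {
      // publish entities through the existing v1 timeline server URL
    }
    return client;
  }
}
{code}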

> [Event producers] Change distributed shell to use new timeline service
> --
>
> Key: YARN-3125
> URL: https://issues.apache.org/jira/browse/YARN-3125
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Junping Du
> Attachments: YARN-3125.patch, YARN-3125_UT-022615.patch, 
> YARN-3125v2.patch, YARN-3125v3.patch
>
>
> We can start with changing distributed shell to use new timeline service once 
> the framework is completed, in which way we can quickly verify the next gen 
> is working fine end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3272) Surface container locality info

2015-02-26 Thread Jian He (JIRA)
Jian He created YARN-3272:
-

 Summary: Surface container locality info 
 Key: YARN-3272
 URL: https://issues.apache.org/jira/browse/YARN-3272
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He


We can surface the container locality info on the web UI. This is useful to 
debug "why are my applications progressing slowly", especially when locality 
is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3262) Surface application outstanding resource requests table

2015-02-26 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3262:
--
Summary: Surface application outstanding resource requests table  (was: 
Suface application outstanding resource requests table)

> Surface application outstanding resource requests table
> ---
>
> Key: YARN-3262
> URL: https://issues.apache.org/jira/browse/YARN-3262
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, 
> resource requests.png
>
>
> It would be useful to surface the outstanding resource requests table on the 
> application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3262) Suface application outstanding resource requests table

2015-02-26 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3262:
--
Summary: Suface application outstanding resource requests table  (was: 
Suface application resource requests table)

> Suface application outstanding resource requests table
> --
>
> Key: YARN-3262
> URL: https://issues.apache.org/jira/browse/YARN-3262
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, 
> resource requests.png
>
>
> It would be useful to surface the outstanding resource requests table on the 
> application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3262) Suface application resource requests table

2015-02-26 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3262:
--
Attachment: YARN-3262.3.patch

Thanks for reviewing the patch, Wangda!
Addressed all the comments.

> Suface application resource requests table
> --
>
> Key: YARN-3262
> URL: https://issues.apache.org/jira/browse/YARN-3262
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, 
> resource requests.png
>
>
> It would be useful to surface the outstanding resource requests table on the 
> application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-02-26 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-2981:
--
Attachment: YARN-2981.patch

This introduces a cluster-wide default Docker image, and limits the memory, 
CPU, and user for the container.

> DockerContainerExecutor must support a Cluster-wide default Docker image
> 
>
> Key: YARN-2981
> URL: https://issues.apache.org/jira/browse/YARN-2981
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abin Shahab
>Assignee: Abin Shahab
> Attachments: YARN-2981.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339547#comment-14339547
 ] 

Hudson commented on YARN-3255:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7215 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7215/])
YARN-3255. RM, NM, JobHistoryServer, and WebAppProxyServer's main() should 
support generic options. Contributed by Konstantin Shvachko. (shv: rev 
8ca0d957c4b1076e801e1cdce5b44aa805de889c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServer.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


> RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
> generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.7.0
>
> Attachments: YARN-3255-01.patch, YARN-3255-02.patch, 
> YARN-3255-branch-2.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved YARN-3255.
---
   Resolution: Fixed
Fix Version/s: 2.7.0

I just committed this.
Thank you all for the prompt reviews.

> RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
> generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 2.7.0
>
> Attachments: YARN-3255-01.patch, YARN-3255-02.patch, 
> YARN-3255-branch-2.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-3255:
--
Attachment: YARN-3255-branch-2.patch

Patch for branch-2. Minor difference from trunk in the import section.

> RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
> generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: YARN-3255-01.patch, YARN-3255-02.patch, 
> YARN-3255-branch-2.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3251) Fix CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-3251.
--
   Resolution: Fixed
Fix Version/s: 2.6.1
 Hadoop Flags: Reviewed

Just compiled and ran all tests in CapacityScheduler, and committed this to 
branch-2.6.

Thanks [~cwelch], and thanks also for the reviews from [~jlowe], [~sunilg] and 
[~vinodkv].

> Fix CapacityScheduler deadlock when computing absolute max avail capacity 
> (short term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Fix For: 2.6.1
>
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) Fix CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3251:
-
Summary: Fix CapacityScheduler deadlock when computing absolute max avail 
capacity (short term fix for 2.6.1)  (was: CapacityScheduler deadlock when 
computing absolute max avail capacity (short term fix for 2.6.1))

> Fix CapacityScheduler deadlock when computing absolute max avail capacity 
> (short term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2015-02-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1809:

Attachment: YARN-1809.13.patch

Uploaded a new patch to address Zhijie's comments. Tested the patch in both 
secure and non-secure clusters.

> Synchronize RM and Generic History Service Web-UIs
> --
>
> Key: YARN-1809
> URL: https://issues.apache.org/jira/browse/YARN-1809
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1809.1.patch, YARN-1809.10.patch, 
> YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, 
> YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, 
> YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, 
> YARN-1809.9.patch
>
>
> After YARN-953, the web UI of the generic history service provides more 
> information than that of the RM, namely the details about app attempts and 
> containers. It's good to provide similar web UIs, but retrieve the data from 
> separate sources, i.e., the RM cache and the history store respectively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339483#comment-14339483
 ] 

Hadoop QA commented on YARN-3080:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701213/YARN-3080.patch
  against trunk revision bfbf076.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6769//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6769//console

This message is automatically generated.

> The DockerContainerExecutor could not write the right pid to container pidFile
> --
>
> Key: YARN-3080
> URL: https://issues.apache.org/jira/browse/YARN-3080
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Beckham007
>Assignee: Abin Shahab
> Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
> YARN-3080.patch
>
>
> The docker_container_executor_session.sh is like this:
> {quote}
> #!/usr/bin/env bash
> echo `/usr/bin/docker inspect --format {{.State.Pid}} 
> container_1421723685222_0008_01_02` > 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
> /bin/mv -f 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
>  
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
> /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
> GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
> GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
> GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
> GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
> --cpu-shares=1024 -v 
> /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
>  -v 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
>  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
> "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh"
> {quote}
> The DockerContainerExecutor uses docker inspect before docker run, so docker 
> inspect couldn't get the right pid for the Docker container; as a result, 
> signalContainer() and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339479#comment-14339479
 ] 

Wangda Tan commented on YARN-3251:
--

Checking this into branch-2.6

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3262) Suface application resource requests table

2015-02-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339471#comment-14339471
 ] 

Wangda Tan commented on YARN-3262:
--

Hi [~jianhe],
Thanks for working on this, it will be very helpful!

Took a look at your patch; overall it looks good to me. Two minor comments:

1) getAllResourceRequests could be a method in AbstractYarnScheduler. I feel 
some other places will use it, and then we don't have to duplicate the 
implementation everywhere.

2) You could add a "total-outstanding-resource" on the app page as well; it 
should be the sum of the capacities of all ANY resource requests, as sketched 
below.
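
A rough sketch of what I mean in 2), just for illustration (the helper name 
and where it lives are not meant to be the actual patch):
{code}
import java.util.Collection;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative only: total-outstanding-resource = sum over all outstanding
// ANY requests of (capability * numContainers).
public class OutstandingResourceSketch {
  public static Resource sumOutstanding(Collection<ResourceRequest> requests) {
    Resource total = Resources.createResource(0, 0);
    for (ResourceRequest req : requests) {
      if (ResourceRequest.ANY.equals(req.getResourceName())) {
        Resources.addTo(total,
            Resources.multiply(req.getCapability(), req.getNumContainers()));
      }
    }
    return total;
  }
}
{code}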

> Suface application resource requests table
> --
>
> Key: YARN-3262
> URL: https://issues.apache.org/jira/browse/YARN-3262
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-3262.1.patch, YARN-3262.2.patch, resource 
> requests.png
>
>
> It would be useful to surface the outstanding resource requests table on the 
> application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-02-26 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-3080:
--
Attachment: YARN-3080.patch

Updated Callable to Runnable.

> The DockerContainerExecutor could not write the right pid to container pidFile
> --
>
> Key: YARN-3080
> URL: https://issues.apache.org/jira/browse/YARN-3080
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Beckham007
>Assignee: Abin Shahab
> Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
> YARN-3080.patch
>
>
> The docker_container_executor_session.sh is like this:
> {quote}
> #!/usr/bin/env bash
> echo `/usr/bin/docker inspect --format {{.State.Pid}} 
> container_1421723685222_0008_01_02` > 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
> /bin/mv -f 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
>  
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
> /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
> GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
> GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
> GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
> GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
> --cpu-shares=1024 -v 
> /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
>  -v 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
>  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
> "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh"
> {quote}
> The DockerContainerExecutor uses docker inspect before docker run, so docker 
> inspect couldn't get the right pid for the Docker container; as a result, 
> signalContainer() and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339418#comment-14339418
 ] 

Hadoop QA commented on YARN-3231:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701186/YARN-3231.v4.patch
  against trunk revision c6d5b37.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6767//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6767//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6767//console

This message is automatically generated.

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch, YARN-3231.v4.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, all 
> pending jobs were not assigned any resources, and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339394#comment-14339394
 ] 

Tsuyoshi Ozawa commented on YARN-3255:
--

The findbugs warnings are not related to this modification. Checking this in.

> RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
> generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: YARN-3255-01.patch, YARN-3255-02.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-02-26 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li reassigned YARN-3267:
--

Assignee: Chang Li

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
>
> While fetching entities from the timeline server, the limit is applied to 
> the entities fetched from leveldb, and the ACL filters are applied after 
> that (TimelineDataManager.java::getEntities). 
> This could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339386#comment-14339386
 ] 

Hadoop QA commented on YARN-3122:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701190/YARN-3122.004.patch
  against trunk revision 1047c88.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6768//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6768//console

This message is automatically generated.

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339382#comment-14339382
 ] 

Zhijie Shen commented on YARN-3087:
---

+1. The latest patch looks good to me. Will commit.

> [Aggregator implementation] the REST server (web server) for per-node 
> aggregator does not work if it runs inside node manager
> -
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, 
> YARN-3087-022615.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-02-26 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3267:
---
Assignee: (was: Chang Li)

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>
> While fetching entities from the timeline server, the limit is applied to 
> the entities fetched from leveldb, and the ACL filters are applied after 
> that (TimelineDataManager.java::getEntities). 
> This could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3122:

Attachment: YARN-3122.004.patch

Modified CPU usage to be percent per core, and the corresponding metric to be 
percent per core as well. Thus 2 cores fully used should report as 200% (see 
the sketch below).
Also added doc comments.
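
To make the convention concrete, here is a small illustrative calculation (not 
part of the patch itself): usage is the CPU time burned by the container's 
process tree divided by the wall-clock time in the same window, expressed as a 
percentage, so saturating N cores reads as N * 100.
{code}
// Illustrative sketch of the percent-per-core convention only.
public class CpuPercentSketch {
  public static void main(String[] args) {
    long deltaCpuTimeMs   = 2000; // CPU-ms used by the process tree in the window
    long deltaWallClockMs = 1000; // elapsed wall-clock ms in the same window
    float percentPerCore = 100f * deltaCpuTimeMs / deltaWallClockMs;
    System.out.println(percentPerCore + "%"); // prints 200.0% -- two cores fully used
  }
}
{code}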

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.prelim.patch, 
> YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3231:
--
Attachment: YARN-3231.v4.patch

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch, YARN-3231.v4.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, all 
> pending jobs were not assigned any resources, and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-26 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339270#comment-14339270
 ] 

Jian He commented on YARN-3222:
---

Looks good to me.
While looking at this, I may have found another bug: the NODE_USABLE event is 
sent regardless of whether the reconnected node is healthy, which is 
incorrect, right? 
{code}
  rmNode.context.getDispatcher().getEventHandler().handle(
  new NodesListManagerEvent(
  NodesListManagerEventType.NODE_USABLE, rmNode));
{code}
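i.e., something along these lines (sketch only -- the exact health check 
depends on what the reconnect event carries):
{code}
// Sketch only: guard the dispatch so NODE_USABLE is not sent for an
// unhealthy reconnected node.
if (rmNode.getState() != NodeState.UNHEALTHY) {
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodesListManagerEvent(
          NodesListManagerEventType.NODE_USABLE, rmNode));
}
{code}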

> RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
> order
> ---
>
> Key: YARN-3222
> URL: https://issues.apache.org/jira/browse/YARN-3222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-3222.patch
>
>
> When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
> scheduler with the events node_added, node_removed or node_resource_update. 
> These events should be delivered in a sequential order, i.e. the node_added 
> event followed by the node_resource_update event.
> But if the node is reconnected with a different http port, the order of 
> scheduler events is node_removed --> node_resource_update --> node_added, 
> which causes the scheduler to not find the node, throw an NPE, and make the 
> RM exit.
> The node_resource_update event should always be triggered via 
> RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339264#comment-14339264
 ] 

Karthik Kambatla commented on YARN-3231:


Filed YARN-3271 to move these tests. I am okay with moving these too as part of 
that, and I will be glad to review that JIRA, should anyone want to pick it up.

bq. For 6.3, I don't think there is a problem with "maxRunnableApps for a user 
or queue is decreased".
It would be nice to add the tests even if there is no problem. It seems like a 
logical extension of what the latest patch is doing here. 

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, all 
> pending jobs were not assigned any resources, and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3271) FairScheduler: Move tests related to max-runnable-apps from TestFairScheduler to TestAppRunnability

2015-02-26 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-3271:
--

 Summary: FairScheduler: Move tests related to max-runnable-apps 
from TestFairScheduler to TestAppRunnability
 Key: YARN-3271
 URL: https://issues.apache.org/jira/browse/YARN-3271
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339260#comment-14339260
 ] 

Hadoop QA commented on YARN-2777:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701151/YARN-2777.002.patch
  against trunk revision dce8b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
  org.apache.hadoop.mapred.TestMiniMRClientCluster
  org.apache.hadoop.mapred.TestMRTimelineEventHandling
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
  org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler
  org.apache.hadoop.mapred.TestJobCleanup

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
org.apache.hadoop.mapred.lib.Tests
org.apache.hadoop.mapred.TestCombineOutputCollector
org.apache.hadoop.mapred.lib.TestMultipleInTests
org.apache.hadoop.mapreduce.Tests
org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6761//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6761//console

This message is automatically generated.

> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Attachments: YARN-2777.001.patch, YARN-2777.002.patch
>
>
> Below is snippet of aggregated log showing hbase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339246#comment-14339246
 ] 

Zhijie Shen commented on YARN-3125:
---

Thanks for the patch, Junping! It looks good to me. Per offline discussion, we 
should add an integration test in TestDistributedShell.

> [Event producers] Change distributed shell to use new timeline service
> --
>
> Key: YARN-3125
> URL: https://issues.apache.org/jira/browse/YARN-3125
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Junping Du
> Attachments: YARN-3125.patch, YARN-3125v2.patch, YARN-3125v3.patch
>
>
> We can start with changing distributed shell to use new timeline service once 
> the framework is completed, in which way we can quickly verify the next gen 
> is working fine end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339248#comment-14339248
 ] 

Hadoop QA commented on YARN-3087:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12701178/YARN-3087-022615.patch
  against trunk revision c6d5b37.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6766//console

This message is automatically generated.

> [Aggregator implementation] the REST server (web server) for per-node 
> aggregator does not work if it runs inside node manager
> -
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, 
> YARN-3087-022615.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339237#comment-14339237
 ] 

Hadoop QA commented on YARN-3122:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701167/YARN-3122.003.patch
  against trunk revision 2214dab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6765//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6765//console

This message is automatically generated.

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339233#comment-14339233
 ] 

Li Lu commented on YARN-3087:
-

Hi [~djp], thanks for the comments! I agree that we may want to use generic 
types to solve the problem. Similar code also appears in the v1 timeline 
object model, so maybe we'd like to fix both together? If that's the case, we 
may open a separate JIRA to track this. 

> [Aggregator implementation] the REST server (web server) for per-node 
> aggregator does not work if it runs inside node manager
> -
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, 
> YARN-3087-022615.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3087:

Attachment: YARN-3087-022615.patch

Updated my patch according to [~zjshen]'s comments. Addressed points 1-3. Point 
4 is caused by a limitation of HttpServer2 for now. We may want to decide 
whether to fix that on our side, or to add support for this use case on the 
HttpServer2 side. For now, I think we can temporarily keep our current approach 
to make the prototype work. 

> [Aggregator implementation] the REST server (web server) for per-node 
> aggregator does not work if it runs inside node manager
> -
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, 
> YARN-3087-022615.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339214#comment-14339214
 ] 

Siqi Li commented on YARN-3231:
---

Hi [~ka...@cloudera.com], thanks for your feedback.

I have uploaded a new patch which addresses all your comments except 6.1 and 
6.3.

For 6.1, it seems that there are other test cases that might also qualify for 
moving to TestAppRunnability; it would be good to do a larger refactor of 
TestFairScheduler into TestAppRunnability.

For 6.3, I don't think there is a problem with "maxRunnableApps for a user or 
queue is decreased". 

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increased the limit, all 
> pending jobs were not assigned any resources, and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339201#comment-14339201
 ] 

Hadoop QA commented on YARN-3231:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701160/YARN-3231.v3.patch
  against trunk revision f0c980a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6763//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6763//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6763//console

This message is automatically generated.

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increase the limit, all 
> pending jobs were not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339196#comment-14339196
 ] 

Hadoop QA commented on YARN-3270:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701163/YARN-3270.patch
  against trunk revision 2214dab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6764//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6764//console

This message is automatically generated.

> node label expression not getting set in ApplicationSubmissionContext
> -
>
> Key: YARN-3270
> URL: https://issues.apache.org/jira/browse/YARN-3270
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Priority: Minor
> Attachments: YARN-3270.patch
>
>
> One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
> setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339170#comment-14339170
 ] 

Vinod Kumar Vavilapalli commented on YARN-3269:
---

Can you modify one of the tests to use a fully qualified path, in order to 
'prove' that this patch works?
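
For reference, a minimal sketch (not part of the actual patch; the hdfs:// host is 
illustrative and the class name is hypothetical) of pointing the remote app log dir 
at a fully qualified URI in a test setup:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical illustration: set yarn.nodemanager.remote-app-log-dir to a fully
// qualified URI instead of a path resolved against the default file system.
public class RemoteLogDirConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
        "hdfs://namenode.example.com:8020/app-logs");
    System.out.println(conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR));
  }
}
{code}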

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3122:

Attachment: YARN-3122.003.patch

Addressed feedback

> Metrics for container's actual CPU usage
> 
>
> Key: YARN-3122
> URL: https://issues.apache.org/jira/browse/YARN-3122
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
> YARN-3122.003.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch
>
>
> It would be nice to capture resource usage per container, for a variety of 
> reasons. This JIRA is to track CPU usage. 
> YARN-2965 tracks the resource usage on the node, and the two implementations 
> should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339140#comment-14339140
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701150/YARN-3251.2.patch
  against trunk revision dce8b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6760//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6760//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6760//console

This message is automatically generated.

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-02-26 Thread Rohit Agarwal (JIRA)
Rohit Agarwal created YARN-3270:
---

 Summary: node label expression not getting set in 
ApplicationSubmissionContext
 Key: YARN-3270
 URL: https://issues.apache.org/jira/browse/YARN-3270
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Priority: Minor


One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-02-26 Thread Rohit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Agarwal updated YARN-3270:

Attachment: YARN-3270.patch

Attached the patch.

> node label expression not getting set in ApplicationSubmissionContext
> -
>
> Key: YARN-3270
> URL: https://issues.apache.org/jira/browse/YARN-3270
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Rohit Agarwal
>Priority: Minor
> Attachments: YARN-3270.patch
>
>
> One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
> setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339084#comment-14339084
 ] 

Hadoop QA commented on YARN-3269:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701154/YARN-3269.1.patch
  against trunk revision dce8b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6762//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6762//console

This message is automatically generated.

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.

2015-02-26 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3268:
---
Assignee: (was: Chang Li)

> timelineserver rest api returns html page for 404 when a bad endpoint is used.
> --
>
> Key: YARN-3268
> URL: https://issues.apache.org/jira/browse/YARN-3268
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>
> The timeline server returns a 404 HTML page instead of a REST response. This 
> interferes with end-user pages that try to retrieve data using the REST API. 
> This could be due to the lack of a 404 handler.
> Example: 
> http://timelineserver:8188/badnamespace/v1/timeline/someentity



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3231:
--
Attachment: YARN-3231.v3.patch

> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
> YARN-3231.v3.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increase the limit, all 
> pending jobs were not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-02-26 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li reassigned YARN-3267:
--

Assignee: Chang Li

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
>
> While fetching entities from the timeline server, the limit is applied to the 
> entities fetched from LevelDB, and the ACL filters are applied after that 
> (TimelineDataManager.java::getEntities). 
> This could mean that even if there are entities available that match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.

2015-02-26 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li reassigned YARN-3268:
--

Assignee: Chang Li

> timelineserver rest api returns html page for 404 when a bad endpoint is used.
> --
>
> Key: YARN-3268
> URL: https://issues.apache.org/jira/browse/YARN-3268
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
>
> The timeline server returns a 404 HTML page instead of a REST response. This 
> interferes with end-user pages that try to retrieve data using the REST API. 
> This could be due to the lack of a 404 handler.
> Example: 
> http://timelineserver:8188/badnamespace/v1/timeline/someentity



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339060#comment-14339060
 ] 

Zhijie Shen commented on YARN-3087:
---

Thanks for the patch, Li! Some detailed comments about the patch:

1. HierarchicalTimelineEntity is abstract, so this constructor may not be necessary.
{code}
// required by JAXB
HierarchicalTimelineEntity() {
  super();
}
{code}

2. Can we mark JAXB methods \@Private?

3. I think rootUnwrapping should be true, to be consistent with 
YarnJacksonJaxbJsonProvider. It also seems JAXBContextResolver is never used (I 
think the reason is that we are using YarnJacksonJaxbJsonProvider), so maybe we 
want to remove that class.
{code}
this.context =
    new JSONJAXBContext(JSONConfiguration.natural().rootUnwrapping(false)
        .build(), cTypes);
{code}

4. Does this mean that if we want to add a filter, we need to hard-code it here, 
so that "hadoop.http.filter.initializers" no longer works? If it no longer works, 
is it possible to provide a similar mechanism to replace what 
"hadoop.http.filter.initializers" does? (See the config sketch after the snippet 
below.)
{code}
// TODO: replace this by an authentication filter in future.
HashMap<String, String> options = new HashMap<String, String>();
String username = conf.get(HADOOP_HTTP_STATIC_USER,
    DEFAULT_HADOOP_HTTP_STATIC_USER);
options.put(HADOOP_HTTP_STATIC_USER, username);
HttpServer2.defineFilter(timelineRestServer.getWebAppContext(),
    "static_user_filter_timeline",
    StaticUserWebFilter.StaticUserFilter.class.getName(),
    options, new String[] {"/*"});
{code}
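
For comparison, here is a minimal sketch (hypothetical class name; the key and the 
filter initializer are from hadoop-common) of the usual config-driven route that 
point 4 refers to, where filters are picked up from "hadoop.http.filter.initializers" 
instead of being hard-coded:
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical illustration: the HTTP server reads this key at startup and invokes
// each listed FilterInitializer; StaticUserWebFilter is just one example.
public class FilterInitializerConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("hadoop.http.filter.initializers",
        "org.apache.hadoop.http.lib.StaticUserWebFilter");
    System.out.println(conf.get("hadoop.http.filter.initializers"));
  }
}
{code}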

> [Aggregator implementation] the REST server (web server) for per-node 
> aggregator does not work if it runs inside node manager
> -
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339015#comment-14339015
 ] 

Junping Du commented on YARN-3087:
--

Agree with Vinod that if this is required by the JAXB API then we don't have to 
cast it. Thanks [~gtCarrera9] for the explanation!
The patch looks good to me overall. One comment: we have a lot of similar logic 
that casts a Map to a HashMap, like below:
{code}
-this.relatedEntities = relatedEntities;
+if (relatedEntities != null && !(relatedEntities instanceof HashMap)) {
+  this.relatedEntities = new HashMap<String, Set<String>>(relatedEntities);
+} else {
+  this.relatedEntities = (HashMap<String, Set<String>>) relatedEntities;
+}
{code}
Maybe we can use generics to consolidate them.
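
A minimal sketch (hypothetical utility name, not part of the patch) of the kind of 
generics-based consolidation suggested above, so each setter reduces to a single call:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: convert any Map into a HashMap, copying only when necessary.
public final class MapsSketch {

  public static <K, V> HashMap<K, V> toHashMap(Map<K, V> map) {
    if (map == null || map instanceof HashMap) {
      return (HashMap<K, V>) map;
    }
    return new HashMap<K, V>(map);
  }

  public static void main(String[] args) {
    Map<String, String> related = new TreeMap<String, String>();
    related.put("YARN_APPLICATION", "application_1424995000000_0001");
    // Copies the TreeMap into a HashMap; a HashMap input would be returned as-is.
    System.out.println(toHashMap(related));
  }
}
{code}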

> [Aggregator implementation] the REST server (web server) for per-node 
> aggregator does not work if it runs inside node manager
> -
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339012#comment-14339012
 ] 

Vinod Kumar Vavilapalli commented on YARN-3025:
---

Coming in very late, apologies.

Some comments:
 - Echoing Bikas's first comment: today the AMs are expected to maintain their 
own scheduling state. With this you are changing that - part of the scheduling 
state will be remembered but the rest won't be. We should clearly draw a line 
somewhere; where is it?
 - [~zjshen] did a very good job of dividing up the persistence concerns, but what 
guarantee is given to the app writers? "I'll return the list of 
blacklisted nodes whenever I can, but shoot, I died, so I can't help you much" 
is not going to cut it. If we want reliable notifications, we should build a 
protocol between the AM and the RM about the persistence of the blacklisted node 
list - too much complexity if you ask me. Why not leave it to the apps?
 - The blacklist information is per application attempt, and the scheduler 
forgets previous application attempts today. So, as I understand it, the patch 
doesn't work.

> Provide API for retrieving blacklisted nodes
> 
>
> Key: YARN-3025
> URL: https://issues.apache.org/jira/browse/YARN-3025
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt
>
>
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>   List<String> blacklistRemovals) {
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes 
> so that the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3269:

Attachment: YARN-3269.1.patch

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3269:
---

 Summary: Yarn.nodemanager.remote-app-log-dir could not be 
configured to fully qualified path
 Key: YARN-3269
 URL: https://issues.apache.org/jira/browse/YARN-3269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong


Log aggregation currently is always relative to the default file system, not an 
arbitrary file system identified by URI. So we can't put an arbitrary 
fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338975#comment-14338975
 ] 

Ted Yu commented on YARN-2777:
--

lgtm

> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Attachments: YARN-2777.001.patch, YARN-2777.002.patch
>
>
> Below is a snippet of an aggregated log showing the HBase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338968#comment-14338968
 ] 

Varun Saxena commented on YARN-2777:


[~tedyu], made the change. Kindly review

> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Attachments: YARN-2777.001.patch, YARN-2777.002.patch
>
>
> Below is a snippet of an aggregated log showing the HBase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338967#comment-14338967
 ] 

Varun Saxena commented on YARN-2777:


[~tedyu], made the change. Kindly review

> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Attachments: YARN-2777.001.patch, YARN-2777.002.patch
>
>
> Below is a snippet of an aggregated log showing the HBase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2777:
---
Attachment: YARN-2777.002.patch

> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Attachments: YARN-2777.001.patch, YARN-2777.002.patch
>
>
> Below is snippet of aggregated log showing hbase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338950#comment-14338950
 ] 

Vinod Kumar Vavilapalli commented on YARN-3248:
---

YARN-3025 is related to my first comment above.

> Display count of nodes blacklisted by apps in the web UI
> 
>
> Key: YARN-3248
> URL: https://issues.apache.org/jira/browse/YARN-3248
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screenshot.jpg, apache-yarn-3248.0.patch
>
>
> It would be really useful when debugging app performance and failure issues 
> to get a count of the nodes blacklisted by individual apps displayed in the 
> web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3251:
--
Attachment: YARN-3251.2.patch

Attaching an analogue of the most recent patch against trunk. I do not believe 
we will be committing this at this point, as [~leftnoteasy] is working on a 
more significant change that will remove the need for it, but I wanted to make 
it available just in case. For clarity, the patch against trunk is 
YARN-3251.2.patch and the patch to commit against 2.6 is 
YARN-3251.2-6-0.4.patch.

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338945#comment-14338945
 ] 

Vinod Kumar Vavilapalli commented on YARN-3087:
---

bq. The current solution is a workaround for JAXB resolver, which cannot return 
an interface (Map) type. This work around is consistent with the v1 version of 
our ATS object model (in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/,
 such as TimelineEvent.java, TimelineEntity.java), fixed in YARN-2804. If it's 
needed, maybe we'd like to keep the declarations to be Map, and do the cast in 
the jaxb getter?
Casting it every time will be expensive. Let's keep it as the patch currently 
does - we are not exposing the fact that it is a HashMap to the external world, 
only to Jersey.

> [Aggregator implementation] the REST server (web server) for per-node 
> aggregator does not work if it runs inside node manager
> -
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338914#comment-14338914
 ] 

Hadoop QA commented on YARN-2693:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701122/0006-YARN-2693.patch
  against trunk revision dce8b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1152 javac 
compiler warnings (more than the trunk's current 1151 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6758//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6758//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6758//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6758//console

This message is automatically generated.

> Priority Label Manager in RM to manage application priority based on 
> configuration
> --
>
> Key: YARN-2693
> URL: https://issues.apache.org/jira/browse/YARN-2693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
> 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 
> 0006-YARN-2693.patch
>
>
> The focus of this JIRA is to have a centralized service to handle priority labels.
> Support operations such as
> * Add/Delete priority label to a specified queue
> * Manage integer mapping associated with each priority label
> * Support managing default priority label of a given queue
> * Expose interface to RM to validate priority label
> To keep the interface simple, the Priority Manager will support only a 
> configuration file, in contrast with an admin CLI and REST. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3251:
--
Attachment: YARN-3251.2-6-0.4.patch

Minor: switch to "Internal", which seems to be more common in the codebase.

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338867#comment-14338867
 ] 

Li Lu commented on YARN-3087:
-

Hi [~djp], thanks for the feedback! I totally understand your concern here. The 
current solution is a workaround for the JAXB resolver, which cannot return an 
interface (Map) type. This workaround is consistent with the v1 version of our 
ATS object model (in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/,
 such as TimelineEvent.java and TimelineEntity.java), fixed in YARN-2804. If 
needed, maybe we could keep the declarations as Map and do the cast in 
the JAXB getter? 
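
A minimal sketch (hypothetical class and field names, not the actual ATS object 
model) of that alternative: keep the field declared as Map and expose a concrete 
HashMap only through the JAXB getter:
{code}
import java.util.HashMap;
import java.util.Map;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical illustration: callers see Map, while JAXB/Jersey sees a concrete HashMap.
@XmlRootElement(name = "entity")
@XmlAccessorType(XmlAccessType.NONE)
public class EntitySketch {

  private Map<String, String> info = new HashMap<String, String>();

  // Only this annotated getter is marshalled, thanks to XmlAccessType.NONE.
  @XmlElement(name = "info")
  public HashMap<String, String> getInfoJAXB() {
    return info instanceof HashMap
        ? (HashMap<String, String>) info : new HashMap<String, String>(info);
  }

  public Map<String, String> getInfo() {
    return info;
  }

  public void setInfo(Map<String, String> info) {
    this.info = info;
  }
}
{code}
The trade-off raised elsewhere in the thread still applies: when the field is not 
already a HashMap, the getter copies on every call, which is the cost being weighed 
against declaring the field as a HashMap in the first place.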

> [Aggregator implementation] the REST server (web server) for per-node 
> aggregator does not work if it runs inside node manager
> -
>
> Key: YARN-3087
> URL: https://issues.apache.org/jira/browse/YARN-3087
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Li Lu
> Fix For: YARN-2928
>
> Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch
>
>
> This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
> aggregator and the associated REST server. It runs fine as a standalone 
> process, but does not work if it runs inside the node manager due to possible 
> collisions of servlet mapping.
> Exception:
> {noformat}
> org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
> v2 not found
>   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
>   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338860#comment-14338860
 ] 

Wangda Tan commented on YARN-3251:
--

if opposite opinions -> if no opposite opinions

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338858#comment-14338858
 ] 

Wangda Tan commented on YARN-3251:
--

LGTM +1, I will commit the patch to branch-2.6 this afternoon if opposite 
opinions. Thanks!

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338848#comment-14338848
 ] 

Craig Welch commented on YARN-3251:
---

Sorry if that wasn't clear; to reduce risk, I removed the minor changes in 
CSQueueUtils.

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.

2015-02-26 Thread Prakash Ramachandran (JIRA)
Prakash Ramachandran created YARN-3268:
--

 Summary: timelineserver rest api returns html page for 404 when a 
bad endpoint is used.
 Key: YARN-3268
 URL: https://issues.apache.org/jira/browse/YARN-3268
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prakash Ramachandran


The timeline server returns a 404 HTML page instead of a REST response. This 
interferes with end-user pages that try to retrieve data using the REST API. 
This could be due to the lack of a 404 handler.
Example: 
http://timelineserver:8188/badnamespace/v1/timeline/someentity



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3251:
--
Attachment: YARN-3251.2-6-0.3.patch

Removing the CSQueueUtils changes.

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
> YARN-3251.2-6-0.3.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338830#comment-14338830
 ] 

Karthik Kambatla commented on YARN-3231:


Thanks for reporting and working on this, [~l201514]. The approach looks 
generally good. A few comments (some nits):
# Rename {{updateRunnabilityonRefreshQueues}} to {{updateRunnabilityOnReload}}? 
And, add a javadoc for when it should be called and what it does.
# javadoc for the newly added private method and the significance of the new 
integer param.
# Call the above method from AllocationReloadListener#onReload after all the 
other queue configs are updated.
# The comment here no longer applies. Remove it? 
{code}
// No more than one app per list will be able to be made runnable, so
// we can stop looking after we've found that many
if (noLongerPendingApps.size() >= maxRunnableApps) {
  break;
}
{code}
# Indentation:
{code}
updateAppsRunnability(appsNowMaybeRunnable,
appsNowMaybeRunnable.size());
{code}
# Newly added tests:
## If it is not too much trouble, can we move them to a new test class 
(TestAppRunnability?) mostly because TestFairScheduler has so many tests 
already. 
## Is it possible to reuse the code between these tests? 
## Should we add tests for when the maxRunnableApps for a user or queue is 
decreased? If you think this might need additional work in the logic as well, I 
am open to filing a follow up JIRA and addressing it there. 


> FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
> job stuck
> --
>
> Key: YARN-3231
> URL: https://issues.apache.org/jira/browse/YARN-3231
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch
>
>
> When a queue is piling up with a lot of pending jobs due to the 
> maxRunningApps limit, we want to increase this property on the fly to make 
> some of the pending jobs active. However, once we increase the limit, all 
> pending jobs were not assigned any resources and were stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-02-26 Thread Prakash Ramachandran (JIRA)
Prakash Ramachandran created YARN-3267:
--

 Summary: Timelineserver applies the ACL rules after applying the 
limit on the number of records
 Key: YARN-3267
 URL: https://issues.apache.org/jira/browse/YARN-3267
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Prakash Ramachandran


While fetching entities from the timeline server, the limit is applied to the 
entities fetched from LevelDB, and the ACL filters are applied after that 
(TimelineDataManager.java::getEntities). 
This could mean that even if there are entities available that match the query 
criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3260) NPE if AM attempts to register before RM processes launch event

2015-02-26 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338813#comment-14338813
 ] 

Naganarasimha G R commented on YARN-3260:
-

Hi [~jlowe],
I had a look at the code, and the approaches I can think of are:
* Call ApplicationMasterService.registerAppAttempt(ApplicationAttemptId) in 
RMAppAttemptImpl.AMLaunchedTransition instead of 
RMAppAttemptImpl.AttemptStartedTransition, and ensure that creating the 
ClientToAMToken and registering with ApplicationMasterService happen in the same 
block. By doing this we can throw InvalidApplicationMasterRequestException if the 
AM tries to register with the AMS before RMAppAttemptImpl processes the 
RMAppAttempt LAUNCHED event.
* Use a MultiThreadedDispatcher for processing app and app-attempt events, 
similar to the one in SystemMetricsPublisher.MultiThreadedDispatcher, with the 
additional modification that instead of {{ "(event.hashCode() & Integer.MAX_VALUE) % 
dispatchers.size();"}} we select the dispatcher based on the applicationId (see 
the sketch below). This can speed up the processing of app events.

I was not able to see any other cleaner, direct fix for this issue, so I was 
wondering whether we need to start looking at why the cluster was running behind 
on processing AsyncDispatcher events. Were these events getting delayed for any 
particular reason? 
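
A self-contained sketch (hypothetical class, not the SystemMetricsPublisher code) of 
the second idea, routing events to a dispatcher chosen by application ID so that 
per-application ordering is preserved while different applications proceed in parallel:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration: all events of one application land on the same
// single-threaded executor.
public class AppIdPartitionedDispatcherSketch {

  private final List<ExecutorService> dispatchers = new ArrayList<ExecutorService>();

  public AppIdPartitionedDispatcherSketch(int numDispatchers) {
    for (int i = 0; i < numDispatchers; i++) {
      dispatchers.add(Executors.newSingleThreadExecutor());
    }
  }

  public void dispatch(String applicationId, Runnable handler) {
    // Select by application ID instead of event.hashCode().
    int index = (applicationId.hashCode() & Integer.MAX_VALUE) % dispatchers.size();
    dispatchers.get(index).execute(handler);
  }

  public void stop() {
    for (ExecutorService dispatcher : dispatchers) {
      dispatcher.shutdown();
    }
  }

  public static void main(String[] args) {
    AppIdPartitionedDispatcherSketch sketch = new AppIdPartitionedDispatcherSketch(4);
    sketch.dispatch("application_1424995000000_0001", new Runnable() {
      @Override
      public void run() {
        System.out.println("handled LAUNCHED event");
      }
    });
    sketch.stop();
  }
}
{code}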

> NPE if AM attempts to register before RM processes launch event
> ---
>
> Key: YARN-3260
> URL: https://issues.apache.org/jira/browse/YARN-3260
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
>
> The RM on one of our clusters was running behind on processing 
> AsyncDispatcher events, and this caused AMs to fail to register due to an 
> NPE.  The AM was launched and attempting to register before the 
> RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token 
> had not been generated yet.  The NPE occurred because the 
> ApplicationMasterService tried to encode the missing token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338782#comment-14338782
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700977/YARN-3251.2-6-0.2.patch
  against trunk revision dce8b9c.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6759//console

This message is automatically generated.

> CapacityScheduler deadlock when computing absolute max avail capacity (short 
> term fix for 2.6.1)
> 
>
> Key: YARN-3251
> URL: https://issues.apache.org/jira/browse/YARN-3251
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Craig Welch
>Priority: Blocker
> Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch
>
>
> The ResourceManager can deadlock in the CapacityScheduler when computing the 
> absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338760#comment-14338760
 ] 

Hadoop QA commented on YARN-3255:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700931/YARN-3255-02.patch
  against trunk revision 773b651.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6757//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6757//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6757//console

This message is automatically generated.

> RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
> generic options
> ---
>
> Key: YARN-3255
> URL: https://issues.apache.org/jira/browse/YARN-3255
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: YARN-3255-01.patch, YARN-3255-02.patch
>
>
> Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
> generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
> ability to pass generic options in order to specify configuration files or 
> the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338746#comment-14338746
 ] 

Ted Yu commented on YARN-2777:
--

@Varun:
{code}
713   out.println("End of LogType:");
714   out.println(fileType);
{code}
Can you put the above two onto the same line?
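For example, just a sketch reusing the {{out}} and {{fileType}} names from the 
patch:
{code}
out.println("End of LogType:" + fileType);
{code}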

Thanks

> Mark the end of individual log in aggregated log
> 
>
> Key: YARN-2777
> URL: https://issues.apache.org/jira/browse/YARN-2777
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Varun Saxena
>  Labels: log-aggregation
> Attachments: YARN-2777.001.patch
>
>
> Below is snippet of aggregated log showing hbase master log:
> {code}
> LogType: hbase-hbase-master-ip-172-31-34-167.log
> LogUploadTime: 29-Oct-2014 22:31:55
> LogLength: 24103045
> Log Contents:
> Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
> ...
>   at 
> org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:745)
> LogType: hbase-hbase-master-ip-172-31-34-167.out
> {code}
> Since logs from various daemons are aggregated in one log file, it would be 
> desirable to mark the end of one log before starting with the next.
> e.g. with such a line:
> {code}
> End of LogType: hbase-hbase-master-ip-172-31-34-167.log
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338736#comment-14338736
 ] 

Ted Yu commented on YARN-3025:
--

Ping [~zjshen]

> Provide API for retrieving blacklisted nodes
> 
>
> Key: YARN-3025
> URL: https://issues.apache.org/jira/browse/YARN-3025
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt
>
>
> We have the following method which updates blacklist:
> {code}
>   public synchronized void updateBlacklist(List<String> blacklistAdditions,
>   List<String> blacklistRemovals) {
> {code}
> Upon AM failover, there should be an API which returns the blacklisted nodes 
> so that the new AM can make consistent decisions.
> The new API can be:
> {code}
>   public synchronized List<String> getBlacklistedNodes()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-02-26 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2693:
--
Attachment: 0006-YARN-2693.patch

Attaching a minimal version of the Application Priority Manager where only 
configuration support is present. YARN-3250 will handle the admin CLI and REST 
support in the longer run.

> Priority Label Manager in RM to manage application priority based on 
> configuration
> --
>
> Key: YARN-2693
> URL: https://issues.apache.org/jira/browse/YARN-2693
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
> 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 
> 0006-YARN-2693.patch
>
>
> Focus of this JIRA is to have a centralized service to handle priority labels.
> Support operations such as
> * Add/Delete priority label to a specified queue
> * Manage integer mapping associated with each priority label
> * Support managing default priority label of a given queue
> * Expose interface to RM to validate priority label
> To have a simplified interface, the Priority Manager will support only a 
> configuration file, in contrast with the admin CLI and REST. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-02-26 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338687#comment-14338687
 ] 

Naganarasimha G R commented on YARN-3039:
-

Hi [~djp]
bq. Another idea (from Vinod in offline discussion) is to add a blocking call 
in AMRMClient to get aggregator address directly from RM
+1 for this approach. Also, if the NM uses this new blocking call in AMRMClient 
to get the aggregator address, then there might not be any race conditions for 
the NM posting the AM container's life-cycle events immediately after creation of 
the appAggregator through the Aux service.
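Purely as an illustration of the kind of blocking call being discussed (the 
interface and method names below are made up and do not exist in AMRMClient 
today):
{code}
/**
 * Illustrative sketch only: a blocking lookup of the per-application
 * aggregator address from the RM. Nothing here is an actual YARN API.
 */
public interface AggregatorAddressLookup {

  /**
   * Blocks until the RM knows the aggregator address for the given
   * application, then returns it (e.g. as host:port).
   */
  String getAggregatorAddress(String applicationId) throws InterruptedException;
}
{code}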

bq. In addition, if adding a new API in AMRMClient can be accepted, NM will use 
TimelineClient too so can handle service discovery automatically.
Are we just adding a method to get the aggregator address, or what other APIs 
are planned?

bq. NM will notify RM that this new appAggregator is ready for use in next 
heartbeat to RM (missing in this patch).
bq.  RM verify the out of service for this app aggregator first and kick off 
rebind appAggregator to another NM's perNodeAggregatorService in next heartbeat 
comes.
I believe the idea of using the Aux service was to decouple the NM and the 
Timeline service. If the NM notifies the RM about new appAggregator creation 
(based on the Aux service), then basically the NM has to be aware that 
PerNodeAggregatorServer is configured as an Aux service, and if it supports 
rebinding the appAggregator on failure then it should be able to communicate 
with this Aux service too. Would this be a clean approach?

I also feel we need to support starting a per-app aggregator only if the app 
requests it (Zhijie also had mentioned this). If not, we can make use of one 
default aggregator for all such apps launched on the NM, which is just used to 
post container entities from different NMs for these apps.

Have any discussions happened about the RM having its own aggregator? I feel it 
would be better for the RM to have one, as it then need not depend on any NMs to 
post entities.

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, YARN-3039-no-test.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

