[jira] [Updated] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server

2015-08-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4044:
--
Attachment: 0002-YARN-4044.patch

Rebasing the patch as YARN-4014 has been committed. 
Also verified all cases on a real cluster.

[~rohithsharma], could you please take a look at this patch? Thank you.

> Running applications information changes such as movequeue is not published 
> to TimeLine server
> --
>
> Key: YARN-4044
> URL: https://issues.apache.org/jira/browse/YARN-4044
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch
>
>
> SystemMetricsPublisher needs to expose an appUpdated API to publish any change 
> for a running application.
> Events can be:
> - change of queue for a running application
> - change of application priority for a running application
> This ticket intends to handle both RM and timeline side changes. 
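
A rough sketch of what such a hook could look like, assuming a dispatcher-based publisher similar to the existing appCreated/appFinished paths; the event class, field and method names here are assumptions, not the API in the attached patch:

{code:java}
// Hypothetical sketch only; ApplicationUpdatedEvent and the exact signature are
// assumptions, not the committed API for this JIRA.
public void appUpdated(RMApp app, long updatedTime) {
  if (publishSystemMetrics) {                       // assumed publisher flag
    dispatcher.getEventHandler().handle(
        new ApplicationUpdatedEvent(
            app.getApplicationId(),
            app.getQueue(),                         // reflects a queue move
            app.getApplicationPriority(),           // reflects a priority change
            updatedTime));
  }
}
{code}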



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710720#comment-14710720
 ] 

Rohith Sharma K S commented on YARN-3250:
-

Thanks [~sunilg] for the review.
bq. Also do we need to write any error for failure cases of 
refreshClusterMaxPriority with RMAuditLogger?
On exception, {{logAndWrapException}} is called, which logs an error for 
failure cases.

bq. Could you please add rm.stop() at the end of 
testAdminRefreshClusterMaxPriority
Since the *rm* instance is a class-level field, {{rm.stop}} is called during 
tearDown, so there is no need to call it explicitly.
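
For reference, a minimal sketch of the tearDown pattern being described here; the class and field names are illustrative, not the exact test code:

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.junit.After;

// Illustrative sketch only; the actual test class and field names may differ.
public class AdminServiceTestSketch {
  private MockRM rm;

  @After
  public void tearDown() {
    if (rm != null) {
      // Stopping the class-level RM here covers every test method,
      // so individual tests need not call rm.stop() themselves.
      rm.stop();
    }
  }
}
{code}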

> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering

2015-08-24 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710716#comment-14710716
 ] 

sandflee commented on YARN-4051:


Could anyone help review it?

> ContainerKillEvent is lost when container is  In New State and is recovering
> 
>
> Key: YARN-4051
> URL: https://issues.apache.org/jira/browse/YARN-4051
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: sandflee
>Assignee: sandflee
>Priority: Critical
> Attachments: YARN-4051.01.patch, YARN-4051.02.patch, 
> YARN-4051.03.patch
>
>
> As in YARN-4050, the NM event dispatcher is blocked and the container is in the NEW 
> state; when we finish the application, the container stays alive even after the NM 
> event dispatcher is unblocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4078) Unchecked typecast to AbstractYarnScheduler in AppInfo

2015-08-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710703#comment-14710703
 ] 

Varun Saxena commented on YARN-4078:


In the main code, it is guarded at places other than the one raised in this issue. 

> Unchecked typecast to AbstractYarnScheduler in AppInfo
> --
>
> Key: YARN-4078
> URL: https://issues.apache.org/jira/browse/YARN-4078
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
>
> Currently getPendingResourceRequestForAttempt is present in 
> {{AbstractYarnScheduler}}.
> *But in AppInfo, we call this method by typecasting the scheduler to 
> AbstractYarnScheduler, which is incorrect.*
> Because if a custom scheduler is added, it will implement 
> YarnScheduler, not AbstractYarnScheduler.
> This method should either be moved to YarnScheduler or have a guarded 
> check like in other places (RMAppAttemptBlock.getBlackListedNodes). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710669#comment-14710669
 ] 

Hudson commented on YARN-4014:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #307 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/307/])
YARN-4014. Support user cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java


> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-08-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710651#comment-14710651
 ] 

Varun Saxena commented on YARN-4053:


Looking at the issues involved, IMO we should impose a restriction on the client 
so that it does not mix longs and doubles. 

> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and store 
> values in the backend HBase storage. This converts everything into a string 
> representation (ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how we are going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-08-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710638#comment-14710638
 ] 

Varun Saxena commented on YARN-4053:


There was a suggestion that we can support only longs. Would supporting only 
longs not cause any impact to potential users of ATS?
Longs, however, should cover most of the metrics (as of now I can't think of any 
where decimals would be of great importance).
If we do this, I think the TimelineMetric object should be changed to accept only 
java.lang.Long and not java.lang.Number.
Looping in [~vinodkv] to get his opinion on this as well.
 
Although, is it unfair to ask the client to send values consistently?
Can't we document this and enforce this restriction? If the client does not 
comply, it cannot expect consistent results. This can be the contract between 
ATS and its clients.
The major concern here, though, is that it won't be possible to enforce this 
restriction programmatically, neither at the client side nor at the server side.
 
*Possible Solution:*
There is one possible solution if enforcing this restriction is not viable. The 
real problem in both solutions would come in applying metric filters if the data 
is inconsistent.
For this, we can use approach 2 (include the type in the column qualifier) and 
then insert OR filters covering both column qualifiers for the same metric.
 
I will elaborate on this with an example.
Let us say we have a metric called JOB_ELAPSED_TIME and the client can report both 
integral and floating point values for it. With approach 2, we will have 2 
column qualifiers for this metric, i.e. “JOB_ELAPSED_TIME=L” (for longs) and 
“JOB_ELAPSED_TIME=D” (for doubles).
Now, a query with a metric filter value in integer format, i.e. something like 
JOB_ELAPSED_TIME > 40, can be transformed to a corresponding HBase filter of the 
form (“JOB_ELAPSED_TIME=L” > 40 OR “JOB_ELAPSED_TIME=D” > 40.0).
That is, a filter list of the form (“m1” > 10 AND “m2” < 5 AND “m3” = 4) would be 
transformed to ((“m1=L” > 10 OR “m1=D” > 10.0) AND (“m2=L” < 5 OR “m2=D” < 
5.0) AND (“m3=L” = 4 OR “m3=D” = 4.0)).
 
If the filter value is in decimal format then we will have to make additional 
changes. If the filter is something like JOB_ELAPSED_TIME > 40.75, it will have to 
be converted to (“JOB_ELAPSED_TIME=L” >= 41 OR “JOB_ELAPSED_TIME=D” > 40.75). 
As you can see, while matching a double value against the column qualifier 
storing longs, I would need to round the value up to the closest integer and change 
the filter to >=. Likewise, changes will be required for the < (less than) and 
equal to (=) comparisons as well.
 
However, I am not sure whether adding this many filters will cause any 
performance issue for HBase, because with this solution we will in 
essence be doubling the size of the metric filters.
 
One thing we need to note, though, is that if we do adopt approach 2 (including 
the type in the column qualifier), regex comparison might become an issue. 
Theoretically, regular expressions can become quite complex, so programmatically 
interpreting a regex and transforming it so that it covers both the long-related 
column qualifier and the double-related column qualifier might induce bugs.
Maybe we can just support wildcard match (\*), or just prefix and 
substring filters.
 
Thoughts ?

However, we may want to match against only the latest version of the value for 
a metric.
In that case, the solution suggested above won’t work.
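
To make the transformation above concrete, here is a rough sketch of turning one metric filter into an OR over the long and double qualifiers; the column family and qualifier naming are illustrative only, not the actual reader code:

{code:java}
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical sketch: "metric > threshold" becomes
// ("metric=L" > threshold OR "metric=D" > threshold), per approach 2 above.
public final class MetricFilterSketch {
  static FilterList greaterThan(byte[] family, String metric, long threshold) {
    SingleColumnValueFilter longSide = new SingleColumnValueFilter(
        family, Bytes.toBytes(metric + "=L"),
        CompareOp.GREATER, Bytes.toBytes(threshold));
    SingleColumnValueFilter doubleSide = new SingleColumnValueFilter(
        family, Bytes.toBytes(metric + "=D"),
        CompareOp.GREATER, Bytes.toBytes((double) threshold));
    // MUST_PASS_ONE gives the OR semantics across the two representations.
    return new FilterList(FilterList.Operator.MUST_PASS_ONE, longSide, doubleSide);
  }
}
{code}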

> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and store 
> values in the backend HBase storage. This converts everything into a string 
> representation (ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how we are going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-08-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710633#comment-14710633
 ] 

Varun Saxena commented on YARN-4053:


I wanted to discuss this so that we can reach a consensus on how to handle YARN-4053.

*Solution 1*: We can add a 1-byte flag as part of the metric value indicating 
whether we are storing an integral value (0) or a floating point value (1).
*Solution 2*: Another suggestion is that the type can be part of the column 
qualifier, say something like metric=l where "l" indicates long.

A third option is to store everything as a double. But would it be fair to 
impose this restriction on the client while it reads data from ATS? What if the 
client is expecting a long and is unable to handle a double?


The major issue surrounding the different approaches is what happens if the 
client does not report metric values consistently (with the same data type for 
the same metric). 

Now let us look at the scenarios where metric values come into the picture.
*1.* While writing an entity to HBase: Here, we need to consider that for the 
same entity, a particular metric can be reported in multiple write calls. 
So it is possible that in one write all values for a particular metric are 
reported as longs and in another write all as floats. This can create 
inconsistency in both of the solutions above (different flags and encodings 
for the same metric in Solution 1, and different column qualifiers for the same 
metric in Solution 2).
We can add a value-type field in TimelineMetric which indicates whether a set of 
values is long or float, and throw an exception in TimelineMetric at the time 
of adding a value if the types are not consistent. This will at least ensure the 
same data type within a particular write call.
But even here the client should make sure that data types are consistent across 
writes. I think reading back a row to find out the column qualifier name 
or the flags attached to the values won't be a viable option. 
So some sort of restriction on the part of the client (so that it sends 
consistent data types for the same metric) will have to be placed whether we 
adopt Solution 1 or Solution 2.
Is there some HBase API I am not aware of?

*2.* While reading an entity from HBase in the absence of any HBase filter: In 
this case there should be no issues with either Solution 1 or Solution 2, because 
we read everything as bytes from HBase and can then do the appropriate conversion 
based on the flag or column qualifier name.

*3.* While reading an entity from HBase in the presence of HBase filters: We can 
have 2 kinds of HBase filters. One kind retrieves specific columns (to 
determine which metrics to return) and the other trims down the 
rows/entities to be returned based on metric value comparison.
The first class of filters, which determines which columns to return, should 
work in both cases (Solution 1 and 2), even in Solution 2, because we use prefix 
filters as of now. If we use regex matching though, it might make things more 
complicated in the case of Solution 2.

For the second set of filters, we would need to know the data type of the metric 
value in both of the proposed solutions, because SingleColumnValueFilter requires 
the exact column qualifier name (for Solution 2). And for Solution 1, we should 
also know the data type of the metric so that we can attach the flag to the value 
to be compared against (so that BinaryComparator can be used).
If we add filters to our data object model, we can probably include the data type 
in the filters as well. But that again depends on whether the client sends the 
correct data type or not.


As we saw in point 1, we need to impose a restriction on the client that it sends 
the same data type for every metric. Frankly, it should be easy for the client as 
well: if, for a metric, the client expects float values, it will most likely use 
Double or Float.

Thoughts? Or any other suggestions which could preclude the need for such a 
restriction?
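
For Solution 1, a rough sketch of the kind of value encoding being proposed; the byte layout and class are assumptions, purely for illustration:

{code:java}
import java.nio.ByteBuffer;

// Hypothetical sketch of Solution 1: a one-byte type flag (0 = long, 1 = double)
// stored ahead of the 8-byte metric value. The exact layout is an assumption.
public final class MetricValueCodecSketch {
  private static final byte LONG_FLAG = 0;
  private static final byte DOUBLE_FLAG = 1;

  static byte[] encode(Number value) {
    ByteBuffer buf = ByteBuffer.allocate(9);   // 1 flag byte + 8 value bytes
    if (value instanceof Double || value instanceof Float) {
      buf.put(DOUBLE_FLAG).putDouble(value.doubleValue());
    } else {
      buf.put(LONG_FLAG).putLong(value.longValue());
    }
    return buf.array();
  }

  static Number decode(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    return buf.get() == DOUBLE_FLAG ? (Number) buf.getDouble()
                                    : (Number) buf.getLong();
  }
}
{code}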

> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and store 
> values in the backend HBase storage. This converts everything into a string 
> representation (ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for 
> metrics. 
> So we need to decide how we are going to encode and decode metric values and 
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710631#comment-14710631
 ] 

Hudson commented on YARN-4014:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1035 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1035/])
YARN-4014. Support user cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java


> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4078) Unchecked typecast to AbstractYarnScheduler in AppInfo

2015-08-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710607#comment-14710607
 ] 

Rohith Sharma K S commented on YARN-4078:
-

Recently, a similar issue in ApplicationMasterService was fixed in YARN-3986. Could 
you please check whether such issues exist anywhere else, so that they can all 
be combined, discussed, and fixed in the same JIRA?

> Unchecked typecast to AbstractYarnScheduler in AppInfo
> --
>
> Key: YARN-4078
> URL: https://issues.apache.org/jira/browse/YARN-4078
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
>
> Currently getPendingResourceRequestForAttempt is present in 
> {{AbstractYarnScheduler}}.
> *But in AppInfo, we call this method by typecasting the scheduler to 
> AbstractYarnScheduler, which is incorrect.*
> Because if a custom scheduler is added, it will implement 
> YarnScheduler, not AbstractYarnScheduler.
> This method should either be moved to YarnScheduler or have a guarded 
> check like in other places (RMAppAttemptBlock.getBlackListedNodes). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4078) Unchecked typecast to AbstractYarnScheduler in AppInfo

2015-08-24 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4078:
---

 Summary: Unchecked typecast to AbstractYarnScheduler in AppInfo
 Key: YARN-4078
 URL: https://issues.apache.org/jira/browse/YARN-4078
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor


Currently getPendingResourceRequestForAttempt is present in 
{{AbstractYarnScheduler}}.
*But in AppInfo, we call this method by typecasting the scheduler to 
AbstractYarnScheduler, which is incorrect.*

Because if a custom scheduler is added, it will implement YarnScheduler, 
not AbstractYarnScheduler.

This method should either be moved to YarnScheduler or have a guarded check 
like in other places (RMAppAttemptBlock.getBlackListedNodes). 
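
A minimal sketch of the guarded check being suggested; the method name is taken from the description above, while the surrounding helper class and return type are assumptions:

{code:java}
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler;

// Hypothetical sketch only: guard the cast the way
// RMAppAttemptBlock.getBlackListedNodes does, so a custom scheduler that
// implements only YarnScheduler does not hit a ClassCastException.
class GuardedSchedulerAccessSketch {
  @SuppressWarnings("unchecked")
  static List<ResourceRequest> pendingRequests(YarnScheduler scheduler,
      ApplicationAttemptId attemptId) {
    if (scheduler instanceof AbstractYarnScheduler) {
      return ((AbstractYarnScheduler) scheduler)
          .getPendingResourceRequestForAttempt(attemptId);
    }
    // A scheduler that implements only YarnScheduler: nothing to report.
    return Collections.emptyList();
  }
}
{code}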






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710579#comment-14710579
 ] 

Hudson commented on YARN-4014:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #302 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/302/])
YARN-4014. Support user cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java


> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710571#comment-14710571
 ] 

Rohith Sharma K S commented on YARN-4014:
-

Thanks Jian He and Sunil G for detailed review and commit.

> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3970) REST api support for Application Priority

2015-08-24 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-3970:
---

Assignee: Naganarasimha G R  (was: Rohith Sharma K S)

[~Naganarasimha] pinged me offline about taking this over. Assigning to 
Naganarasimha G R. Expecting a patch!

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710556#comment-14710556
 ] 

Hudson commented on YARN-4014:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8346 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8346/])
YARN-4014. Support user cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto


> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710530#comment-14710530
 ] 

Jian He commented on YARN-4014:
---

Thanks Sunil for reviewing the patch !

> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710452#comment-14710452
 ] 

Sunil G commented on YARN-4014:
---

Yes.  Thanks for clarifying Rohith and Jian. +1.

> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

2015-08-24 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710384#comment-14710384
 ] 

Li Lu commented on YARN-4061:
-

I just realized that if we implement our logger on HDFS, we need some 
mechanism to identify the fault tolerant writer so that the storage writer can 
find the correct redo log upon the next start. Currently, we're organizing 
writers within collector managers, and each node has one collector manager. 
Therefore, we may need to identify the node in the writer. If in the future we 
plan to put collectors into special containers, these collectors will also need 
a similar mechanism. This problem does not exist in a single-server model (like 
ATS v1) since it only has one writer. 

For now, during the process of building this FT writer, I propose to use the local 
file system since it can trivially separate the writers under our 
one-node-one-writer model. We can add HDFS support in the future, especially when 
we put our timeline writers into containers (by then we will definitely need an 
identification mechanism for the writers). 

> [Fault tolerance] Fault tolerant writer for timeline v2
> ---
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710344#comment-14710344
 ] 

Naganarasimha G R commented on YARN-4058:
-

Thanks for the review and commit, [~sjlee0].

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: YARN-2928
>
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing. 
> # In ContainerManagerImpl.startContainerInternal, an ApplicationImpl instance is 
> created and then checked for existence in context.getApplications(). Every time 
> an ApplicationImpl is created, its state machine is initialized and a 
> TimelineClient is created, which is required only if it is added to the context 
> (see the sketch below).
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java.
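
A rough sketch of the ordering suggested in item 2; the helper and constructor arguments are illustrative, not the exact NodeManager code:

{code:java}
// Hypothetical sketch for startContainerInternal: look the application up first,
// and only construct ApplicationImpl (which initializes its state machine and a
// TimelineClient) when it is genuinely new. newApplication(...) stands in for
// the real constructor call.
Application app = context.getApplications().get(applicationID);
if (app == null) {
  Application newApp = newApplication(applicationID);   // hypothetical helper
  Application existing =
      context.getApplications().putIfAbsent(applicationID, newApp);
  app = (existing == null) ? newApp : existing;
}
{code}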



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2154) FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2015-08-24 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710324#comment-14710324
 ] 

Arun Suresh commented on YARN-2154:
---

Thanks for going through the patch, [~adhoot], [~kasha] and [~bpodgursky].

bq. ..The previous ordering is better since if you happen to choose something 
just above its fairShare, after preemption it may go below and cause additional 
preemption, causing excessive thrashing of resources.
This will not happen, as the current patch has a check to preempt from an app 
only a container that is above its fair/min share.

I am still working on the unit tests.

> FairScheduler: Improve preemption to preempt only those containers that would 
> satisfy the incoming request
> --
>
> Key: YARN-2154
> URL: https://issues.apache.org/jira/browse/YARN-2154
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Arun Suresh
>Priority: Critical
> Attachments: YARN-2154.1.patch
>
>
> Today, FairScheduler uses a spray-gun approach to preemption. Instead, it 
> should only preempt resources that would satisfy the incoming request. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710288#comment-14710288
 ] 

Sangjin Lee commented on YARN-4058:
---

The unit test result seems fine to me: 
https://builds.apache.org/job/PreCommit-YARN-Build/8902/testReport/

If there is no objection, I'll commit this patch soon.

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing. 
> # In ContainerManagerImpl.startContainerInternal, an ApplicationImpl instance is 
> created and then checked for existence in context.getApplications(). Every time 
> an ApplicationImpl is created, its state machine is initialized and a 
> TimelineClient is created, which is required only if it is added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710283#comment-14710283
 ] 

Hadoop QA commented on YARN-4058:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 19s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 58s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 45s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 45s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 20s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  43m 19s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  90m  2s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart 
|
|   | org.apache.hadoop.yarn.server.resourcemanager.TestRM |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752068/YARN-4058.YARN-2928.002.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 9d14947 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8902/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8902/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8902/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8902/console |


This message was automatically generated.

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing. 
> # In ContainerManagerImpl.startContainerInternal, an ApplicationImpl instance is 
> created and then checked for existence in context.getApplications(). Every time 
> an ApplicationImpl is created, its state machine is initialized and a 
> TimelineClient is created, which is required only if it is added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.

2015-08-24 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710259#comment-14710259
 ] 

Chang Li commented on YARN-4045:


[~leftnoteasy] thanks for sharing your ideas! However, correct me if I am wrong: in 
your suggested approach, the negative available memory could still persist for a 
while, right? (after a node is disconnected and before we try to allocate that 
reservation). Would it be too expensive to check queue limits on every node 
disconnect? Or is it possible to make reserved container usage not count toward the 
calculation of available memory?

> Negative avaialbleMB is being reported for root queue.
> --
>
> Key: YARN-4045
> URL: https://issues.apache.org/jira/browse/YARN-4045
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Rushabh S Shah
>
> We recently deployed 2.7 in one of our clusters.
> We are seeing negative availableMB being reported for queue=root.
> This is from the jmx output:
> {noformat}
> 
> ...
> -163328
> ...
> 
> {noformat}
> The following is the RM log:
> {noformat}
> 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:44,487 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used= 
> cluster=
> 2015-08-10 14:42:44,886 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used= 
> cluster=
> 2015-08-10 14:42:44,886 [Resourc

[jira] [Created] (YARN-4077) FairScheduler Reservation should wait for most relaxed scheduling delay permitted before issuing reservation

2015-08-24 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-4077:
---

 Summary: FairScheduler Reservation should wait for most relaxed 
scheduling delay permitted before issuing reservation
 Key: YARN-4077
 URL: https://issues.apache.org/jira/browse/YARN-4077
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


Today if an allocation has a node local request that allows for relaxation, we 
do not wait for the relaxation delay before issuing the reservation. This can 
be too aggressive. Instead we should allow the scheduling delays of relaxation 
to expire before we choose to allow reserving a node for the container. This 
allows for the request to be satisfied on a different node instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4076) FairScheduler does not allow AM to choose which containers to preempt

2015-08-24 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-4076:
---

 Summary: FairScheduler does not allow AM to choose which 
containers to preempt
 Key: YARN-4076
 URL: https://issues.apache.org/jira/browse/YARN-4076
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot


The Capacity Scheduler allows the AM to choose which containers will be preempted. 
See the comment about corresponding work pending for FairScheduler: 
https://issues.apache.org/jira/browse/YARN-568?focusedCommentId=13649126&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13649126



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710172#comment-14710172
 ] 

Sangjin Lee commented on YARN-4074:
---

Agreed. I will look to do some refactoring to make this simpler.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-24 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710140#comment-14710140
 ] 

Subru Krishnan commented on YARN-2884:
--

[~jlowe], let me try to answer your question, as this approach will not affect 
applications that ship their own configs. To run MapReduce in our cluster where 
AMRMProxy is enabled, the only change we made was to update the 
_resourcemanager.scheduler.address_ value to point to the _amrmproxy.address_. 
We thought this was acceptable, as AMRMProxy (if enabled) is the scheduler proxy 
for the apps, and it was quite easy to accomplish since we only had to update the 
MapReduce config on our gateway machines from where MapReduce jobs are 
submitted. The rolling upgrade reliability, as you rightly pointed out, is 
maintained since the MapReduce configs continue to be independent of the node 
configs. FYI, we also validated with Spark, which exhibits the same characteristics.
Ideally I agree that application configs should be decoupled from the server-side 
configs for multiple reasons (rolling upgrades, security, etc.), but 
unfortunately many applications (REEF, Distributed Shell, etc.) depend on the 
node configs today. So, in summary, the HADOOP_CONF_DIR modification will address 
applications that pick up configs from nodes without breaking self-contained 
applications, as the modified HADOOP_CONF_DIR does not show up on the latter's 
classpath.
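
For reference, the gateway-side change is sketched below (the value shown is 
illustrative only; use whatever endpoint your AMRMProxy is configured to listen on):

{code}
<!-- Gateway-side MapReduce / yarn-site.xml change (illustrative values only) -->
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <!-- point the AM's scheduler endpoint at the local AMRMProxy
       instead of the central RM scheduler address -->
  <value>nm-gateway-host:8049</value>
</property>
{code}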
 

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, 
> YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, 
> YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-24 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710142#comment-14710142
 ] 

Li Lu commented on YARN-4074:
-

bq. This also implies that the canonical stores for the flows and the flow runs 
are the flow activity table and the flow run table respectively...
Ah right... This makes the unified interface less appealing since we may need 
to branch a lot with the getEntities method. However, if we proceed in this 
direction, maybe we'd like to do a deeper refactor of that code. 

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710133#comment-14710133
 ] 

Sangjin Lee commented on YARN-4074:
---

Actually the backend will need to differentiate the queries for flows and flow 
runs from those for other entities, right? For the HBase backend, queries for 
the flows will need to be sent to the flow activity table, those for the flow 
runs will be sent to the flow run table. This also implies that the canonical 
stores for the flows and the flow runs are the flow activity table and the flow 
run table respectively...

We've already gone that way a little bit with the application table, and we 
need to be comfortable with that for us to implement the latter approach.
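
To make that concrete, the reader would need routing along these lines (a 
minimal sketch; the type constants and helper names are assumptions, not 
existing YARN-2928 code):

{code}
// Sketch of the backend routing being described (illustrative names only).
Set<TimelineEntity> getEntities(String entityType, ReaderQuery query) {
  if ("YARN_FLOW_ACTIVITY".equals(entityType)) {
    return readFlowActivityTable(query);  // canonical store for flows
  } else if ("YARN_FLOW_RUN".equals(entityType)) {
    return readFlowRunTable(query);       // canonical store for flow runs
  } else {
    return readEntityTable(query);        // applications and other entities
  }
}
{code}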

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-24 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710102#comment-14710102
 ] 

Li Lu commented on YARN-4074:
-

I'm inclined to use the latter approach to retrieve flows and flow runs, since 
we don't actually differentiate them on the backend. I'm also inclined to keep 
the RESTful API layer simple, and to wrap it with a native JS library for web UIs. 
In this way we can separate the process of TimelineEntity retrieval from the 
context of the timeline entities (e.g., is it a flow, an application, or a DAG 
of applications?). It's also much easier to maintain this interface, IMO. 

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2884) Proxying all AM-RM communications

2015-08-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710089#comment-14710089
 ] 

Jason Lowe commented on YARN-2884:
--

Note that not all applications pick up configs from the nodes, and I don't see 
how relying on a HADOOP_CONF_DIR modification will address them.  For example, 
our setup runs a MapReduce job as a self-contained application -- it does not 
reference the jars nor the configs on the cluster nodes.  This makes rolling 
upgrades more reliable, otherwise a config change on the node could break old 
code in a job or new code in a job could break on an old node config.  This 
happened in practice which is why our jobs no longer rely on confs from the 
nodes.  HADOOP_CONF_DIR does _not_ show up on the classpath for such 
applications, otherwise they would be relying on server-side configs and lead 
to the rolling upgrade instabilities.

Any ideas on how to address the self-contained application scenario?

> Proxying all AM-RM communications
> -
>
> Key: YARN-2884
> URL: https://issues.apache.org/jira/browse/YARN-2884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Kishore Chaliparambil
> Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, 
> YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, 
> YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch
>
>
> We introduce the notion of an RMProxy, running on each node (or once per 
> rack). Upon start the AM is forced (via tokens and configuration) to direct 
> all its requests to a new services running on the NM that provide a proxy to 
> the central RM. 
> This give us a place to:
> 1) perform distributed scheduling decisions
> 2) throttling mis-behaving AMs
> 3) mask the access to a federation of RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710050#comment-14710050
 ] 

Hadoop QA commented on YARN-3717:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 33s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   2m 58s | Site still builds. |
| {color:red}-1{color} | checkstyle |   0m 57s | The applied patch generated  
18 new checkstyle issues (total was 0, now 18). |
| {color:green}+1{color} | whitespace |   0m  6s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   7m 30s | The patch appears to introduce 7 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   5m 54s | Tests failed in 
hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests |   0m 12s | Tests failed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   3m  1s | Tests failed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:red}-1{color} | yarn tests |   0m 14s | Tests failed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |   0m 17s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  62m  4s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| Failed unit tests | hadoop.yarn.client.api.impl.TestNMClient |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices |
|   | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp |
| Failed build | hadoop-yarn-common |
|   | hadoop-yarn-server-common |
|   | hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752043/YARN-3717.20150824-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / b5ce87f |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8901/console |


This message was automatically generated.

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job

[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710040#comment-14710040
 ] 

Hadoop QA commented on YARN-4058:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m  0s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 20s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 33s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 48s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   6m 47s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  23m 55s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  71m  1s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
 |
|   | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
|   | hadoop.yarn.server.resourcemanager.TestRM |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
|   | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens |
|   | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
 |
|   | hadoop.yarn.server.resourcemanager.TestMoveApplication |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
|   | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterService |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
 |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | hadoop.yarn.server.resourcemanager.TestRMAdminService |
|   | hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | 
hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestRMHA |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752068/YARN-4058.YARN-2928.002.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 9d14947 |
| hadoop-yarn-server-nodemanager te

[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization

2015-08-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710001#comment-14710001
 ] 

Varun Saxena commented on YARN-3011:


Refer to MAPREDUCE-3634

> NM dies because of the failure of resource localization
> ---
>
> Key: YARN-3011
> URL: https://issues.apache.org/jira/browse/YARN-3011
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Wang Hao
>Assignee: Varun Saxena
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-3011.001.patch, YARN-3011.002.patch, 
> YARN-3011.003.patch, YARN-3011.004.patch
>
>
> NM dies because of IllegalArgumentException when localize resource.
> 2014-12-29 13:43:58,699 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Downloading public rsrc:{ 
> hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar,
>  1416997035456, FILE, null }
> 2014-12-29 13:43:58,699 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Downloading public rsrc:{ 
> hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/,
>  1419831474153, FILE, null }
> 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
> at org.apache.hadoop.fs.Path.(Path.java:135)
> at org.apache.hadoop.fs.Path.(Path.java:94)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)  
>   
> at java.lang.Thread.run(Thread.java:745)
> 2014-12-29 13:43:58,701 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
> Initializing user hadoop
> 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting 
> connection close header...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization

2015-08-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1471#comment-1471
 ] 

Varun Saxena commented on YARN-3011:


[~djp], sorry, I had missed your comment.
I was under a similar impression when I wrote the comment in January.

But actually, all daemons including the NodeManager explicitly set the 
yarn.dispatcher.exit-on-error configuration to true in serviceInit:
{code}
conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);
{code}

That means the configuration value is completely disregarded.
The default value of false is meant for test cases, to avoid a JVM exit. This is 
clearly documented in Dispatcher.java. Being an internal configuration, it is 
not included in yarn-default.xml either.
{code}
  // Configuration to make sure dispatcher crashes but doesn't do system-exit in
  // case of errors. By default, it should be false, so that tests are not
  // affected. For all daemons it should be explicitly set to true so that
  // daemons can crash instead of hanging around.
  public static final String DISPATCHER_EXIT_ON_ERROR_KEY =
  "yarn.dispatcher.exit-on-error";
{code}

We can probably set this config to true in daemons only if 
yarn.dispatcher.exit-on-error is not set in the config file. Thoughts?
But is there any real use case for it? A recoverable exception should be 
caught and handled, and NOT leaked through to the AsyncDispatcher. And a 
non-recoverable one should lead to a crash anyway.
cc [~djp], [~jianhe]
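
A minimal sketch of what that daemon-side change could look like (illustrative only):

{code}
// Only force exit-on-error to true when the admin has not set it explicitly
// in the configuration file; otherwise honor the configured value.
if (conf.get(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY) == null) {
  conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);
}
{code}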

> NM dies because of the failure of resource localization
> ---
>
> Key: YARN-3011
> URL: https://issues.apache.org/jira/browse/YARN-3011
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Wang Hao
>Assignee: Varun Saxena
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0
>
> Attachments: YARN-3011.001.patch, YARN-3011.002.patch, 
> YARN-3011.003.patch, YARN-3011.004.patch
>
>
> NM dies because of IllegalArgumentException when localize resource.
> 2014-12-29 13:43:58,699 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Downloading public rsrc:{ 
> hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar,
>  1416997035456, FILE, null }
> 2014-12-29 13:43:58,699 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Downloading public rsrc:{ 
> hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/,
>  1419831474153, FILE, null }
> 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.IllegalArgumentException: Can not create a Path from an empty string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
> at org.apache.hadoop.fs.Path.(Path.java:135)
> at org.apache.hadoop.fs.Path.(Path.java:94)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)  
>   
> at java.lang.Thread.run(Thread.java:745)
> 2014-12-29 13:43:58,701 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
> Initializing user hadoop
> 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Exiting, bbye..
> 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting 
> connection close header...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4073) Unused ApplicationACLsManager in ContainerManagerImpl

2015-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709996#comment-14709996
 ] 

Hadoop QA commented on YARN-4073:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 19s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 49s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 39s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 15s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 32s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  45m 52s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752052/YARN-4073.20150824-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b5ce87f |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8899/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8899/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8899/console |


This message was automatically generated.

> Unused ApplicationACLsManager in ContainerManagerImpl
> -
>
> Key: YARN-4073
> URL: https://issues.apache.org/jira/browse/YARN-4073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4073.20150824-1.patch
>
>
> Unused ApplicationACLsManager in ContainerManagerImpl. Seems like when 
> NMContext was introduced ACLsManager was not completely removed from 
> ContainerManagerImpl



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709921#comment-14709921
 ] 

Hadoop QA commented on YARN-3717:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  24m 47s | Pre-patch trunk has 7 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   2m 58s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 38s | The applied patch generated  3 
new checkstyle issues (total was 16, now 18). |
| {color:red}-1{color} | whitespace |   0m  6s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 17s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   5m 10s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 15s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  53m 37s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 123m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.client.TestResourceTrackerOnHA |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.client.api.impl.TestAMRMClient |
|   | hadoop.yarn.client.api.impl.TestNMClient |
|   | hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA |
|   | hadoop.yarn.client.cli.TestYarnCLI |
|   | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens |
|   | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
|   | hadoop.yarn.server.resourcemanager.TestRMHA |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
 |
|   | hadoop.yarn.server.resourcemanager.TestApplicationACLs |
|   | hadoop.yarn.server.resourcemanager.TestRM |
|   | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | hadoop.yarn.server.resourcemanager.TestClientRMService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12751920/YARN-3717.20150822-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / feaf034 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job

[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709868#comment-14709868
 ] 

Sangjin Lee commented on YARN-4058:
---

LGTM. Once the jenkins comes back and unless I hear objections, I'll commit the 
patch.

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4058:

Attachment: YARN-4058.YARN-2928.002.patch

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4058:

Attachment: (was: YARN-4058.YARN-2928.002.patch)

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-08-24 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709856#comment-14709856
 ] 

MENG DING commented on YARN-3769:
-

[~leftnoteasy], for better tracking purposes, would it be better to update the 
title of this JIRA to something more general, e.g., *CapacityScheduler: Improve 
preemption to preempt only those containers that would satisfy the incoming 
request* (similar to YARN-2154)? This ticket can then be used to address the 
preemption ping-pong issue for both new container requests and container 
resource increase requests.

Besides the proposal that you have presented, an alternative solution to 
consider is: once we collect the list of preemptable containers, we immediately 
do a *dry run* of the scheduling algorithm to match the preemptable resources 
against outstanding new/increase resource requests. We then preempt only the 
resources that find a match.

Thoughts?

Meng
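
To sketch the dry-run idea in pseudo-Java (all names below are illustrative, 
not existing scheduler code; assumes java.util imports and the usual RMContainer 
and ResourceRequest types):

{code}
// Dry-run matching: keep only the preemptable containers that would actually
// satisfy an outstanding new/increase request.
List<RMContainer> selectContainersToPreempt(
    List<RMContainer> preemptable, List<ResourceRequest> outstanding) {
  List<RMContainer> toPreempt = new ArrayList<>();
  for (RMContainer c : preemptable) {
    for (ResourceRequest req : outstanding) {
      if (wouldSatisfy(c, req)) {  // hypothetical dry-run placement check
        toPreempt.add(c);
        break;
      }
    }
  }
  return toPreempt;
}
{code}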


> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Wangda Tan
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709860#comment-14709860
 ] 

Naganarasimha G R commented on YARN-4058:
-

Oops, my mistake; deleting it and uploading a new one.

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority

2015-08-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709857#comment-14709857
 ] 

Jian He commented on YARN-4014:
---

bq. I think there would NOT occur any possibility where currentAttempt has the 
old priority. 
I think this is true.

> Support user cli interface in for Application Priority
> --
>
> Key: YARN-4014
> URL: https://issues.apache.org/jira/browse/YARN-4014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 
> 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 
> 0004-YARN-4014.patch
>
>
> Track the changes for user-RM client protocol i.e ApplicationClientProtocol 
> changes and discussions in this jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709843#comment-14709843
 ] 

Sangjin Lee commented on YARN-4058:
---

Thanks for the update. How about flipping the order of {{null}} and 
{{context.getApplications().putIfAbsent(applicationID, application)}}? :)
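
For clarity, the pattern under discussion looks roughly like this (a sketch 
only; whichever operand order is chosen, the semantics are the same):

{code}
// Register the ApplicationImpl only if no entry existed for this app yet;
// putIfAbsent returns null exactly when this call inserted the new value.
if (context.getApplications().putIfAbsent(applicationID, application) == null) {
  // first registration on this NM: safe to do the one-time initialization here
}
{code}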

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709839#comment-14709839
 ] 

Sangjin Lee commented on YARN-4074:
---

The queries we will need to support are as follows (let me know if you believe 
it's not accurate):

- given cluster, query the most recent N flows (from the flow activity table)
- (optionally) given cluster, user, flow id, query all flow runs

In terms of the implementation, there are two approaches. We can either define 
specific methods for querying for flow and flow runs, and implement them, or 
reuse the {{getEntities()}} method to implement them.

With the former approach, we might end up with a proliferation of methods that 
are specific to types. On the other hand, with the latter, the API may remain 
clean but the implementation would become messier, with more if-else type of 
code.

Personally I'm slightly leaning towards the latter, but I'd love others' 
opinion.
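
For concreteness, the two shapes would look roughly like this (a sketch only; 
the signatures are assumptions, not the actual {{TimelineReader}} API; assumes 
java.util.Set/Map and the timeline service's TimelineEntity type):

{code}
// Illustrative sketch of the two alternatives.
interface FlowReaderSketch {
  // Option 1: dedicated, type-specific methods.
  Set<TimelineEntity> getFlows(String clusterId, int limit);
  Set<TimelineEntity> getFlowRuns(String clusterId, String user, String flowId);

  // Option 2: one generic entry point; the implementation branches on
  // entityType (flow activity table vs. flow run table vs. entity table).
  Set<TimelineEntity> getEntities(String clusterId, String entityType,
      Map<String, String> filters, int limit);
}
{code}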

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4058:

Attachment: YARN-4058.YARN-2928.002.patch

Hi [~sjlee0], attaching a patch as per your review comment. Can you please have 
a look at it?

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch, 
> YARN-4058.YARN-2928.002.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4058:

Description: 
# TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
# In ContainerManagerImpl.startContainerInternal an ApplicationImpl instance is 
created and then checked whether it exists in context.getApplications(). 
Every time an ApplicationImpl is created, the state machine is initialized and 
a TimelineClient is created, which is required only if it is added to the context.
# Remove unused imports in TimelineServiceV2Publisher & 
TestSystemMetricsPublisherForV2.java

  was:
# TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
# In ContainerManagerImpl.startContainerInternal an ApplicationImpl instance is 
created and then checked whether it exists in context.getApplications(). 
Every time an ApplicationImpl is created, the state machine is initialized and 
a TimelineClient is created, which is required only if it is added to the context.


> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.
> # Remove unused imports in TimelineServiceV2Publisher & 
> TestSystemMetricsPublisherForV2.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709817#comment-14709817
 ] 

Sangjin Lee commented on YARN-4058:
---

Sounds good. I'll review it promptly when you post the updated patch. Thanks!

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-08-24 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4075:
-

 Summary: [reader REST API] implement support for querying for 
flows and flow runs
 Key: YARN-4075
 URL: https://issues.apache.org/jira/browse/YARN-4075
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-08-24 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-4075:
--

Assignee: Varun Saxena

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-24 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee reassigned YARN-4074:
-

Assignee: Sangjin Lee

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-24 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4074:
-

 Summary: [timeline reader] implement support for querying for 
flows and flow runs
 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Implement support for querying for flows and flow runs.

We should be able to query for the most recent N flows, etc.

This includes changes to the {{TimelineReader}} API if necessary, as well as 
implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4073) Unused ApplicationACLsManager in ContainerManagerImpl

2015-08-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4073:

Attachment: YARN-4073.20150824-1.patch

> Unused ApplicationACLsManager in ContainerManagerImpl
> -
>
> Key: YARN-4073
> URL: https://issues.apache.org/jira/browse/YARN-4073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4073.20150824-1.patch
>
>
> Unused ApplicationACLsManager in ContainerManagerImpl. Seems like when 
> NMContext was introduced ACLsManager was not completely removed from 
> ContainerManagerImpl



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4058:

Description: 
# TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
# In ContainerManagerImpl.startContainerInternal an ApplicationImpl instance is 
created and then checked whether it exists in context.getApplications(). 
Every time an ApplicationImpl is created, the state machine is initialized and 
a TimelineClient is created, which is required only if it is added to the context.

  was:
# TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
# Unused ApplicationACLsManager in ContainerManagerImpl
# In ContainerManagerImpl.startContainerInternal an ApplicationImpl instance is 
created and then checked whether it exists in context.getApplications(). 
Every time an ApplicationImpl is created, the state machine is initialized and 
a TimelineClient is created, which is required only if it is added to the context.


> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709753#comment-14709753
 ] 

Naganarasimha G R commented on YARN-4058:
-

Moving the 2nd point to another JIRA.

> Miscellaneous issues in NodeManager project
> ---
>
> Key: YARN-4058
> URL: https://issues.apache.org/jira/browse/YARN-4058
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4058.YARN-2928.001.patch
>
>
> # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
> # Unused ApplicationACLsManager in ContainerManagerImpl
> # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
> created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and 
> TimelineClient is created which is required only if added to the context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4073) Unused ApplicationACLsManager in ContainerManagerImpl

2015-08-24 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4073:
---

 Summary: Unused ApplicationACLsManager in ContainerManagerImpl
 Key: YARN-4073
 URL: https://issues.apache.org/jira/browse/YARN-4073
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor


ApplicationACLsManager is unused in ContainerManagerImpl. It seems that when 
NMContext was introduced, the ACLsManager was not completely removed from 
ContainerManagerImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4072) ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service

2015-08-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4072:
--
Attachment: 0001-YARN-4072.patch

Attaching an initial version of the patch.

> ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager 
> to support JvmPauseMonitor as a service
> 
>
> Key: YARN-4072
> URL: https://issues.apache.org/jira/browse/YARN-4072
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4072.patch
>
>
> As JvmPauseMonitor is made as an AbstractService, subsequent method changes 
> are needed in all places which uses the monitor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4072) ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service

2015-08-24 Thread Sunil G (JIRA)
Sunil G created YARN-4072:
-

 Summary: ApplicationHistoryServer, WebAppProxyServer, NodeManager 
and ResourceManager to support JvmPauseMonitor as a service
 Key: YARN-4072
 URL: https://issues.apache.org/jira/browse/YARN-4072
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Reporter: Sunil G
Assignee: Sunil G


As JvmPauseMonitor has been made an AbstractService, corresponding method changes 
are needed in all places that use the monitor.
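
Not part of the JIRA text, but for context, a rough sketch of the kind of 
call-site change this implies, assuming the monitor now follows the standard 
Service lifecycle (the no-arg constructor and the exact wiring are assumptions, 
not the committed patch):

{code:title=PauseMonitorAsServiceSketch.java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.util.JvmPauseMonitor;

// Rough sketch only: how a CompositeService-based daemon could own the pause
// monitor once it follows the Service lifecycle. The actual patch may wire it
// differently.
public class PauseMonitorAsServiceSketch extends CompositeService {

  public PauseMonitorAsServiceSketch() {
    super("PauseMonitorAsServiceSketch");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Previously callers did roughly: new JvmPauseMonitor(conf); monitor.start();
    // As a service, the monitor is registered here and its init/start/stop
    // follow the parent's lifecycle automatically.
    addService(new JvmPauseMonitor());
    super.serviceInit(conf);
  }
}
{code}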



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3717) Improve RM node labels web UI

2015-08-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3717:

Attachment: YARN-3717.20150824-1.patch
3717_cluster_test_snapshots.zip

Fixing the test issues and attaching snapshots of the tests run on a local cluster.

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709572#comment-14709572
 ] 

Hadoop QA commented on YARN-3933:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | patch |   0m  1s | The patch file was not named 
according to hadoop's naming conventions. Please see 
https://wiki.apache.org/hadoop/HowToContribute for instructions. |
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752004/patch.BUGFIX-JIRA-YARN-3933.txt
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / feaf034 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8896/console |


This message was automatically generated.

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0005-YARN-3893.patch

Hi [~rohithsharma] and [~sunilg]

Thanks for the comments.
# So the {{createAndInitActiveServices}} approach will not be taken.

The second approach, failing fast, sounds good.
I have updated the patch as per the suggestion. Please review.



> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709251#comment-14709251
 ] 

Junping Du commented on YARN-3933:
--

I think the title here is a bit misleading. Available resource being negative is 
not by itself a problem (e.g. with the NM resource configuration feature of 
YARN-291 it simply means resources are over-committed), although we should not 
see it in most cases. What is described here is actually a race-condition bug in 
the FairScheduler, so please state that explicitly; otherwise developers/users 
may get the impression that resources can never legitimately be negative, which 
was never an assumption we made.

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3933) Resources(both core and memory) are being negative

2015-08-24 Thread Shiwei Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiwei Guo updated YARN-3933:
-
Attachment: patch.BUGFIX-JIRA-YARN-3933.txt

See this comment: 
https://issues.apache.org/jira/browse/YARN-3933?focusedCommentId=14709146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14709146

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-24 Thread Shiwei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709146#comment-14709146
 ] 

Shiwei Guo commented on YARN-3933:
--

We are also seeing this problem, and it can leave the RM never allocating 
resources to a queue whose used resource has gone negative.

I did some research and found that this is mainly caused by a race condition 
when calling AbstractYarnScheduler.completedContainer. Let's take FairScheduler 
as an example:
{code:title=FairScheduler.java}
protected synchronized void completedContainer(RMContainer rmContainer,
    ContainerStatus containerStatus, RMContainerEventType event) {
  if (rmContainer == null) {
    LOG.info("Null container completed...");
    return;
  }

  Container container = rmContainer.getContainer();

  // Get the application for the finished container
  FSAppAttempt application =
      getCurrentAttemptForContainer(container.getId());
  ApplicationId appId =
      container.getId().getApplicationAttemptId().getApplicationId();
  if (application == null) {
    LOG.info("Container " + container + " of" +
        " unknown application attempt " + appId +
        " completed with event " + event);
    return;
  }
  if (!application.getLiveContainersMap().containsKey(container.getId())) {
    LOG.info("Container " + container + " of application attempt " + appId
        + " is not alive, skip do completedContainer operation on event "
        + event);
    return;
  }

  // Get the node on which the container was allocated
  FSSchedulerNode node = getFSSchedulerNode(container.getNodeId());

  if (rmContainer.getState() == RMContainerState.RESERVED) {
    application.unreserve(rmContainer.getReservedPriority(), node);
  } else {
    application.containerCompleted(rmContainer, containerStatus, event);
    node.releaseContainer(container);
    updateRootQueueMetrics();
  }

  LOG.info("Application attempt " + application.getApplicationAttemptId()
      + " released container " + container.getId() + " on node: " + node
      + " with event: " + event);
}
{code}

The completedContainer method calls application.containerCompleted, which 
subtracts the resources used by this container from the application's 
usedResource counter. So if completedContainer is called twice for the same 
container, the counter is decremented twice. The same applies to the 
updateRootQueueMetrics call, which is why we can see negative allocatedMemory on 
the root queue.
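
As a toy illustration of that double subtraction (not the real scheduler 
classes; the container size and names are made up), releasing the same container 
twice without an in-method liveness check drives the usage counter negative, 
while the guard makes the second release a no-op:

{code:title=DoubleReleaseDemo.java}
import java.util.HashSet;
import java.util.Set;

// Toy model of the double-release race described above; it does not use the
// real Hadoop scheduler classes. FINISHED and RELEASED both try to release
// the same container, and only the in-method liveness check keeps the usage
// counter from going negative.
public class DoubleReleaseDemo {
  private long usedMB = 2048;                       // one 2 GB container charged
  private final Set<String> live = new HashSet<>();

  DoubleReleaseDemo() {
    live.add("container_1");
  }

  void completedContainer(String id, long mb, boolean guarded) {
    if (guarded && !live.remove(id)) {
      return;                                       // already released: no-op
    }
    usedMB -= mb;                                   // subtract the container's share
  }

  public static void main(String[] args) {
    DoubleReleaseDemo unguarded = new DoubleReleaseDemo();
    unguarded.completedContainer("container_1", 2048, false);
    unguarded.completedContainer("container_1", 2048, false);
    System.out.println("without guard: usedMB = " + unguarded.usedMB); // -2048

    DoubleReleaseDemo guarded = new DoubleReleaseDemo();
    guarded.completedContainer("container_1", 2048, true);
    guarded.completedContainer("container_1", 2048, true);
    System.out.println("with guard:    usedMB = " + guarded.usedMB);   // 0
  }
}
{code}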

The solution is to check whether the supplied container is still live *inside* 
completedContainer (as shown in the patch). There are some checks before 
completedContainer is called, but they are not enough.

Digging a bit deeper, completedContainer may be called from two places:

1. Triggered by the RMContainerEventType.FINISHED event:
{code:title=FairScheduler.nodeUpdate}
// Process completed containers
for (ContainerStatus completedContainer : completedContainers) {
  ContainerId containerId = completedContainer.getContainerId();
  LOG.debug("Container FINISHED: " + containerId);
  completedContainer(getRMContainer(containerId),
  completedContainer, RMContainerEventType.FINISHED);
}
{code}

2. Triggered by the RMContainerEventType.RELEASED event:
{code:title=AbstractYarnScheduler.releaseContainers}
completedContainer(rmContainer,
SchedulerUtils.createAbnormalContainerStatus(containerId,
  SchedulerUtils.RELEASED_CONTAINER), RMContainerEventType.RELEASED);
{code}

RMContainerEventType.RELEASED is not triggered by the MapReduce 
ApplicationMaster, so we won't see this problem on MR jobs. But Tez triggers it 
when it no longer needs a container, while the NodeManager also reports a 
container-complete message to the RM, which in turn triggers the 
RMContainerEventType.FINISHED event. If the RMContainerEventType.FINISHED event 
reaches the RM earlier than the Tez AM's release, the problem happens.

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709144#comment-14709144
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2229 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2229/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java


> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset synchronously
> -
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: resourcemanager
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
> YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
> YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709128#comment-14709128
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #291 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/291/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java


> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset synchronously
> -
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: resourcemanager
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
> YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
> YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3933) Resources(both core and memory) are being negative

2015-08-24 Thread Shiwei Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiwei Guo updated YARN-3933:
-
Affects Version/s: 2.5.2

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3933) Resources(both core and memory) are being negative

2015-08-24 Thread Shiwei Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiwei Guo updated YARN-3933:
-
Component/s: resourcemanager

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-842) Resource Manager & Node Manager UI's doesn't work with IE

2015-08-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709098#comment-14709098
 ] 

Rohith Sharma K S commented on YARN-842:


I verified on IE9 and newer and was able to view the applications. Is anyone in 
the community still facing this issue? If not, can it be closed?

> Resource Manager & Node Manager UI's doesn't work with IE
> -
>
> Key: YARN-842
> URL: https://issues.apache.org/jira/browse/YARN-842
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.4-alpha
>Reporter: Devaraj K
>
> {code:xml}
> Webpage error details
> User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; 
> SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media 
> Center PC 6.0)
> Timestamp: Mon, 17 Jun 2013 12:06:03 UTC
> Message: 'JSON' is undefined
> Line: 41
> Char: 218
> Code: 0
> URI: http://10.18.40.24:8088/cluster/apps
> {code}
> RM & NM UI's are not working with IE and showing the above error for every 
> link on the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709091#comment-14709091
 ] 

Hudson commented on YARN-3896:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2248 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2248/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java


> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset synchronously
> -
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: resourcemanager
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
> YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
> YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708951#comment-14708951
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #299 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/299/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java


> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset synchronously
> -
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: resourcemanager
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
> YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
> YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4065) container-executor error should include effective user id

2015-08-24 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708940#comment-14708940
 ] 

Harsh J commented on YARN-4065:
---

Agreed - figuring this out wasted a few minutes at another customer I worked 
with last week. This would be a welcome change - would you be willing to submit 
a patch adding this context to the error message?

> container-executor error should include effective user id
> -
>
> Key: YARN-4065
> URL: https://issues.apache.org/jira/browse/YARN-4065
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Casey Brotherton
>Priority: Trivial
>
> When container-executor fails to access it's config file, the following 
> message will be thrown:
> {code}
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container executor initialization is : 24
> ExitCodeException exitCode=24: Invalid conf file provided : 
> /etc/hadoop/conf/container-executor.cfg
> {code}
> The real problem may be a change in the container-executor not running as set 
> uid root.
> From:
> https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html
> {quote}
> The container-executor program must be owned by root and have the permission 
> set ---sr-s---.
> {quote}
> The error message could be improved by printing out the effective user id 
> with the error message, and possibly the executable trying to access the 
> config file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708925#comment-14708925
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1032 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1032/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java


> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset synchronously
> -
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: resourcemanager
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
> YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
> YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously

2015-08-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708911#comment-14708911
 ] 

Hudson commented on YARN-3896:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #303 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/303/])
YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id 
has not been reset synchronously. (Jun Gong via rohithsharmaks) 
(rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java
* 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java


> RMNode transitioned from RUNNING to REBOOTED because its response id had not 
> been reset synchronously
> -
>
> Key: YARN-3896
> URL: https://issues.apache.org/jira/browse/YARN-3896
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>  Labels: resourcemanager
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
> YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
> YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
>
>
> {noformat}
> 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
> Resolved 10.208.132.153 to /default-rack
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> Reconnect from the node at: 10.208.132.153
> 2015-07-03 16:49:39,075 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
> NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
> with capability: , assigned nodeId 
> 10.208.132.153:8041
> 2015-07-03 16:49:39,104 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
> behind rm response id:2506413 nm response id:0
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
> Node 10.208.132.153:8041 as it is now REBOOTED
> 2015-07-03 16:49:39,137 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
> 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
> {noformat}
> The node(10.208.132.153) reconnected with RM. When it registered with RM, RM 
> set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's 
> heartbeat come before RM succeeded setting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)