[jira] [Updated] (YARN-4044) Running applications information changes such as movequeue is not published to TimeLine server
[ https://issues.apache.org/jira/browse/YARN-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4044: -- Attachment: 0002-YARN-4044.patch Rebasing the patch as YARN-4014 has been committed. Also verified all cases in a real cluster. [~rohithsharma], could you please take a look at this patch? Thank you. > Running applications information changes such as movequeue is not published > to TimeLine server > -- > > Key: YARN-4044 > URL: https://issues.apache.org/jira/browse/YARN-4044 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineserver >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4044.patch, 0002-YARN-4044.patch > > > SystemMetricsPublisher needs to expose an appUpdated API to update any change > for a running application. > Events can be > - change of queue for a running application. > - change of application priority for a running application. > This ticket intends to handle both RM and timeline side changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
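For illustration, a minimal sketch of the publisher hook the description asks for (the method name follows the issue text; everything else is an assumption, not the attached patch):

{code:java}
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;

// Hypothetical sketch of an appUpdated() hook on the metrics publisher side.
public interface AppUpdatedPublisher {
  /**
   * Publish runtime changes of a still-running application, e.g. its queue
   * after a movequeue operation or a newly set application priority, so the
   * Timeline Server stays in sync with the RM.
   */
  void appUpdated(RMApp app, long updatedTime);
}
{code}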
[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710720#comment-14710720 ] Rohith Sharma K S commented on YARN-3250: - Thanks [~sunilg] for the review. bq. Also do we need to write any error for failure cases of refreshClusterMaxPriority with RMAuditLogger? On exception, {{logAndWrapException}} is called, which logs an error for failure cases. bq. Could you please add rm.stop() at the end of testAdminRefreshClusterMaxPriority Since the *rm* instance is a class-level field, {{rm.stop}} is called during tearDown, so there is no need to call it explicitly. > Support admin cli interface in for Application Priority > --- > > Key: YARN-3250 > URL: https://issues.apache.org/jira/browse/YARN-3250 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch > > > Current Application Priority Manager supports only configuration via file. > To support runtime configurations for admin cli and REST, a common management > interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4051) ContainerKillEvent is lost when container is In New State and is recovering
[ https://issues.apache.org/jira/browse/YARN-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710716#comment-14710716 ] sandflee commented on YARN-4051: Could anyone help review it? > ContainerKillEvent is lost when container is In New State and is recovering > > > Key: YARN-4051 > URL: https://issues.apache.org/jira/browse/YARN-4051 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: sandflee >Assignee: sandflee >Priority: Critical > Attachments: YARN-4051.01.patch, YARN-4051.02.patch, > YARN-4051.03.patch > > > As in YARN-4050, the NM event dispatcher is blocked and the container is in the NEW > state; when we finish the application, the container stays alive even after the NM > event dispatcher is unblocked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4078) Unchecked typecast to AbstractYarnScheduler in AppInfo
[ https://issues.apache.org/jira/browse/YARN-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710703#comment-14710703 ] Varun Saxena commented on YARN-4078: In the main code, it is guarded at places other than the one raised in this issue. > Unchecked typecast to AbstractYarnScheduler in AppInfo > -- > > Key: YARN-4078 > URL: https://issues.apache.org/jira/browse/YARN-4078 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > > Currently getPendingResourceRequestForAttempt is present in > {{AbstractYarnScheduler}}. > *But in AppInfo, we are calling this method by typecasting it to > AbstractYarnScheduler, which is incorrect.* > Because if a custom scheduler is to be added, it will implement > YarnScheduler, not AbstractYarnScheduler. > This method should be moved to YarnScheduler or it should have a guarded > check like in other places (RMAppAttemptBlock.getBlackListedNodes) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
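A minimal sketch of the guarded check being suggested, modeled loosely on the RMAppAttemptBlock.getBlackListedNodes pattern (the surrounding AppInfo fields and the method name follow the issue description; this is illustrative, not a patch):

{code:java}
// Hypothetical guard in AppInfo: only use the AbstractYarnScheduler-specific
// API when the configured scheduler actually extends it.
ResourceScheduler scheduler = rm.getRMContext().getScheduler();
if (scheduler instanceof AbstractYarnScheduler) {
  List<ResourceRequest> pending = ((AbstractYarnScheduler) scheduler)
      .getPendingResourceRequestForAttempt(attemptId);
  // ... populate AppInfo from the pending requests ...
}
// A custom scheduler that only implements YarnScheduler simply skips this
// block instead of failing with a ClassCastException.
{code}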
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710669#comment-14710669 ] Hudson commented on YARN-4014: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #307 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/307/]) YARN-4014. Support user cli interface in for Application Priority. Contributed by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710651#comment-14710651 ] Varun Saxena commented on YARN-4053: Looking at the issues involved, IMO we should impose a restriction on the client so that it does not mix longs and doubles. > Change the way metric values are stored in HBase Storage > > > Key: YARN-4053 > URL: https://issues.apache.org/jira/browse/YARN-4053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4053-YARN-2928.01.patch > > > Currently HBase implementation uses GenericObjectMapper to convert and store > values in backend HBase storage. This converts everything into a string > representation(ASCII/UTF-8 encoded byte array). > While this is fine in most cases, it does not quite serve our use case for > metrics. > So we need to decide how are we going to encode and decode metric values and > store them in HBase. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710638#comment-14710638 ] Varun Saxena commented on YARN-4053: There was a suggestion that we can support only longs. Would supporting only longs not have any impact on potential users of ATS? Longs, however, should cover most of the metrics (as of now I can't think of any where decimals would be of great importance). If we do this, I think the TimelineMetric object should be changed to accept only java.lang.Long and not java.lang.Number… Looping in [~vinodkv] to get his opinion on this as well. Although, is it unfair to ask the client to send values consistently? Can't we document this and enforce the restriction? If the client does not comply, it cannot expect consistent results. This can be the contract between ATS and its clients. The major concern, though, is that it won't be possible to enforce this restriction programmatically, neither at the client side nor at the server side. *Possible Solution :* There is one possible solution if enforcing this restriction is not viable. The real problem in both solutions comes when applying metric filters if the data is inconsistent. For this, we can use approach 2 (include the type in the column qualifier) and then insert OR filters covering both column qualifiers for the same metric. I will elaborate with an example. Let us say we have a metric called JOB_ELAPSED_TIME and the client can report both integral and floating point values for it. With approach 2, we will have 2 column qualifiers for this metric, i.e. "JOB_ELAPSED_TIME=L" (for longs) and "JOB_ELAPSED_TIME=D" (for doubles). Now, a query whose metric filter value is in integer format, i.e. something like JOB_ELAPSED_TIME > 40, can be transformed to a corresponding HBase filter of the form ("JOB_ELAPSED_TIME=L" > 40 OR "JOB_ELAPSED_TIME=D" > 40.0). That is, a filter list of the form ("m1" > 10 AND "m2" < 5 AND "m3"=4) would be transformed to (("m1=L" > 10 OR "m1=D" > 10.0) AND ("m2=L" < 5 OR "m2=D" < 5.0) AND ("m3=L" = 4 OR "m3=D" = 4.0)). If the filter value is in decimal format, then additional changes are needed. A filter such as JOB_ELAPSED_TIME > 40.75 will have to be converted to ("JOB_ELAPSED_TIME=L" >= 41 OR "JOB_ELAPSED_TIME=D" > 40.75). As you can see, while matching a double value against the column qualifier storing longs, I would need to raise the value to the closest integer and change the filter to >=. Similar changes will be required for the less-than (<) and equal-to (=) comparisons as well. However, I am not sure whether adding this many filters will cause any performance issue for HBase, because with this solution we will in essence be doubling the size of the metric filters. One thing to note is that if we do adopt approach 2 (including the type in the column qualifier), regex comparison might become an issue. Theoretically regular expressions can become quite complex, so programmatically interpreting a regex and transforming it so that it covers both the long-related and the double-related column qualifiers might induce bugs. Maybe we can just support wildcard match (\*) or make do with prefix and substring filters. Thoughts? However, we may want to match against only the latest version of the value for a metric. In that case, the solution suggested above won't work.
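To make the proposed transformation concrete, here is a rough sketch of how one such OR pair could be built with the HBase client API (the column family name "m", the "=L"/"=D" qualifier suffixes and the helper itself are assumptions for illustration, not the actual YARN-2928 reader code):

{code:java}
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class MetricFilterSketch {
  // Translate "metric > value" into an OR over the long ("=L") and
  // double ("=D") column qualifiers of a hypothetical metrics family "m".
  static FilterList greaterThan(String metric, long value) {
    byte[] family = Bytes.toBytes("m");
    FilterList orPair = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    orPair.addFilter(new SingleColumnValueFilter(family,
        Bytes.toBytes(metric + "=L"), CompareOp.GREATER, Bytes.toBytes(value)));
    orPair.addFilter(new SingleColumnValueFilter(family,
        Bytes.toBytes(metric + "=D"), CompareOp.GREATER,
        Bytes.toBytes((double) value)));
    return orPair;
  }

  public static void main(String[] args) {
    // Several such OR pairs would then be ANDed together,
    // e.g. for ("m1" > 10 AND "m2" > 5):
    FilterList metricFilters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    metricFilters.addFilter(greaterThan("m1", 10));
    metricFilters.addFilter(greaterThan("m2", 5));
  }
}
{code}

Note that because the stored bytes would come from Bytes.toBytes(long)/Bytes.toBytes(double), the default BinaryComparator ordering is only meaningful for non-negative values; a real implementation would need an order-preserving encoding.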
> Change the way metric values are stored in HBase Storage > > > Key: YARN-4053 > URL: https://issues.apache.org/jira/browse/YARN-4053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4053-YARN-2928.01.patch > > > Currently HBase implementation uses GenericObjectMapper to convert and store > values in backend HBase storage. This converts everything into a string > representation(ASCII/UTF-8 encoded byte array). > While this is fine in most cases, it does not quite serve our use case for > metrics. > So we need to decide how are we going to encode and decode metric values and > store them in HBase. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage
[ https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710633#comment-14710633 ] Varun Saxena commented on YARN-4053: Wanted to discuss this so that we can reach a consensus on how to handle YARN-4053. *Solution 1*: We can add a 1-byte flag as part of the metric value indicating whether we are storing an integral value (0) or a floating point value (1). *Solution 2*: The type can be made part of the column qualifier, say something like metric=l where "l" indicates long. Another option is to store everything as a double. But would it be fair to impose this restriction on the client while it reads data from ATS? What if the client is expecting a long and is unable to handle a double? The major issue surrounding the different approaches is what happens if the client does not report metric values consistently (with the same data type for the same metric). Now let us look at the scenarios where metric values come into the picture. *1.* While writing an entity to HBase: Here, we need to consider that for the same entity, a particular metric can be reported in multiple write calls. So it is possible that in one write, all values for a particular metric are reported as longs and in another write, all as floats. This can create inconsistency in both the solutions above (different flags and encodings for the same metric in Solution 1, and different column qualifiers for the same metric in Solution 2). We can add a valuetype field in TimelineMetric which indicates whether a set of values are long or float, and throw an exception in TimelineMetric at the time of adding a value if the types are not consistent. This will at least ensure the same data type for a particular write call. But even then, the client must make sure that data types are consistent across writes. I think fetching the row to find out the column qualifier name or the flags attached to the values won't be a viable option. So some sort of restriction on the part of the client (so that it sends consistent data types for the same metric) will have to be placed whether we adopt Solution 1 or Solution 2. Is there some HBase API I am not aware of? *2.* While reading an entity from HBase in the absence of any HBase filter: In this case there should be no issues with either Solution 1 or Solution 2, because we read everything as bytes from HBase and can do the appropriate conversion based on the flag or column qualifier name. *3.* While reading an entity from HBase in the presence of HBase filters: We can have 2 kinds of HBase filters. One kind retrieves specific columns (to determine which metrics to return) and the other trims down the rows/entities to be returned based on metric value comparison. The first class of filters, which determine which columns to return, should work in both cases (Solution 1 and 2), even in Solution 2, because we use prefix filters as of now. If we use regex matching though, it might make things more complicated in the case of Solution 2. For the second set of filters, we would need to know the data type of the metric value in both proposed solutions, because SingleColumnValueFilter requires the exact column qualifier name (for Solution 2), and for Solution 1 we should also know the data type of the metric so that we can attach the flag to the value to be compared against (so that BinaryComparator can be used). If we add filters to our data object model, we can probably include the data type in the filters as well.
But that again depends on the client sending the correct data type. As we saw in point 1, we need to impose a restriction on the client that it sends the same data type for every metric. Frankly, it should be easy for the client as well: if it expects float values for a metric, it will most likely use Double or Float. Thoughts? Or are there other suggestions which could preclude the need for such a restriction? > Change the way metric values are stored in HBase Storage > > > Key: YARN-4053 > URL: https://issues.apache.org/jira/browse/YARN-4053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4053-YARN-2928.01.patch > > > Currently HBase implementation uses GenericObjectMapper to convert and store > values in backend HBase storage. This converts everything into a string > representation(ASCII/UTF-8 encoded byte array). > While this is fine in most cases, it does not quite serve our use case for > metrics. > So we need to decide how are we going to encode and decode metric values and > store them in HBase. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
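As a concrete illustration of Solution 1, a minimal encoding/decoding sketch (the flag values and the use of HBase's Bytes utility are assumptions for illustration, not the attached patch):

{code:java}
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of Solution 1: prefix each stored metric value with a 1-byte type
// flag so the reader knows whether to decode a long (0) or a double (1).
public class MetricValueCodec {
  private static final byte LONG_FLAG = 0x0;
  private static final byte DOUBLE_FLAG = 0x1;

  static byte[] encode(Number value) {
    if (value instanceof Float || value instanceof Double) {
      return Bytes.add(new byte[] {DOUBLE_FLAG}, Bytes.toBytes(value.doubleValue()));
    }
    return Bytes.add(new byte[] {LONG_FLAG}, Bytes.toBytes(value.longValue()));
  }

  static Number decode(byte[] stored) {
    byte[] payload = Bytes.tail(stored, stored.length - 1);
    if (stored[0] == DOUBLE_FLAG) {
      return Double.valueOf(Bytes.toDouble(payload));
    }
    return Long.valueOf(Bytes.toLong(payload));
  }
}
{code}

The same flag byte would have to be prepended to the comparison value handed to BinaryComparator when building a SingleColumnValueFilter, which is why the reader needs to know the metric's data type up front.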
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710631#comment-14710631 ] Hudson commented on YARN-4014: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1035 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1035/]) YARN-4014. Support user cli interface in for Application Priority. Contributed by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4078) Unchecked typecast to AbstractYarnScheduler in AppInfo
[ https://issues.apache.org/jira/browse/YARN-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710607#comment-14710607 ] Rohith Sharma K S commented on YARN-4078: - Recently a similar issue in ApplicationMasterService was fixed in YARN-3986. Could you please check whether such issues exist anywhere else, so that they can all be combined, discussed and fixed in the same JIRA? > Unchecked typecast to AbstractYarnScheduler in AppInfo > -- > > Key: YARN-4078 > URL: https://issues.apache.org/jira/browse/YARN-4078 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > > Currently getPendingResourceRequestForAttempt is present in > {{AbstractYarnScheduler}}. > *But in AppInfo, we are calling this method by typecasting it to > AbstractYarnScheduler, which is incorrect.* > Because if a custom scheduler is to be added, it will implement > YarnScheduler, not AbstractYarnScheduler. > This method should be moved to YarnScheduler or it should have a guarded > check like in other places (RMAppAttemptBlock.getBlackListedNodes) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4078) Unchecked typecast to AbstractYarnScheduler in AppInfo
Naganarasimha G R created YARN-4078: --- Summary: Unchecked typecast to AbstractYarnScheduler in AppInfo Key: YARN-4078 URL: https://issues.apache.org/jira/browse/YARN-4078 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Currently getPendingResourceRequestForAttempt is present in {{AbstractYarnScheduler}}. *But in AppInfo, we are calling this method by typecasting it to AbstractYarnScheduler, which is incorrect.* Because if a custom scheduler is to be added, it will implement YarnScheduler, not AbstractYarnScheduler. This method should be moved to YarnScheduler or it should have a guarded check like in other places (RMAppAttemptBlock.getBlackListedNodes) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710579#comment-14710579 ] Hudson commented on YARN-4014: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #302 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/302/]) YARN-4014. Support user cli interface in for Application Priority. Contributed by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710571#comment-14710571 ] Rohith Sharma K S commented on YARN-4014: - Thanks Jian He and Sunil G for detailed review and commit. > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3970) REST api support for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S reassigned YARN-3970: --- Assignee: Naganarasimha G R (was: Rohith Sharma K S) [~Naganarasimha] pinged me offline about taking this over. Assigning to Naganarasimha G R. Expecting a patch!! > REST api support for Application Priority > - > > Key: YARN-3970 > URL: https://issues.apache.org/jira/browse/YARN-3970 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Naganarasimha G R > > REST api support for application priority. > - get/set priority of an application > - get default priority of a queue > - get cluster max priority -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710556#comment-14710556 ] Hudson commented on YARN-4014: -- FAILURE: Integrated in Hadoop-trunk-Commit #8346 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8346/]) YARN-4014. Support user cli interface in for Application Priority. Contributed by Rohith Sharma K S (jianhe: rev 57c7ae1affb2e1821fbdc3f47738d7e6fd83c7c1) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityRequest.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityResponsePBImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateUpdateAppEvent.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/ApplicationCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/UpdateApplicationPriorityRequestPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/UpdateApplicationPriorityResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710530#comment-14710530 ] Jian He commented on YARN-4014: --- Thanks Sunil for reviewing the patch ! > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710452#comment-14710452 ] Sunil G commented on YARN-4014: --- Yes. Thanks for clarifying Rohith and Jian. +1. > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2
[ https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710384#comment-14710384 ] Li Lu commented on YARN-4061: - I just realized that if we implement our logger on HDFS, we need some mechanism to identify the fault-tolerant writer so that the storage writer can find the correct redo-log upon the next start. Currently, we're organizing writers within collector managers. Each node will have one collector manager. Therefore, we may need to identify the node in the writer. If in the future we plan to put collectors into special containers, these collectors will also need a similar mechanism. This problem does not exist in a single-server model (like ATS v1) since it only has one writer. For now, during the process of building this FT writer, I propose to use the local file system, since it can trivially separate the writers under our one-node-one-writer model. We can add HDFS support in the future, especially when we put our timeline writers into containers (by then we will definitely need some identification mechanism for the writers). > [Fault tolerance] Fault tolerant writer for timeline v2 > --- > > Key: YARN-4061 > URL: https://issues.apache.org/jira/browse/YARN-4061 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: FaulttolerantwriterforTimelinev2.pdf > > > We need to build a timeline writer that can be resistant to backend storage > down time and timeline collector failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
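For illustration only, a minimal sketch of how the local redo-log location could be keyed by node so the writer can find it again after a restart (the directory layout and naming are assumptions, not part of the attached design document):

{code:java}
import java.io.File;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: derive a per-node spool directory for the fault-tolerant writer so
// that, under the one-node-one-writer model, the redo-log written before a
// failure can be located again when the collector manager restarts.
public class WriterSpoolLocator {
  static File redoLogDir(File spoolRoot) throws UnknownHostException {
    String nodeId = InetAddress.getLocalHost().getCanonicalHostName();
    // e.g. <spoolRoot>/timeline-redo-log/<hostname>
    return new File(new File(spoolRoot, "timeline-redo-log"), nodeId);
  }
}
{code}

Once collectors move into their own containers, the hostname alone would no longer be a stable identifier, which is the identification problem raised above.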
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710344#comment-14710344 ] Naganarasimha G R commented on YARN-4058: - Thanks for the review and commit, [~sjlee0]. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: YARN-2928 > > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2154) FairScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710324#comment-14710324 ] Arun Suresh commented on YARN-2154: --- Thanks for going through the patch, [~adhoot], [~kasha] and [~bpodgursky]. bq. ..The previous ordering is better since if you happen to choose something just above its fairShare, after preemption it may go below and cause additional preemption, causing excessive thrashing of resources. This will not happen, as the current patch has a check to only preempt, from an app, a container above its fair/min share. I am still working on the unit tests. > FairScheduler: Improve preemption to preempt only those containers that would > satisfy the incoming request > -- > > Key: YARN-2154 > URL: https://issues.apache.org/jira/browse/YARN-2154 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Arun Suresh >Priority: Critical > Attachments: YARN-2154.1.patch > > > Today, FairScheduler uses a spray-gun approach to preemption. Instead, it > should only preempt resources that would satisfy the incoming request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
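To illustrate the kind of check being described, a rough sketch under assumed names (this is not the attached patch; the actual logic lives in the FairScheduler preemption code and may differ):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

// Sketch: a container is a preemption candidate only if taking it away still
// leaves the owning app at or above its fair share, so preemption cannot push
// the app below its share and trigger further preemption.
public class PreemptionCheckSketch {
  private static final ResourceCalculator CALC = new DefaultResourceCalculator();

  static boolean preemptable(FSAppAttempt app, RMContainer container,
      Resource clusterResource) {
    Resource usageAfterPreemption = Resources.subtract(
        app.getResourceUsage(), container.getContainer().getResource());
    return Resources.greaterThanOrEqual(
        CALC, clusterResource, usageAfterPreemption, app.getFairShare());
  }
}
{code}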
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710288#comment-14710288 ] Sangjin Lee commented on YARN-4058: --- The unit test result seems fine to me: https://builds.apache.org/job/PreCommit-YARN-Build/8902/testReport/ If there is no objection, I'll commit this patch soon. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710283#comment-14710283 ] Hadoop QA commented on YARN-4058: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 19s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 58s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 45s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 20s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 43m 19s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 90m 2s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart | | | org.apache.hadoop.yarn.server.resourcemanager.TestRM | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752068/YARN-4058.YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 9d14947 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8902/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8902/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8902/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8902/console | This message was automatically generated. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). 
> everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4045) Negative avaialbleMB is being reported for root queue.
[ https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710259#comment-14710259 ] Chang Li commented on YARN-4045: [~leftnoteasy] thanks for sharing ideas! However, correct me if I am wrong, in your suggested way, the negative available memory could still stay for a while right ? (after a node is disconnected and before we try to allocate that reserved). Is it too expensive to check queue limits for every node disconnect? Or is it possible to make reserved container usage not count toward calculation of available memory? > Negative avaialbleMB is being reported for root queue. > -- > > Key: YARN-4045 > URL: https://issues.apache.org/jira/browse/YARN-4045 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Rushabh S Shah > > We recently deployed 2.7 in one of our cluster. > We are seeing negative availableMB being reported for queue=root. > This is from the jmx output: > {noformat} > > ... > -163328 > ... > > {noformat} > The following is the RM log: > {noformat} > 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > 
absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:44,487 [ResourceManager Event Processor] INFO > capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 > absoluteUsedCapacity=1.0032743 used= > cluster= > 2015-08-10 14:42:44,886 [ResourceManager Event Processor] INFO > capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 > absoluteUsedCapacity=1.0029854 used= > cluster= > 2015-08-10 14:42:44,886 [Resourc
[jira] [Created] (YARN-4077) FairScheduler Reservation should wait for most relaxed scheduling delay permitted before issuing reservation
Anubhav Dhoot created YARN-4077: --- Summary: FairScheduler Reservation should wait for most relaxed scheduling delay permitted before issuing reservation Key: YARN-4077 URL: https://issues.apache.org/jira/browse/YARN-4077 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Today if an allocation has a node local request that allows for relaxation, we do not wait for the relaxation delay before issuing the reservation. This can be too aggressive. Instead we should allow the scheduling delays of relaxation to expire before we choose to allow reserving a node for the container. This allows for the request to be satisfied on a different node instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4076) FairScheduler does not allow AM to choose which containers to preempt
Anubhav Dhoot created YARN-4076: --- Summary: FairScheduler does not allow AM to choose which containers to preempt Key: YARN-4076 URL: https://issues.apache.org/jira/browse/YARN-4076 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Capacity scheduler allows for AM to choose which containers will be preempted. See comment about corresponding work pending for FairScheduler https://issues.apache.org/jira/browse/YARN-568?focusedCommentId=13649126&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13649126 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710172#comment-14710172 ] Sangjin Lee commented on YARN-4074: --- Agreed. I will look to do some refactoring to make this simpler. > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710140#comment-14710140 ] Subru Krishnan commented on YARN-2884: -- [~jlowe], let me try to answer your question, as this approach will not affect applications that ship their own configs. To run MapReduce in our cluster where AMRMProxy is enabled, the only change we made was to update the _resourcemanager.scheduler.address_ value to point to _amrmproxy.address_. We thought this was acceptable, as AMRMProxy (if enabled) is the scheduler proxy for the apps, and it was moreover quite easy to accomplish, since we only had to update the MapReduce config on our gateway machines from which MapReduce jobs are submitted. Rolling upgrade reliability, as you rightly pointed out, is maintained since MapReduce configs continue to be independent of node configs. FYI, we also validated with Spark, which exhibits the same characteristics. Ideally I agree that application configs should be decoupled from the server-side configs for multiple reasons like rolling upgrades, security, etc., but unfortunately many applications (REEF, Distributed Shell, etc.) depend on the node configs today. So in summary, the HADOOP_CONF_DIR modification will address applications that pick up configs from nodes without breaking self-contained applications, as the modified HADOOP_CONF_DIR does not show up on the latter's classpath. > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new service running on the NM that provides a proxy to > the central RM. > This gives us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
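For illustration, a minimal sketch of the kind of client-side override being described (the AMRMProxy listen address shown is an assumption tied to the patch, not a settled configuration value):

{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: on the gateway/submission host, point the client's scheduler
// address at the node-local AMRMProxy endpoint instead of the RM scheduler.
public class AmRmProxyClientConf {
  public static YarnConfiguration create() {
    YarnConfiguration conf = new YarnConfiguration();
    // "localhost:8049" stands in for the assumed AMRMProxy listen address.
    conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, "localhost:8049");
    return conf;
  }
}
{code}

In practice this would be the same property set in the yarn-site.xml shipped to the gateway machines, as described above.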
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710142#comment-14710142 ] Li Lu commented on YARN-4074: - bq. This also implies that the canonical stores for the flows and the flow runs are the flow activity table and the flow run table respectively... Ah right... This makes the unified interface less appealing since we may need to branch a lot with the getEntities method. However, if we proceed in this direction, maybe we'd like to do a deeper refactor of that code. > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710133#comment-14710133 ] Sangjin Lee commented on YARN-4074: --- Actually the backend will need to differentiate the queries for flows and flow runs from those for other entities, right? For the HBase backend, queries for the flows will need to be sent to the flow activity table, those for the flow runs will be sent to the flow run table. This also implies that the canonical stores for the flows and the flow runs are the flow activity table and the flow run table respectively... We've already gone that way a little bit with the application table, and we need to be comfortable with that for us to implement the latter approach. > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710102#comment-14710102 ] Li Lu commented on YARN-4074: - I'm inclined to use the latter approach to retrieve flows and flow runs, since we don't actually differentiate them on the backend. I'm also inclined to keep the RESTful API layer simple, and to wrap it with a native JS library for web UIs. In this way we can separate the process of TimelineEntity retrieval from the context of the timeline entities (e.g. is it a flow, an application, or a DAG of applications?). It's also much easier to maintain this interface IMO. > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710089#comment-14710089 ] Jason Lowe commented on YARN-2884: -- Note that not all applications pick up configs from the nodes, and I don't see how relying on a HADOOP_CONF_DIR modification will address them. For example, our setup runs a MapReduce job as a self-contained application -- it does not reference the jars nor the configs on the cluster nodes. This makes rolling upgrades more reliable, otherwise a config change on the node could break old code in a job or new code in a job could break on an old node config. This happened in practice which is why our jobs no longer rely on confs from the nodes. HADOOP_CONF_DIR does _not_ show up on the classpath for such applications, otherwise they would be relying on server-side configs and lead to the rolling upgrade instabilities. Any ideas on how to address the self-contained application scenario? > Proxying all AM-RM communications > - > > Key: YARN-2884 > URL: https://issues.apache.org/jira/browse/YARN-2884 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Carlo Curino >Assignee: Kishore Chaliparambil > Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, > YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, > YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch > > > We introduce the notion of an RMProxy, running on each node (or once per > rack). Upon start the AM is forced (via tokens and configuration) to direct > all its requests to a new services running on the NM that provide a proxy to > the central RM. > This give us a place to: > 1) perform distributed scheduling decisions > 2) throttling mis-behaving AMs > 3) mask the access to a federation of RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710050#comment-14710050 ] Hadoop QA commented on YARN-3717: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 33s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 58s | Site still builds. | | {color:red}-1{color} | checkstyle | 0m 57s | The applied patch generated 18 new checkstyle issues (total was 0, now 18). | | {color:green}+1{color} | whitespace | 0m 6s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 7m 30s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 5m 54s | Tests failed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 0m 12s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 3m 1s | Tests failed in hadoop-yarn-server-applicationhistoryservice. | | {color:red}-1{color} | yarn tests | 0m 14s | Tests failed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 0m 17s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 62m 4s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-common | | Failed unit tests | hadoop.yarn.client.api.impl.TestNMClient | | | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices | | | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebApp | | Failed build | hadoop-yarn-common | | | hadoop-yarn-server-common | | | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752043/YARN-3717.20150824-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / b5ce87f | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8901/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8901/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8901/console | This message was automatically generated. > Improve RM node labels web UI > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch > > > 1> Add the default-node-Label expres
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710040#comment-14710040 ] Hadoop QA commented on YARN-4058: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 0s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 20s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 48s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 6m 47s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 23m 55s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 71m 1s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.TestApplicationCleanup | | | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens | | | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.TestMoveApplication | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterService | | | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs | | | hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerHealth | | | 
hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel | | | hadoop.yarn.server.resourcemanager.TestRMAdminService | | | hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestRMHA | | | org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752068/YARN-4058.YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 9d14947 | | hadoop-yarn-server-nodemanager te
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710001#comment-14710001 ] Varun Saxena commented on YARN-3011: Refer to MAPREDUCE-3634 > NM dies because of the failure of resource localization > --- > > Key: YARN-3011 > URL: https://issues.apache.org/jira/browse/YARN-3011 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Wang Hao >Assignee: Varun Saxena > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-3011.001.patch, YARN-3011.002.patch, > YARN-3011.003.patch, YARN-3011.004.patch > > > NM dies because of IllegalArgumentException when localize resource. > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, > 1416997035456, FILE, null } > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, > 1419831474153, FILE, null } > 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:135) > at org.apache.hadoop.fs.Path.(Path.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > > at java.lang.Thread.run(Thread.java:745) > 2014-12-29 13:43:58,701 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user hadoop > 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: > Exiting, bbye.. > 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting > connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
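For context on the crash in the stack trace above: the failure comes from constructing an org.apache.hadoop.fs.Path from an empty string inside the NM event dispatcher thread. Below is a minimal sketch of the defensive idea only (the attached YARN-3011 patches may take a different shape): validate the resource location before building the Path, so a single bad resource can be failed without killing the dispatcher. The helper class and method names are hypothetical.
{code}
import org.apache.hadoop.fs.Path;

public final class LocalizationPathCheck {
  /**
   * Returns a Path for the remote resource location, or null when the
   * location is missing or empty, so the caller can fail just that resource
   * instead of letting new Path("") throw IllegalArgumentException inside
   * the NM event dispatcher thread.
   */
  static Path toPathOrNull(String remoteLocation) {
    if (remoteLocation == null || remoteLocation.trim().isEmpty()) {
      return null;
    }
    return new Path(remoteLocation);
  }
}
{code}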
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1471#comment-1471 ] Varun Saxena commented on YARN-3011: [~djp], sorry, I had missed your comment. I was under a similar impression when I wrote the comment in January. But actually all daemons, including the NodeManager, explicitly set the yarn.dispatcher.exit-on-error configuration to true in serviceInit. {code} conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true); {code} That means the configuration value is completely disregarded. The default value of false is meant for test cases, to avoid a JVM exit. This is clearly documented in Dispatcher.java. Being an internal configuration, it is not included in yarn-default.xml either. {code} // Configuration to make sure dispatcher crashes but doesn't do system-exit in // case of errors. By default, it should be false, so that tests are not // affected. For all daemons it should be explicitly set to true so that // daemons can crash instead of hanging around. public static final String DISPATCHER_EXIT_ON_ERROR_KEY = "yarn.dispatcher.exit-on-error"; {code} We can probably set this config to true in the daemons only if yarn.dispatcher.exit-on-error is not set in the config file. Thoughts? But is there any real use case for it? A recoverable exception should be caught and handled and NOT leaked through to AsyncDispatcher, and a non-recoverable one should lead to a crash anyway. cc [~djp], [~jianhe] > NM dies because of the failure of resource localization > --- > > Key: YARN-3011 > URL: https://issues.apache.org/jira/browse/YARN-3011 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Wang Hao >Assignee: Varun Saxena > Labels: 2.6.1-candidate > Fix For: 2.7.0 > > Attachments: YARN-3011.001.patch, YARN-3011.002.patch, > YARN-3011.003.patch, YARN-3011.004.patch > > > NM dies because of IllegalArgumentException when localize resource. 
> 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, > 1416997035456, FILE, null } > 2014-12-29 13:43:58,699 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Downloading public rsrc:{ > hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, > 1419831474153, FILE, null } > 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.IllegalArgumentException: Can not create a Path from an empty string > at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) > at org.apache.hadoop.fs.Path.(Path.java:135) > at org.apache.hadoop.fs.Path.(Path.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > > at java.lang.Thread.run(Thread.java:745) > 2014-12-29 13:43:58,701 INFO > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: > Initializing user hadoop > 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: > Exiting, bbye.. > 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting > connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
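A minimal sketch of the suggestion in the comment above (force exit-on-error only when the admin has not set the key), assuming it would replace the unconditional setBoolean call the daemons make today; this is an illustration of the proposal, not existing code.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.event.Dispatcher;

public final class DispatcherExitOnErrorDefault {
  // Apply the daemon default only when yarn.dispatcher.exit-on-error is not
  // already present in the loaded configuration, instead of always overriding it.
  static void applyDaemonDefault(Configuration conf) {
    if (conf.get(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY) == null) {
      conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);
    }
  }
}
{code}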
[jira] [Commented] (YARN-4073) Unused ApplicationACLsManager in ContainerManagerImpl
[ https://issues.apache.org/jira/browse/YARN-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709996#comment-14709996 ] Hadoop QA commented on YARN-4073: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. | | {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 39s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 7m 32s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 45m 52s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752052/YARN-4073.20150824-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b5ce87f | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8899/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8899/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8899/console | This message was automatically generated. > Unused ApplicationACLsManager in ContainerManagerImpl > - > > Key: YARN-4073 > URL: https://issues.apache.org/jira/browse/YARN-4073 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4073.20150824-1.patch > > > Unused ApplicationACLsManager in ContainerManagerImpl. Seems like when > NMContext was introduced ACLsManager was not completely removed from > ContainerManagerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709921#comment-14709921 ] Hadoop QA commented on YARN-3717: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 24m 47s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 58s | Site still builds. | | {color:red}-1{color} | checkstyle | 2m 38s | The applied patch generated 3 new checkstyle issues (total was 16, now 18). | | {color:red}-1{color} | whitespace | 0m 6s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 5m 10s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 3m 15s | Tests passed in hadoop-yarn-server-applicationhistoryservice. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 53m 37s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 123m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.TestResourceTrackerOnHA | | | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.client.api.impl.TestAMRMClient | | | hadoop.yarn.client.api.impl.TestNMClient | | | hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA | | | hadoop.yarn.client.cli.TestYarnCLI | | | hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens | | | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.TestRMHA | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebAppFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.TestClientRMService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751920/YARN-3717.20150822-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / feaf034 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8898/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709868#comment-14709868 ] Sangjin Lee commented on YARN-4058: --- LGTM. Once the jenkins comes back and unless I hear objections, I'll commit the patch. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4058: Attachment: YARN-4058.YARN-2928.002.patch > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4058: Attachment: (was: YARN-4058.YARN-2928.002.patch) > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709856#comment-14709856 ] MENG DING commented on YARN-3769: - [~leftnoteasy], for better tracking purposes, would it be better to update the title of this JIRA to something more general, e.g., *CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request* (similar to YARN-2154)? This ticket can then be used to address preemption ping-pong issue for both new container request and container resource increase request. Besides the proposal that you have presented, an alternative solution to consider is: once we collect the list of preemptable containers, we immediately have a *dry run* of the scheduling algorithm to match the preemptable resources against outstanding new/increase resource requests. We then only preempt the resources that can find a match. Thoughts? Meng > Preemption occurring unnecessarily because preemption doesn't consider user > limit > - > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Wangda Tan > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
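As a rough illustration of the dry-run idea proposed above, the sketch below keeps only those preemption candidates whose resources can be matched to an outstanding new/increase request; the types and the first-fit matching are simplified assumptions and not CapacityScheduler code.
{code}
import java.util.ArrayList;
import java.util.List;

public final class PreemptionDryRun {
  static final class Candidate { long memoryMb; int vcores; }
  static final class Request   { long memoryMb; int vcores; boolean satisfied; }

  // Dry run: a candidate container is selected for preemption only if
  // releasing it would satisfy some still-unsatisfied outstanding request.
  static List<Candidate> selectToPreempt(List<Candidate> candidates,
                                         List<Request> outstanding) {
    List<Candidate> selected = new ArrayList<>();
    for (Candidate c : candidates) {
      for (Request r : outstanding) {
        if (!r.satisfied && c.memoryMb >= r.memoryMb && c.vcores >= r.vcores) {
          r.satisfied = true;
          selected.add(c);
          break;
        }
      }
    }
    return selected;
  }
}
{code}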
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709860#comment-14709860 ] Naganarasimha G R commented on YARN-4058: - Oops, my mistake. Deleting it and uploading a new one. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709857#comment-14709857 ] Jian He commented on YARN-4014: --- bq. I think there would NOT ocur any possibility where currentAttempt has old priority. I think this is true. > Support user cli interface in for Application Priority > -- > > Key: YARN-4014 > URL: https://issues.apache.org/jira/browse/YARN-4014 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, > 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, > 0004-YARN-4014.patch > > > Track the changes for user-RM client protocol i.e ApplicationClientProtocol > changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709843#comment-14709843 ] Sangjin Lee commented on YARN-4058: --- Thanks for the update. How about flipping the order of {{null}} and {{context.getApplications().putIfAbsent(applicationID, application)}}? :) > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
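To illustrate the pattern under review (including the ordering suggested above, with the {{putIfAbsent}} call written before the {{null}} comparison), here is a minimal, self-contained sketch; the class and the map are stand-ins and not the actual ContainerManagerImpl code.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class StartContainerSketch {
  static final class ApplicationImpl { /* state machine, TimelineClient, ... */ }

  // Use the return value of putIfAbsent to decide whether the freshly created
  // ApplicationImpl is actually the instance tracked by the context.
  static ApplicationImpl getOrTrack(ConcurrentMap<String, ApplicationImpl> apps,
                                    String applicationId,
                                    ApplicationImpl created) {
    ApplicationImpl existing = apps.putIfAbsent(applicationId, created);
    return existing == null ? created : existing;
  }

  public static void main(String[] args) {
    ConcurrentMap<String, ApplicationImpl> apps = new ConcurrentHashMap<>();
    ApplicationImpl first = getOrTrack(apps, "app_1", new ApplicationImpl());
    ApplicationImpl second = getOrTrack(apps, "app_1", new ApplicationImpl());
    System.out.println(first == second); // true: the second instance is discarded
  }
}
{code}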
[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709839#comment-14709839 ] Sangjin Lee commented on YARN-4074: --- The queries we will need to support are as follows (let me know if you believe it's not accurate): - given a cluster, query the most recent N flows (from the flow activity table) - (optionally) given a cluster, user, and flow id, query all flow runs In terms of the implementation, there are two approaches. We can either define specific methods for querying for flows and flow runs and implement them, or reuse the {{getEntities()}} method to implement them. With the former approach, we might end up with a proliferation of type-specific methods. On the other hand, with the latter the API may remain clean, but the implementation would become messier with more if-else type of code. Personally I'm slightly leaning towards the latter, but I'd love to hear others' opinions. > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
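For reference, a minimal sketch of the two API shapes being weighed above; the signatures are simplified assumptions and do not mirror the actual {{TimelineReader}} interface.
{code}
import java.util.Set;

public interface TimelineReaderSketch {
  // Approach 1: type-specific methods, one per entity kind.
  Set<Object> getFlows(String clusterId, int limit);
  Set<Object> getFlowRuns(String clusterId, String userId, String flowId);

  // Approach 2: a single generic method; the implementation branches on
  // entityType to route flow and flow-run queries to the flow activity
  // table and the flow run table respectively.
  Set<Object> getEntities(String clusterId, String entityType, int limit);
}
{code}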
[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4058: Attachment: YARN-4058.YARN-2928.002.patch Hi [~sjlee0], attaching a patch as per your review comment. Can you please have a look at it? > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch, > YARN-4058.YARN-2928.002.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4058: Description: # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is created and then checked whether it exists in context.getApplications(). everytime ApplicationImpl is created state machine is intialized and TimelineClient is created which is required only if added to the context. # Remove unused imports in TimelineServiceV2Publisher & TestSystemMetricsPublisherForV2.java was: # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is created and then checked whether it exists in context.getApplications(). everytime ApplicationImpl is created state machine is intialized and TimelineClient is created which is required only if added to the context. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. > # Remove unused imports in TimelineServiceV2Publisher & > TestSystemMetricsPublisherForV2.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709817#comment-14709817 ] Sangjin Lee commented on YARN-4058: --- Sounds good. I'll review it promptly when you post the updated patch. Thanks! > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs
Sangjin Lee created YARN-4075: - Summary: [reader REST API] implement support for querying for flows and flow runs Key: YARN-4075 URL: https://issues.apache.org/jira/browse/YARN-4075 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee We need to be able to query for flows and flow runs via REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-4075: -- Assignee: Varun Saxena > [reader REST API] implement support for querying for flows and flow runs > > > Key: YARN-4075 > URL: https://issues.apache.org/jira/browse/YARN-4075 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > > We need to be able to query for flows and flow runs via REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee reassigned YARN-4074: - Assignee: Sangjin Lee > [timeline reader] implement support for querying for flows and flow runs > > > Key: YARN-4074 > URL: https://issues.apache.org/jira/browse/YARN-4074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > Implement support for querying for flows and flow runs. > We should be able to query for the most recent N flows, etc. > This includes changes to the {{TimelineReader}} API if necessary, as well as > implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs
Sangjin Lee created YARN-4074: - Summary: [timeline reader] implement support for querying for flows and flow runs Key: YARN-4074 URL: https://issues.apache.org/jira/browse/YARN-4074 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Sangjin Lee Implement support for querying for flows and flow runs. We should be able to query for the most recent N flows, etc. This includes changes to the {{TimelineReader}} API if necessary, as well as implementation of the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4073) Unused ApplicationACLsManager in ContainerManagerImpl
[ https://issues.apache.org/jira/browse/YARN-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4073: Attachment: YARN-4073.20150824-1.patch > Unused ApplicationACLsManager in ContainerManagerImpl > - > > Key: YARN-4073 > URL: https://issues.apache.org/jira/browse/YARN-4073 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4073.20150824-1.patch > > > Unused ApplicationACLsManager in ContainerManagerImpl. Seems like when > NMContext was introduced ACLsManager was not completely removed from > ContainerManagerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4058: Description: # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is created and then checked whether it exists in context.getApplications(). everytime ApplicationImpl is created state machine is intialized and TimelineClient is created which is required only if added to the context. was: # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing # Unused ApplicationACLsManager in ContainerManagerImpl # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is created and then checked whether it exists in context.getApplications(). everytime ApplicationImpl is created state machine is intialized and TimelineClient is created which is required only if added to the context. > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709753#comment-14709753 ] Naganarasimha G R commented on YARN-4058: - moving 2nd point to another jira > Miscellaneous issues in NodeManager project > --- > > Key: YARN-4058 > URL: https://issues.apache.org/jira/browse/YARN-4058 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4058.YARN-2928.001.patch > > > # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing > # Unused ApplicationACLsManager in ContainerManagerImpl > # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is > created and then checked whether it exists in context.getApplications(). > everytime ApplicationImpl is created state machine is intialized and > TimelineClient is created which is required only if added to the context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4073) Unused ApplicationACLsManager in ContainerManagerImpl
Naganarasimha G R created YARN-4073: --- Summary: Unused ApplicationACLsManager in ContainerManagerImpl Key: YARN-4073 URL: https://issues.apache.org/jira/browse/YARN-4073 Project: Hadoop YARN Issue Type: Bug Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Unused ApplicationACLsManager in ContainerManagerImpl. Seems like when NMContext was introduced ACLsManager was not completely removed from ContainerManagerImpl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4072) ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service
[ https://issues.apache.org/jira/browse/YARN-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4072: -- Attachment: 0001-YARN-4072.patch Attaching an initial version of the patch. > ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager > to support JvmPauseMonitor as a service > > > Key: YARN-4072 > URL: https://issues.apache.org/jira/browse/YARN-4072 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4072.patch > > > As JvmPauseMonitor is made as an AbstractService, subsequent method changes > are needed in all places which uses the monitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
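A minimal sketch of the kind of change this JIRA implies for the daemons, assuming JvmPauseMonitor exposes a no-argument constructor as part of the AbstractService conversion; this is an illustration, not the attached patch.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.util.JvmPauseMonitor;

public class DaemonWithPauseMonitor extends CompositeService {
  public DaemonWithPauseMonitor() {
    super("DaemonWithPauseMonitor");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Registering the monitor as a child service lets the composite service
    // init/start/stop it along with the daemon, instead of managing it by hand.
    addService(new JvmPauseMonitor());
    super.serviceInit(conf);
  }
}
{code}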
[jira] [Created] (YARN-4072) ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service
Sunil G created YARN-4072: - Summary: ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service Key: YARN-4072 URL: https://issues.apache.org/jira/browse/YARN-4072 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Reporter: Sunil G Assignee: Sunil G As JvmPauseMonitor is made as an AbstractService, subsequent method changes are needed in all places which uses the monitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3717: Attachment: YARN-3717.20150824-1.patch 3717_cluster_test_snapshots.zip Fixing the test issues and attaching the snapshots from the test in a local cluster. > Improve RM node labels web UI > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch > > > 1> Add the default-node-Label expression for each queue in scheduler page. > 2> In Application/Appattempt page show the app configured node label > expression for AM and Job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709572#comment-14709572 ] Hadoop QA commented on YARN-3933: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | patch | 0m 1s | The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. | | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752004/patch.BUGFIX-JIRA-YARN-3933.txt | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / feaf034 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8896/console | This message was automatically generated. > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3893: --- Attachment: 0005-YARN-3893.patch Hi [~rohithsharma] and [~sunilg], thanks for the comments. # So the {{createAndInitActiveServices}} approach will not be taken; the second approach, failing fast, sounds good. I have updated the patch as per the suggestion. Please review. > Both RM in active state when Admin#transitionToActive failure from refeshAll() > -- > > Key: YARN-3893 > URL: https://issues.apache.org/jira/browse/YARN-3893 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, > 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, > yarn-site.xml > > > Cases that can cause this. > # Capacity scheduler xml is wrongly configured during switch > # Refresh ACL failure due to configuration > # Refresh User group failure due to configuration > Continuously both RM will try to be active > {code} > dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm1 > 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> > ./yarn rmadmin -getServiceState rm2 > 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > active > {code} > # Both Web UI active > # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
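For readers following the fail-fast discussion above, here is a self-contained toy sketch of the idea (class and method names are made up and do not reflect the attached 0005-YARN-3893.patch): if the post-transition refresh fails, the transition itself must fail and roll back, so that two ResourceManagers never both end up reporting active.
{code:title=ToyAdminService.java}
// Toy illustration of the fail-fast idea only; hypothetical names throughout.
public class ToyAdminService {
  private volatile boolean active = false;

  public synchronized void transitionToActive() throws Exception {
    startActiveServices();
    try {
      refreshAll();                 // e.g. reload capacity-scheduler.xml, ACLs, user groups
    } catch (Exception e) {
      stopActiveServices();         // roll back so this RM does not claim to be active
      throw new Exception("transitionToActive failed during refreshAll", e);
    }
    active = true;                  // only report active once the refresh succeeded
  }

  public boolean isActive() {
    return active;
  }

  private void startActiveServices() { /* start scheduler, trackers, ... */ }
  private void stopActiveServices()  { /* stop them again on failure */ }
  private void refreshAll() throws Exception { /* may throw on bad configuration */ }
}
{code}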
[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709251#comment-14709251 ] Junping Du commented on YARN-3933: -- I think the title here is a bit misleading. Available resource being negative is not necessarily a problem (e.g. with the NM resource configuration feature of YARN-291), since it simply means resources are over-committed, although we shouldn't see it in most cases. This is actually a race-condition bug in the FairScheduler; please mention that explicitly, or developers/users may get the impression that resources should never be negative, which is an assumption we never made. > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiwei Guo updated YARN-3933: - Attachment: patch.BUGFIX-JIRA-YARN-3933.txt See this comment: https://issues.apache.org/jira/browse/YARN-3933?focusedCommentId=14709146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14709146 > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > Labels: patch > Attachments: patch.BUGFIX-JIRA-YARN-3933.txt > > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709146#comment-14709146 ] Shiwei Guo commented on YARN-3933: -- We are also seeing this problem, and it may make the RM never allocate resources to a queue whose used resource has gone negative. I did some research and found that this is mainly caused by a race condition around calling AbstractYarnScheduler.completedContainer. Let's take FairScheduler as an example:
{code:title=FairScheduler.java}
protected synchronized void completedContainer(RMContainer rmContainer,
    ContainerStatus containerStatus, RMContainerEventType event) {
  if (rmContainer == null) {
    LOG.info("Null container completed...");
    return;
  }

  Container container = rmContainer.getContainer();

  // Get the application for the finished container
  FSAppAttempt application =
      getCurrentAttemptForContainer(container.getId());
  ApplicationId appId =
      container.getId().getApplicationAttemptId().getApplicationId();
  if (application == null) {
    LOG.info("Container " + container + " of" +
        " unknown application attempt " + appId +
        " completed with event " + event);
    return;
  }

  if (!application.getLiveContainersMap().containsKey(container.getId())) {
    LOG.info("Container " + container + " of application attempt " + appId
        + " is not alive, skip do completedContainer operation on event "
        + event);
    return;
  }

  // Get the node on which the container was allocated
  FSSchedulerNode node = getFSSchedulerNode(container.getNodeId());

  if (rmContainer.getState() == RMContainerState.RESERVED) {
    application.unreserve(rmContainer.getReservedPriority(), node);
  } else {
    application.containerCompleted(rmContainer, containerStatus, event);
    node.releaseContainer(container);
    updateRootQueueMetrics();
  }

  LOG.info("Application attempt " + application.getApplicationAttemptId()
      + " released container " + container.getId() + " on node: " + node
      + " with event: " + event);
}
{code}
The completedContainer method calls application.containerCompleted, which subtracts the resources used by this container from the application's usedResource counter. So, if completedContainer is called twice with the same container, the counter is decremented by too much. The same applies to the updateRootQueueMetrics call, which is why we can see negative allocatedMemory on the root queue. The solution is to check whether the supplied container is still live *inside* completedContainer (as shown in the patch). There is a check before calling completedContainer, but that is not enough. For a deeper discussion, completedContainer may be called from two places:
1. Triggered by the RMContainerEventType.FINISHED event:
{code:title=FairScheduler.nodeUpdate}
// Process completed containers
for (ContainerStatus completedContainer : completedContainers) {
  ContainerId containerId = completedContainer.getContainerId();
  LOG.debug("Container FINISHED: " + containerId);
  completedContainer(getRMContainer(containerId),
      completedContainer, RMContainerEventType.FINISHED);
}
{code}
2. Triggered by RMContainerEventType.RELEASED:
{code:title=AbstractYarnScheduler.releaseContainers}
completedContainer(rmContainer,
    SchedulerUtils.createAbnormalContainerStatus(containerId,
        SchedulerUtils.RELEASED_CONTAINER),
    RMContainerEventType.RELEASED);
{code}
RMContainerEventType.RELEASED is not triggered by the MapReduce ApplicationMaster, so we won't see this problem on MR jobs.
But Tez will trigger it when it no longer needs the container, while the NodeManager will also report a container-complete message to the RM, which in turn triggers the RMContainerEventType.FINISHED event. If the RMContainerEventType.FINISHED event reaches the RM earlier than the Tez AM's release, the problem happens. > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
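To make the race described in the comment above easier to follow, here is a self-contained toy (hypothetical names, not FairScheduler code) showing why the liveness check has to sit inside the release path: when both the AM's RELEASED request and the NM's FINISHED report reach the scheduler for the same container, an unguarded release subtracts its resources twice and the used counter can go negative.
{code:title=DoubleReleaseDemo.java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the double-completedContainer race; names are illustrative.
public class DoubleReleaseDemo {
  static final AtomicLong usedMb = new AtomicLong();
  static final Map<String, Integer> liveContainers = new ConcurrentHashMap<>();

  // Guarded release: only the first call for a given container id takes effect.
  static void completedContainer(String containerId) {
    Integer mb = liveContainers.remove(containerId);   // atomic check-and-remove
    if (mb == null) {
      System.out.println(containerId + " is not alive, skipping release");
      return;
    }
    usedMb.addAndGet(-mb);
  }

  public static void main(String[] args) {
    liveContainers.put("container_1", 1024);
    usedMb.set(1024);
    completedContainer("container_1");   // e.g. the AM releases the container (RELEASED)
    completedContainer("container_1");   // e.g. the NM later reports it complete (FINISHED)
    System.out.println("usedMb = " + usedMb.get());   // prints 0; never goes negative
  }
}
{code}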
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709144#comment-14709144 ] Hudson commented on YARN-3896: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2229 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2229/]) YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id has not been reset synchronously. (Jun Gong via rohithsharmaks) (rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * hadoop-yarn-project/CHANGES.txt * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset synchronously > - > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Labels: resourcemanager > Fix For: 2.8.0 > > Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, > YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, > YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
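As a rough model of the race in the description above, and of why resetting the response id synchronously during re-registration avoids it, here is a self-contained toy; the names and the resync rule are simplifications, not the committed YARN-3896 code.
{code:title=HeartbeatIdDemo.java}
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: the RM tracks the last response id per node; a reconnecting NM
// restarts counting from 0, so the RM-side id must be reset before the first
// heartbeat is served, otherwise the node looks "too far behind" and is rebooted.
public class HeartbeatIdDemo {
  static final AtomicInteger lastResponseId = new AtomicInteger(2506413); // stale pre-reconnect value

  // Registration path: reset synchronously, before any heartbeat can be processed.
  static void registerNode() {
    lastResponseId.set(0);
  }

  // Heartbeat path: accept only the expected id; anything else forces a resync.
  static String heartbeat(int nmResponseId) {
    int rmId = lastResponseId.get();
    if (nmResponseId == rmId) {
      return "OK, next id " + lastResponseId.incrementAndGet();
    }
    return "RESYNC: rm response id " + rmId + ", nm response id " + nmResponseId;
  }

  public static void main(String[] args) {
    registerNode();                      // synchronous reset on reconnect
    System.out.println(heartbeat(0));    // first heartbeat after reconnect now succeeds
  }
}
{code}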
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709128#comment-14709128 ] Hudson commented on YARN-3896: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #291 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/291/]) YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id has not been reset synchronously. (Jun Gong via rohithsharmaks) (rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset synchronously > - > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Labels: resourcemanager > Fix For: 2.8.0 > > Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, > YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, > YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiwei Guo updated YARN-3933: - Affects Version/s: 2.5.2 > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.2 >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3933) Resources(both core and memory) are being negative
[ https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiwei Guo updated YARN-3933: - Component/s: resourcemanager > Resources(both core and memory) are being negative > -- > > Key: YARN-3933 > URL: https://issues.apache.org/jira/browse/YARN-3933 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Lavkesh Lahngir >Assignee: Lavkesh Lahngir > > In our cluster we are seeing available memory and cores being negative. > Initial inspection: > Scenario no. 1: > In capacity scheduler the method allocateContainersToNode() checks if > there are excess reservation of containers for an application, and they are > no longer needed then it calls queue.completedContainer() which causes > resources being negative. And they were never assigned in the first place. > I am still looking through the code. Can somebody suggest how to simulate > excess containers assignments ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-842) Resource Manager & Node Manager UI's doesn't work with IE
[ https://issues.apache.org/jira/browse/YARN-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709098#comment-14709098 ] Rohith Sharma K S commented on YARN-842: I verified in IE9 and greater and was able to view the applications. Is anyone in the community still facing this issue? If not, can it be closed? > Resource Manager & Node Manager UI's doesn't work with IE > - > > Key: YARN-842 > URL: https://issues.apache.org/jira/browse/YARN-842 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.0.4-alpha >Reporter: Devaraj K > > {code:xml} > Webpage error details > User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; > SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media > Center PC 6.0) > Timestamp: Mon, 17 Jun 2013 12:06:03 UTC > Message: 'JSON' is undefined > Line: 41 > Char: 218 > Code: 0 > URI: http://10.18.40.24:8088/cluster/apps > {code} > RM & NM UI's are not working with IE and showing the above error for every > link on the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709091#comment-14709091 ] Hudson commented on YARN-3896: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2248 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2248/]) YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id has not been reset synchronously. (Jun Gong via rohithsharmaks) (rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset synchronously > - > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Labels: resourcemanager > Fix For: 2.8.0 > > Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, > YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, > YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708951#comment-14708951 ] Hudson commented on YARN-3896: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #299 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/299/]) YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id has not been reset synchronously. (Jun Gong via rohithsharmaks) (rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * hadoop-yarn-project/CHANGES.txt * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset synchronously > - > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Labels: resourcemanager > Fix For: 2.8.0 > > Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, > YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, > YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4065) container-executor error should include effective user id
[ https://issues.apache.org/jira/browse/YARN-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708940#comment-14708940 ] Harsh J commented on YARN-4065: --- Agreed - figuring this out wasted a few minutes at another customer I worked with last week. This would be a welcome change - would you be willing to submit a patch adding the context to the error message? > container-executor error should include effective user id > - > > Key: YARN-4065 > URL: https://issues.apache.org/jira/browse/YARN-4065 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Casey Brotherton >Priority: Trivial > > When container-executor fails to access it's config file, the following > message will be thrown: > {code} > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container executor initialization is : 24 > ExitCodeException exitCode=24: Invalid conf file provided : > /etc/hadoop/conf/container-executor.cfg > {code} > The real problem may be a change in the container-executor not running as set > uid root. > From: > https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/SecureContainer.html > {quote} > The container-executor program must be owned by root and have the permission > set ---sr-s---. > {quote} > The error message could be improved by printing out the effective user id > with the error message, and possibly the executable trying to access the > config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708925#comment-14708925 ] Hudson commented on YARN-3896: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1032 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1032/]) YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id has not been reset synchronously. (Jun Gong via rohithsharmaks) (rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset synchronously > - > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Labels: resourcemanager > Fix For: 2.8.0 > > Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, > YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, > YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset synchronously
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708911#comment-14708911 ] Hudson commented on YARN-3896: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #303 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/303/]) YARN-3896. RMNode transitioned from RUNNING to REBOOTED because its response id has not been reset synchronously. (Jun Gong via rohithsharmaks) (rohithsharmaks: rev feaf0349949e831ce3f25814c1bbff52f17bfe8f) * hadoop-yarn-project/CHANGES.txt * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java * hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java > RMNode transitioned from RUNNING to REBOOTED because its response id had not > been reset synchronously > - > > Key: YARN-3896 > URL: https://issues.apache.org/jira/browse/YARN-3896 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jun Gong >Assignee: Jun Gong > Labels: resourcemanager > Fix For: 2.8.0 > > Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, > YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, > YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch > > > {noformat} > 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: > Resolved 10.208.132.153 to /default-rack > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > Reconnect from the node at: 10.208.132.153 > 2015-07-03 16:49:39,075 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: > NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered > with capability: , assigned nodeId > 10.208.132.153:8041 > 2015-07-03 16:49:39,104 INFO > org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far > behind rm response id:2506413 nm response id:0 > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating > Node 10.208.132.153:8041 as it is now REBOOTED > 2015-07-03 16:49:39,137 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: > 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED > {noformat} > The node(10.208.132.153) reconnected with RM. When it registered with RM, RM > set its lastNodeHeartbeatResponse's id to 0 asynchronously. But the node's > heartbeat come before RM succeeded setting the id to 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)