[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701028#comment-14701028 ] Hadoop QA commented on YARN-4024: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 38s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 9s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 0m 21s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 53m 34s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 97m 42s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750953/YARN-4024-draft-v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 71566e2 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8872/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8872/console | This message was automatically generated. 
YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft.patch Currently, the YARN RM NodesListManager resolves the IP address every time a node sends a heartbeat. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2923) Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700781#comment-14700781 ] Hadoop QA commented on YARN-2923: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 8s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 39s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 9s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 22s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 22s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 1m 56s | Tests failed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 6m 24s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 53m 23s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.util.TestRackResolver | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750943/YARN-2923.20150818-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 71566e2 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8871/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8871/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8871/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8871/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8871/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8871/console | This message was automatically generated. 
Support configuration based NodeLabelsProvider Service in Distributed Node Label Configuration Setup - Key: YARN-2923 URL: https://issues.apache.org/jira/browse/YARN-2923 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Fix For: 2.8.0 Attachments: YARN-2923.20141204-1.patch, YARN-2923.20141210-1.patch, YARN-2923.20150328-1.patch, YARN-2923.20150404-1.patch, YARN-2923.20150517-1.patch, YARN-2923.20150817-1.patch, YARN-2923.20150818-1.patch As part of the distributed node-label configuration, we need to support configuring node labels in yarn-site.xml. On modification of the node-label configuration in yarn-site.xml, the NM should be able to pick up the modified node labels from this NodeLabelsProvider service without an NM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
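For illustration, a minimal conceptual sketch of such a configuration-based provider. All names below (ConfigBasedNodeLabelsSketch, the "nm.node-labels" property, the refresh interval) are assumptions for this sketch and not the YARN-2923 API: a timer periodically rebuilds a YarnConfiguration (re-reading yarn-site.xml from the classpath) and republishes the label set, so label changes are picked up without an NM restart.

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfigBasedNodeLabelsSketch {
  // Latest label set; volatile so readers see the timer thread's updates.
  private volatile Set<String> labels = new HashSet<String>();

  public void start(long intervalMs) {
    new Timer("node-labels-refresh", true).scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        // A fresh YarnConfiguration re-reads yarn-site.xml.
        Configuration conf = new YarnConfiguration();
        // "nm.node-labels" is a made-up property name for this sketch.
        String raw = conf.get("nm.node-labels", "");
        labels = new HashSet<String>(Arrays.asList(raw.split(",")));
      }
    }, 0, intervalMs);
  }

  public Set<String> getNodeLabels() {
    return labels;
  }
}
{code}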
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-draft-v3.patch YARN-4024-draft-v3.patch: fixes the checkstyle warning and the test-case failure YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch Currently, the YARN RM NodesListManager resolves the IP address every time a node sends a heartbeat. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700845#comment-14700845 ] Jeff Zhang commented on YARN-2262: -- Please ignore my last comment; I finally found how to use the ATS for storing history data.
{code}
private ApplicationHistoryManager createApplicationHistoryManager(
    Configuration conf) {
  // Backward compatibility:
  // APPLICATION_HISTORY_STORE is neither null nor empty, it means that the
  // user has enabled it explicitly.
  if (conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE) == null
      || conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE).length() == 0
      || conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE).equals(
          NullApplicationHistoryStore.class.getName())) {
    return new ApplicationHistoryManagerOnTimelineStore(
        timelineDataManager, aclsManager);
  } else {
    LOG.warn("The filesystem based application history store is deprecated.");
    return new ApplicationHistoryManagerImpl();
  }
}
{code}
Few fields displaying wrong values in Timeline server after RM restart -- Key: YARN-2262 URL: https://issues.apache.org/jira/browse/YARN-2262 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.0 Reporter: Nishan Shetty Assignee: Naganarasimha G R Attachments: Capture.PNG, Capture1.PNG, yarn-testos-historyserver-HOST-10-18-40-95.log, yarn-testos-resourcemanager-HOST-10-18-40-84.log, yarn-testos-resourcemanager-HOST-10-18-40-95.log Few fields displaying wrong values in Timeline server after RM restart State:null FinalStatus: UNDEFINED Started: 8-Jul-2014 14:58:08 Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.3.4#6332)
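As the snippet above shows, the branch is selected purely by YarnConfiguration.APPLICATION_HISTORY_STORE: leaving it unset (or pointing it at NullApplicationHistoryStore) yields the timeline-store-backed manager. A small sketch of the two configurations follows; the fully-qualified class name for the deprecated FS store is taken from the Hadoop source, and the rest is illustrative.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HistoryStoreConfigExample {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();

    // Unset (the default): createApplicationHistoryManager() above picks
    // ApplicationHistoryManagerOnTimelineStore, i.e. history data in ATS.
    System.out.println(conf.get(YarnConfiguration.APPLICATION_HISTORY_STORE));

    // Explicitly configuring the deprecated filesystem store forces the
    // old ApplicationHistoryManagerImpl path instead.
    conf.set(YarnConfiguration.APPLICATION_HISTORY_STORE,
        "org.apache.hadoop.yarn.server.applicationhistoryservice."
            + "FileSystemApplicationHistoryStore");
  }
}
{code}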
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700830#comment-14700830 ] Jeff Zhang commented on YARN-2262: -- bq. No longer maintain FS based generic history store. I can reproduce this issue easily by restarting the RM while an app is running. Checking YARN-2033, I do see that app history data can now be stored in the Timeline service. But it looks like there's no ATS implementation of ApplicationHistoryStore; FileSystemApplicationHistoryStore is still the only feasible one for RM recovery, so does it make sense to stop maintaining it? Or am I missing something? [~zjshen] [~djp] Few fields displaying wrong values in Timeline server after RM restart -- Key: YARN-2262 URL: https://issues.apache.org/jira/browse/YARN-2262 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.0 Reporter: Nishan Shetty Assignee: Naganarasimha G R Attachments: Capture.PNG, Capture1.PNG, yarn-testos-historyserver-HOST-10-18-40-95.log, yarn-testos-resourcemanager-HOST-10-18-40-84.log, yarn-testos-resourcemanager-HOST-10-18-40-95.log Few fields displaying wrong values in Timeline server after RM restart State:null FinalStatus: UNDEFINED Started: 8-Jul-2014 14:58:08 Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700867#comment-14700867 ] Jeff Zhang commented on YARN-2262: -- The documentation also needs to be updated for the deprecation of FileSystemApplicationHistoryStore. http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-site/TimelineServer.html Few fields displaying wrong values in Timeline server after RM restart -- Key: YARN-2262 URL: https://issues.apache.org/jira/browse/YARN-2262 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.0 Reporter: Nishan Shetty Assignee: Naganarasimha G R Attachments: Capture.PNG, Capture1.PNG, yarn-testos-historyserver-HOST-10-18-40-95.log, yarn-testos-resourcemanager-HOST-10-18-40-84.log, yarn-testos-resourcemanager-HOST-10-18-40-95.log Few fields displaying wrong values in Timeline server after RM restart State:null FinalStatus: UNDEFINED Started: 8-Jul-2014 14:58:08 Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3979) Am in ResourceLocalizationService hang 10 min cause RM kill AM
[ https://issues.apache.org/jira/browse/YARN-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700886#comment-14700886 ] zhangyubiao commented on YARN-3979: --- Thanks for Rohith Sharma K S's patch. We have stopped the log-copying program that had hung, and we will test the patch in our test environment; if it is OK, we will apply it to our production environments. Thank you for your help. Am in ResourceLocalizationService hang 10 min cause RM kill AM --- Key: YARN-3979 URL: https://issues.apache.org/jira/browse/YARN-3979 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.5 Hadoop-2.2.0 Reporter: zhangyubiao Attachments: ERROR103.log
2015-07-27 02:46:17,348 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1437735375558_104282_01_01
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1437735375558_104282_01 (auth:SIMPLE)
2015-07-27 02:56:18,510 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1437735375558_104282_01 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700861#comment-14700861 ] Jeff Zhang commented on YARN-2262: -- But my app still cannot be recovered. Does that mean YARN cannot recover a running app?
{code}
2015-08-18 15:18:35,270 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Recovering attempt: appattempt_1439882258172_0001_01 with final state: null
2015-08-18 15:18:35,270 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1439882258172_0001_01
2015-08-18 15:18:35,273 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1439882258172_0001_01
2015-08-18 15:18:35,277 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application added - appId: application_1439882258172_0001 user: jzhang leaf-queue of parent: root #applications: 1
2015-08-18 15:18:35,278 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Accepted application application_1439882258172_0001 from user: jzhang, in queue: default
2015-08-18 15:18:35,278 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1439882258172_0001_01 State change from NEW to LAUNCHED
2015-08-18 15:18:35,278 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1439882258172_0001 State change from NEW to ACCEPTED
{code}
{code}
2015-08-18 15:18:36,305 ERROR org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Application attempt appattempt_1439882258172_0001_01 doesn't exist in ApplicationMasterService cache.
2015-08-18 15:18:36,306 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8030, call org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 192.168.3.3:56241 Call#56 Retry#0
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1439882258172_0001_01 doesn't exist in ApplicationMasterService cache.
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:436)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
2015-08-18 15:18:37,298 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved 192.168.3.3 to /default-rack
{code}
Few fields displaying wrong values in Timeline server after RM restart -- Key: YARN-2262 URL: https://issues.apache.org/jira/browse/YARN-2262 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.0 Reporter: Nishan Shetty Assignee: Naganarasimha G R Attachments: Capture.PNG, Capture1.PNG, yarn-testos-historyserver-HOST-10-18-40-95.log, yarn-testos-resourcemanager-HOST-10-18-40-84.log, yarn-testos-resourcemanager-HOST-10-18-40-95.log Few fields displaying wrong values in Timeline server after RM restart State:null FinalStatus: UNDEFINED Started: 8-Jul-2014 14:58:08 Elapsed: 2562047397789hrs, 44mins, 47sec -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701163#comment-14701163 ] Hadoop QA commented on YARN-2005: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 21s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 47s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | whitespace | 0m 12s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 51s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 0m 22s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 57m 41s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 104m 16s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750966/YARN-2005.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 71566e2 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8874/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8874/artifact/patchprocess/whitespace.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8874/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8874/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8874/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8874/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8874/console | This message was automatically generated. 
Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
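A conceptual sketch of the requested behavior (all names here are invented for illustration; the actual YARN-2005 patch integrates with the RM scheduler and its configuration rather than a standalone class): count AM launch failures per node and stop placing AM attempts on nodes past a configurable threshold.

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AmBlacklistSketch {
  private final int maxAmFailuresPerNode;                  // configurable threshold
  private final Map<String, Integer> failures = new HashMap<String, Integer>();
  private final Set<String> blacklisted = new HashSet<String>();

  public AmBlacklistSketch(int maxAmFailuresPerNode) {
    this.maxAmFailuresPerNode = maxAmFailuresPerNode;
  }

  // Called when an AM attempt fails on a node.
  public void onAmFailure(String nodeId) {
    Integer count = failures.get(nodeId);
    int updated = (count == null) ? 1 : count + 1;
    failures.put(nodeId, updated);
    if (updated >= maxAmFailuresPerNode) {
      blacklisted.add(nodeId);
    }
  }

  // Consulted before placing the next AM attempt.
  public boolean canScheduleAmOn(String nodeId) {
    return !blacklisted.contains(nodeId);
  }
}
{code}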
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701109#comment-14701109 ] Junping Du commented on YARN-3045: -- I have committed the latest (011) patch to the YARN-2928 branch. Thanks [~Naganarasimha] for contributing the patch and [~sjlee0] for the review! bq. So shall i handle YARN-3367 jira and then revisit the missing NM container and application events? Sure. I have made it unassigned, so feel free to pick it up. [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch, YARN-3045-YARN-2928.009.patch, YARN-3045-YARN-2928.010.patch, YARN-3045-YARN-2928.011.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3367: - Assignee: (was: Junping Du) Replace starting a separate thread for post entity with event loop in TimelineClient Key: YARN-3367 URL: https://issues.apache.org/jira/browse/YARN-3367 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Junping Du Since YARN-3039, we added a loop in TimelineClient to wait for the collectorServiceAddress to be ready before posting any entity. In consumers of TimelineClient (like the AM), we start a new thread for each call to avoid a potential deadlock in the main thread. This approach has at least 3 major defects: 1. The consumer needs additional code to wrap each call to putEntities() in TimelineClient in a thread. 2. It consumes many thread resources unnecessarily. 3. The sequence of events can end up out of order because the posting threads leave the waiting loop in random order. We should have something like an event loop on the TimelineClient side: putEntities() only puts the entities into a queue, and a separate thread delivers the queued entities to the collector via REST calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
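A minimal sketch of the proposed event loop, assuming a hypothetical postToCollector() delivery call: it illustrates the queue-plus-dispatcher idea (non-blocking callers, no per-call threads, FIFO ordering), not the eventual patch.

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class TimelineEventLoopSketch {
  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>();

  public TimelineEventLoopSketch() {
    Thread dispatcher = new Thread(new Runnable() {
      @Override
      public void run() {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            Object entity = queue.take(); // FIFO: preserves event order
            postToCollector(entity);      // hypothetical REST delivery
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "timeline-dispatcher");
    dispatcher.setDaemon(true);
    dispatcher.start();
  }

  // Non-blocking for callers; no per-call thread is needed.
  public void putEntities(Object entity) {
    queue.add(entity);
  }

  private void postToCollector(Object entity) {
    // placeholder for the REST call to the collector
  }
}
{code}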
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701144#comment-14701144 ] Hadoop QA commented on YARN-4014: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 47s | The applied patch generated 3 additional warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 30s | The applied patch generated 3 new checkstyle issues (total was 31, now 34). | | {color:green}+1{color} | whitespace | 0m 12s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | mapreduce tests | 104m 56s | Tests failed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 5m 9s | Tests failed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 0m 23s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 0m 22s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 161m 9s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.mapreduce.lib.output.TestJobOutputCommitter | | | hadoop.yarn.client.api.impl.TestNMClient | | | hadoop.yarn.client.api.impl.TestYarnClient | | Timed out tests | org.apache.hadoop.mapreduce.TestLargeSort | | | org.apache.hadoop.yarn.client.api.impl.TestAHSClient | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | Failed build | hadoop-yarn-common | | | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750963/0004-YARN-4014.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 71566e2 | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8873/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8873/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/8873/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8873/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8873/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8873/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8873/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8873/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8873/console | This message was automatically generated. Support user cli interface in for Application Priority -- Key: YARN-4014 URL: https://issues.apache.org/jira/browse/YARN-4014 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch Track the changes for user-RM client protocol i.e ApplicationClientProtocol changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3367: --- Assignee: Naganarasimha G R Replace starting a separate thread for post entity with event loop in TimelineClient Key: YARN-3367 URL: https://issues.apache.org/jira/browse/YARN-3367 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Naganarasimha G R Since YARN-3039, we added a loop in TimelineClient to wait for the collectorServiceAddress to be ready before posting any entity. In consumers of TimelineClient (like the AM), we start a new thread for each call to avoid a potential deadlock in the main thread. This approach has at least 3 major defects: 1. The consumer needs additional code to wrap each call to putEntities() in TimelineClient in a thread. 2. It consumes many thread resources unnecessarily. 3. The sequence of events can end up out of order because the posting threads leave the waiting loop in random order. We should have something like an event loop on the TimelineClient side: putEntities() only puts the entities into a queue, and a separate thread delivers the queued entities to the collector via REST calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701656#comment-14701656 ] Junping Du commented on YARN-4025: -- bq. The EntityTable.java file is already fixed in the v.3 patch. I mean the example: id3?id4?id5 should be id3=id4=id5? Or am I missing something here? :) Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch, YARN-4025-YARN-2928.003.patch Timestamps are being stored as Longs in HBase by the HBaseTimelineWriterImpl code. There are some places in the code where values are converted from Long to byte[] to String for easier argument passing between function calls, and then converted back to byte[] when being stored in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as adding a few more function calls, like a getColumnQualifier that accepts a pre-encoded byte array in addition to the existing API which accepts a String, and having ColumnHelper return a byte[] column name instead of a String one. Filing this jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
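A small example of the point above, using the HBase Bytes utility: the String round trip produces a variable-length encoding and extra allocations, while the direct conversion is a fixed 8 bytes that round-trips exactly. This is an illustrative standalone snippet, not code from the patch.

{code}
import org.apache.hadoop.hbase.util.Bytes;

public class LongBytesExample {
  public static void main(String[] args) {
    long timestamp = System.currentTimeMillis();

    // Indirect: Long -> String -> byte[] (variable length, extra garbage)
    byte[] viaString = Bytes.toBytes(Long.toString(timestamp));

    // Direct: Long -> byte[] (always 8 bytes, usable as a key part)
    byte[] direct = Bytes.toBytes(timestamp);

    System.out.println(viaString.length + " bytes vs " + direct.length + " bytes");
    System.out.println(Bytes.toLong(direct)); // round-trips exactly
  }
}
{code}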
[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701755#comment-14701755 ] Hudson commented on YARN-3857: -- FAILURE: Integrated in Hadoop-trunk-Commit #8317 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8317/]) YARN-3857: Memory leak in ResourceManager with SIMPLE mode. Contributed by mujunchao. (zxu: rev 3a76a010b85176f2bcb85ed6f74c25dcb8acfe4d) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenSecretManagerInRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Memory leak in ResourceManager with SIMPLE mode --- Key: YARN-3857 URL: https://issues.apache.org/jira/browse/YARN-3857 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: mujunchao Assignee: mujunchao Priority: Critical Labels: patch Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch We register the ClientTokenMasterKey so that a client does not hold an invalid ClientToken after the RM restarts. In SIMPLE mode, we register Pair<ApplicationAttemptId, null>, but we never remove it from the HashMap; since unregistration only runs in secure mode, a memory leak results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
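A simplified illustration of the leak and the fix; this is a stand-in for ClientToAMTokenSecretManagerInRM, not the actual code, and the method names are invented for the sketch.

{code}
import java.util.HashMap;
import java.util.Map;

public class TokenRegistrySketch {
  private final Map<String, byte[]> masterKeys = new HashMap<String, byte[]>();

  public void registerApplication(String attemptId, byte[] key) {
    // Registration happens in SIMPLE mode too (the key may be null).
    masterKeys.put(attemptId, key);
  }

  public void unregisterApplication(String attemptId) {
    // Before the fix this was only invoked in secure mode, leaving entries
    // behind forever in SIMPLE mode; removing unconditionally plugs the leak.
    masterKeys.remove(attemptId);
  }
}
{code}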
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701672#comment-14701672 ] Sangjin Lee commented on YARN-4025: --- Oh OK. Got it. I thought you meant the line you referred to. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch, YARN-4025-YARN-2928.003.patch Timestamps are being stored as Longs in HBase by the HBaseTimelineWriterImpl code. There are some places in the code where values are converted from Long to byte[] to String for easier argument passing between function calls, and then converted back to byte[] when being stored in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as adding a few more function calls, like a getColumnQualifier that accepts a pre-encoded byte array in addition to the existing API which accepts a String, and having ColumnHelper return a byte[] column name instead of a String one. Filing this jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701598#comment-14701598 ] Wangda Tan commented on YARN-4024: -- Hi [~zhiguohong], thanks for the update. Some minor comments: 1) I think we can limit the cache-removal changes to the NodesListManager: in handle(..) we can do the flush(..), which is the same as doing it in RMNodeImpl, and we don't need to expose an extra method, correct? 2) I suggest renaming CachedResolver.flush to something like removeCache; flush reads more like a file-system concept to me. 3) If you agree with 2), add tests to verify that NodesListManager handles the events correctly. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch Currently, the YARN RM NodesListManager resolves the IP address every time a node sends a heartbeat. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
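For illustration, a minimal sketch of a cached resolver along the lines being discussed: removeCache follows comment 2) above, while everything else (class name, map type, lookup shape) is an assumption of this sketch, not the patch. Heartbeats hit the cache, so a slow DNS server only affects the first lookup for a host, and removeCache(host) invalidates an entry when a node joins or leaves.

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CachedResolverSketch {
  private final ConcurrentMap<String, String> hostToIp =
      new ConcurrentHashMap<String, String>();

  public String resolve(String hostName) throws UnknownHostException {
    String ip = hostToIp.get(hostName);
    if (ip == null) {
      // Only a cache miss pays the (possibly slow) DNS lookup.
      ip = InetAddress.getByName(hostName).getHostAddress();
      hostToIp.putIfAbsent(hostName, ip);
    }
    return ip;
  }

  // Invalidate one host, e.g. when a node is added or removed.
  public void removeCache(String hostName) {
    hostToIp.remove(hostName);
  }
}
{code}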
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701645#comment-14701645 ] Sangjin Lee commented on YARN-4025: --- Thanks for your review [~djp]! {quote} Do we handle columnPrefixBytes being null here, as the Javadoc comments say? I saw we handle this case explicitly in readResults() but I didn't see it here. Let me know if I missed something. {quote} That is a good catch. Let me look into that. If we retain the same behavior for a null qualifier (and probably we should), then the return type of this method would need to go back to {{Map<Object, Object>}}. I'll also think about the method names. Cc'ing [~vrushalic] for her opinion also. {quote} Checking with the javadoc in Separator and TimelineWriterUtils - "a negative value indicates no limit on number of segments", so can we define a constant value like NO_LIMIT to replace -1 here? {quote} Will do. {quote} I think we should do the same thing to some javadoc examples in EntityTable.java. {quote} The {{EntityTable.java}} file is already fixed in the v.3 patch. {quote} Forgot to mention that YARN-3049 should rename TestHBaseTimelineWriterImpl to something that includes Reader. Would you like to do it here? Thanks! {quote} Good idea. The thought definitely occurred to me. I'll update the patch pretty soon. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch, YARN-4025-YARN-2928.003.patch Timestamps are being stored as Longs in HBase by the HBaseTimelineWriterImpl code. There are some places in the code where values are converted from Long to byte[] to String for easier argument passing between function calls, and then converted back to byte[] when being stored in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as adding a few more function calls, like a getColumnQualifier that accepts a pre-encoded byte array in addition to the existing API which accepts a String, and having ColumnHelper return a byte[] column name instead of a String one. Filing this jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Attachment: YARN-3814-YARN-2928.05.patch REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1473) Exception from container-launch(Apache Hadoop 2.2.0)
[ https://issues.apache.org/jira/browse/YARN-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701766#comment-14701766 ] Maximiliano Mendez commented on YARN-1473: -- I don't know if it's [~gagansab]'s case, but I found this while digging through some container logs under the configured yarn local-dir:
java.io.FileNotFoundException: ${yarn.nodemanager.log-dirs}/application_1439909765014_0004/container_e08_1439909765014_0004_02_01 (Is a directory)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
	at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
	at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
	at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
	at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
	at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
	at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
	at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
	at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
	at org.apache.log4j.Logger.getLogger(Logger.java:104)
	at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)
	at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:108)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1025)
	at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:844)
	at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:541)
	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)
	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)
	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
	at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:43)
Could this be causing the error?
Exception from container-launch(Apache Hadoop 2.2.0) Key: YARN-1473 URL: https://issues.apache.org/jira/browse/YARN-1473 Project: Hadoop YARN Issue Type: Bug Environment: CentOS5.8 and Apache Hadoop 2.2.0 Reporter: Joy Xu Attachments: yarn-site.xml Hello all, I met an exception from container-launch when I ran the built-in wordcount program; the error message is as follows: {code} 13/12/05 00:17:31 INFO mapreduce.Job: Job job_1386171829089_0003 failed with state FAILED due to: Application application_1386171829089_0003 failed 2 times due to AM Container for appattempt_1386171829089_0003_02 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
	at org.apache.hadoop.util.Shell.run(Shell.java:379)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701811#comment-14701811 ] Hadoop QA commented on YARN-4014: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 45s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 57s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 37s | The applied patch generated 5 new checkstyle issues (total was 31, now 36). | | {color:green}+1{color} | whitespace | 0m 12s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 15s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | mapreduce tests | 101m 22s | Tests failed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 39s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 29s | Tests passed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 2m 12s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 56m 1s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 218m 20s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | Timed out tests | org.apache.hadoop.mapred.TestMRIntermediateDataEncryption | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751035/0004-YARN-4014.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fc509f6 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8876/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/8876/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8876/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8876/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8876/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8876/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8876/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8876/console | This message was automatically generated. Support user cli interface in for Application Priority -- Key: YARN-4014 URL: https://issues.apache.org/jira/browse/YARN-4014 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 0004-YARN-4014.patch Track the changes for user-RM client protocol i.e ApplicationClientProtocol changes and discussions in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701654#comment-14701654 ] Jian He commented on YARN-1644: --- bq. I am also wondering if we should do the same for ContainerManagerImpl#startContainers That should be the same issue. We may do this too. RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing - Key: YARN-1644 URL: https://issues.apache.org/jira/browse/YARN-1644 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1644-YARN-1197.4.patch, YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3893: --- Affects Version/s: 2.7.1 Target Version/s: 2.7.2 Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.1 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml Cases that can cause this: # The capacity scheduler XML is wrongly configured during the switch # ACL refresh fails due to configuration # User-group refresh fails due to configuration Both RMs will then continuously try to become active
{code}
dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin -getServiceState rm1
15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
active
dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin> ./yarn rmadmin -getServiceState rm2
15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
active
{code}
# Both web UIs show active # Status is shown as active for both RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
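A conceptual sketch of the expected behavior, with made-up method names (becomeActiveInternal, becomeStandbyInternal, and this refreshAll are stand-ins for illustration, not the actual AdminService API): if the refresh fails after the state transition, the RM should step back down instead of staying active alongside the other RM.

{code}
public class TransitionSketch {
  public synchronized void transitionToActive() throws Exception {
    becomeActiveInternal();          // hypothetical: take over leadership
    try {
      refreshAll();                  // reload scheduler/ACL/user-group config
    } catch (Exception e) {
      // Without this rollback, a bad capacity-scheduler.xml (or an ACL or
      // user-group refresh failure) leaves both RMs reporting "active".
      becomeStandbyInternal();       // hypothetical: step down again
      throw e;
    }
  }

  private void becomeActiveInternal() { }
  private void becomeStandbyInternal() { }
  private void refreshAll() throws Exception { }
}
{code}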
[jira] [Resolved] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu resolved YARN-3857. - Resolution: Fixed Memory leak in ResourceManager with SIMPLE mode --- Key: YARN-3857 URL: https://issues.apache.org/jira/browse/YARN-3857 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: mujunchao Assignee: mujunchao Priority: Critical Labels: patch Fix For: 2.7.2 Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch We register the ClientTokenMasterKey so that a client does not hold an invalid ClientToken after the RM restarts. In SIMPLE mode, we register Pair<ApplicationAttemptId, null>, but we never remove it from the HashMap; since unregistration only runs in secure mode, a memory leak results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3857: Fix Version/s: 2.7.2 Memory leak in ResourceManager with SIMPLE mode --- Key: YARN-3857 URL: https://issues.apache.org/jira/browse/YARN-3857 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: mujunchao Assignee: mujunchao Priority: Critical Labels: patch Fix For: 2.7.2 Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch We register the ClientTokenMasterKey so that a client does not hold an invalid ClientToken after the RM restarts. In SIMPLE mode, we register Pair<ApplicationAttemptId, null>, but we never remove it from the HashMap; since unregistration only runs in secure mode, a memory leak results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3857) Memory leak in ResourceManager with SIMPLE mode
[ https://issues.apache.org/jira/browse/YARN-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701824#comment-14701824 ] zhihai xu commented on YARN-3857: - Thanks to [~mujunchao] for the contribution and to Devaraj for the additional review! I committed this to trunk, branch-2 and branch-2.7. Memory leak in ResourceManager with SIMPLE mode --- Key: YARN-3857 URL: https://issues.apache.org/jira/browse/YARN-3857 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: mujunchao Assignee: mujunchao Priority: Critical Labels: patch Fix For: 2.7.2 Attachments: YARN-3857-1.patch, YARN-3857-2.patch, YARN-3857-3.patch, YARN-3857-4.patch, hadoop-yarn-server-resourcemanager.patch We register the ClientTokenMasterKey so that a client does not hold an invalid ClientToken after the RM restarts. In SIMPLE mode, we register Pair<ApplicationAttemptId, null>, but we never remove it from the HashMap; since unregistration only runs in secure mode, a memory leak results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4057) If ContainersMonitor is not enabled, only print related log info one time
[ https://issues.apache.org/jira/browse/YARN-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-4057: Fix Version/s: 2.8.0 If ContainersMonitor is not enabled, only print related log info one time - Key: YARN-4057 URL: https://issues.apache.org/jira/browse/YARN-4057 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Priority: Minor Fix For: 2.8.0 Attachments: YARN-4057.01.patch ContainersMonitorImpl checks whether it is enabled when handling every event, and it will print the following message again and again if it is not enabled: {quote} 2015-08-17 13:20:13,792 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory is needed. Not running the monitor-thread {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
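A minimal sketch of the improvement (illustrative only, not the actual patch): latch the message with an atomic flag so it is logged at most once, no matter how many events arrive while monitoring is disabled.

{code}
import java.util.concurrent.atomic.AtomicBoolean;

public class LogOnceSketch {
  private final AtomicBoolean logged = new AtomicBoolean(false);

  public void onEvent(boolean monitoringEnabled) {
    // compareAndSet flips the flag exactly once, so repeated events
    // with monitoring disabled no longer spam the log.
    if (!monitoringEnabled && logged.compareAndSet(false, true)) {
      System.out.println("Neither virtual-memory nor physical-memory "
          + "monitoring is needed. Not running the monitor-thread");
    }
  }
}
{code}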
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701775#comment-14701775 ] Vrushali C commented on YARN-3901: -- I see, yes, will name it accordingly. Populate flow run data in the flow_run table Key: YARN-3901 URL: https://issues.apache.org/jira/browse/YARN-3901 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Vrushali C Attachments: YARN-3901-YARN-2928.WIP.patch As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf, filing this jira to track creation and population of data in the flow run table. Some points that are being considered:
- Stores per-flow-run information aggregated across applications, per flow version.
- The RM's collector writes to it on app creation and app completion.
- The per-app collector writes to it for metric updates at a slower frequency than the metric updates to the application table.
- Primary key: cluster ! user ! flow ! flow run id.
- Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application levels keep a timeseries.
- The running_apps column will be incremented on app creation, and decremented on app completion.
- For min_start_time the RM writer will simply write a value with the tag for the applicationId. A coprocessor will return the min value of all written values.
- Upon flush and compactions, the min value among all the cells of this column will be written to the cell without any tag (empty tag) and all the other cells will be discarded.
- Ditto for the max_end_time, but then the max will be kept.
- Tags are represented as #type:value. The type can be not set (0), or can indicate running (1) or complete (2). In those cases (for metrics) only complete app metrics are collapsed on compaction.
- The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can the values be collapsed.
- The application ids that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don't want to re-aggregate for those upon replay.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
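For illustration, a sketch of building such a row key, assuming the fields are joined with the {{!}} separator shown above and the run id is appended in raw long form (the real schema may invert the run id for descending sort order; the helper below is hypothetical):
{code}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: flow_run row key as cluster!user!flow!flowRunId.
public final class FlowRunRowKeySketch {
  private static final String SEPARATOR = "!";

  public static byte[] of(String cluster, String user, String flow, long runId) {
    String prefix = String.join(SEPARATOR, cluster, user, flow);
    // String components are UTF-8 encoded; the run id stays a fixed-width
    // 8-byte long so numeric ordering is preserved in HBase.
    return Bytes.add(prefix.getBytes(StandardCharsets.UTF_8),
        SEPARATOR.getBytes(StandardCharsets.UTF_8), Bytes.toBytes(runId));
  }
}
{code}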
[jira] [Commented] (YARN-4057) If ContainersMonitor is not enabled, only print related log info one time
[ https://issues.apache.org/jira/browse/YARN-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701830#comment-14701830 ] zhihai xu commented on YARN-4057: - thanks [~hex108] for the contribution! I committed this to trunk and branch-2. If ContainersMonitor is not enabled, only print related log info one time - Key: YARN-4057 URL: https://issues.apache.org/jira/browse/YARN-4057 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Priority: Minor Fix For: 2.8.0 Attachments: YARN-4057.01.patch ContainersMonitorImpl will check whether it is enabled when handling every event, and it will print following messages again and again if not enabled: {quote} 2015-08-17 13:20:13,792 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory is needed. Not running the monitor-thread {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701590#comment-14701590 ] MENG DING commented on YARN-1644: - Thanks a lot [~leftnoteasy] and [~jianhe] for your comments and suggestions. After more thought, I prefer [~jianhe]'s suggestion to synchronize {{ContainerManagerImpl#increaseContainersResource}} with NM-RM registration. If we do that, we should be able to resolve the RM recovery race condition issue; more specifically:
* If increaseContainersResource happens first, then the container resource will be increased in the NM before NM-RM registration.
* If NM-RM registration happens first, then the NM will get a new RM identifier after registration, and any subsequent increase request with a token issued by the old RM will be rejected.
For the implementation, I think I can simply synchronize on the {{NMContext}} object in both {{ContainerManagerImpl}} and {{NodeStatusUpdaterImpl}}. Let me know if you have further thoughts or comments. I am also wondering if we should do the same for {{ContainerManagerImpl#startContainers}}? RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing - Key: YARN-1644 URL: https://issues.apache.org/jira/browse/YARN-1644 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1644-YARN-1197.4.patch, YARN-1644-YARN-1197.5.patch, YARN-1644.1.patch, YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
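A minimal sketch of the proposed locking shape, with a placeholder object standing in for the real {{NMContext}} and with both call sites collapsed into one class for brevity; this is only the synchronization pattern, not the patch:
{code}
// Sketch: both the resource-increase path and the RM (re-)registration
// path lock the shared context, so one fully completes before the other
// observes its effects (e.g. the new RM identifier).
public class NmContextSyncSketch {
  private final Object nmContext;  // stands in for the real NMContext

  public NmContextSyncSketch(Object nmContext) {
    this.nmContext = nmContext;
  }

  // ContainerManagerImpl#increaseContainersResource side
  public void increaseContainersResource(/* request */) {
    synchronized (nmContext) {
      // validate the token against the current RM identifier, then resize
    }
  }

  // NodeStatusUpdaterImpl side
  public void registerWithRM() {
    synchronized (nmContext) {
      // register, then store the new RM identifier in the context
    }
  }
}
{code}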
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701866#comment-14701866 ] Hadoop QA commented on YARN-3814: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 27s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 12m 16s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 29s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 16s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 47s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 50s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 2m 50s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 49m 45s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751064/YARN-3814-YARN-2928.05.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 9a82008 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8877/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8877/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8877/console | This message was automatically generated. REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4025: -- Attachment: YARN-4025-YARN-2928.004.patch v.4 patch posted.
- implemented proper handling of a null column prefix
- added more javadoc to clarify several places
- renamed the test from {{TestHBaseTimelineWriterImpl}} to {{TestHBaseTimelineStorage}}
- clarified and made explicit the no-limit split
- fixed javadoc comments for the value separator
- added some logging statements
This should address most of the review comments. I stopped short of renaming the {{readResults()}} method. We can treat that method as the default {{readResults()}} method, and the other one as the one for having raw (non-string) components. I added more javadoc to clarify that point. Let me know. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch, YARN-4025-YARN-2928.003.patch, YARN-4025-YARN-2928.004.patch Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl code. There seem to be some places in the code where there are conversions between Long to byte[] to String for easier argument passing between function calls. Then these values end up being converted back to byte[] while storing in hbase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some api changes (store function) as well in adding a few more function calls like getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String and the ColumnHelper to return a byte[] column name instead of a String one. Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
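For context on the Long/byte[] round-tripping the issue describes, a sketch of the direct conversion using HBase's {{Bytes}} utility; a hypothetical helper to show the idea, while the patch itself reworks the column classes:
{code}
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: keep timestamps as long/byte[] end to end instead of
// round-tripping through String and re-encoding on write.
public final class TimestampCodecSketch {
  public static byte[] encode(long ts) {
    return Bytes.toBytes(ts);          // fixed 8-byte big-endian form
  }

  public static long decode(byte[] qualifierOrValue) {
    return Bytes.toLong(qualifierOrValue);
  }
}
{code}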
[jira] [Updated] (YARN-4059) Preemption should delay assignments back to the preempted queue
[ https://issues.apache.org/jira/browse/YARN-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4059: --- Attachment: YARN-4059.patch Preemption should delay assignments back to the preempted queue --- Key: YARN-4059 URL: https://issues.apache.org/jira/browse/YARN-4059 Project: Hadoop YARN Issue Type: Improvement Reporter: Chang Li Assignee: Chang Li Attachments: YARN-4059.patch When preempting containers from a queue it can take a while for the other queues to fully consume the resources that were freed up, due to delays waiting for better locality, etc. Those delays can cause the resources to be assigned back to the preempted queue, and then the preemption cycle continues. We should consider adding a delay, either based on node heartbeat counts or time, to avoid granting containers to a queue that was recently preempted. The delay should be sufficient to cover the cycles of the preemption monitor, so we won't try to assign containers in-between preemption events for a queue. Worst-case scenario for assigning freed resources to other queues is when all the other queues want no locality. No locality means only one container is assigned per heartbeat, so we need to wait for the entire cluster to heartbeat in, multiplied by the number of containers that could run on a single node. So the penalty time for a queue should be the max of either the preemption monitor cycle time or the amount of time it takes to allocate the cluster with one container per heartbeat. Guessing this will be somewhere around 2 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
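The penalty estimate in the description can be written down directly; a sketch under the stated assumptions (one container granted per node heartbeat, so draining the freed capacity takes containers-per-node full rounds of heartbeats; the names are illustrative):
{code}
// Sketch: penalty = max(preemption monitor cycle,
//                       time to fill the cluster one container per heartbeat)
public final class PreemptionPenaltySketch {
  public static long penaltyMillis(long monitorCycleMs,
      long heartbeatIntervalMs, int containersPerNode) {
    // Each heartbeat round grants at most one container per node, so
    // containersPerNode rounds are needed to consume the freed resources.
    long drainTimeMs = heartbeatIntervalMs * (long) containersPerNode;
    return Math.max(monitorCycleMs, drainTimeMs);
  }
}
{code}
With a 1-second heartbeat and on the order of 100 containers per node, this lands in the region of the "somewhere around 2 minutes" guess above.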
[jira] [Updated] (YARN-679) add an entry point that can start any Yarn service
[ https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-679: --- Labels: (was: BB2015-05-TBR) add an entry point that can start any Yarn service -- Key: YARN-679 URL: https://issues.apache.org/jira/browse/YARN-679 Project: Hadoop YARN Issue Type: New Feature Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, YARN-679-003.patch, YARN-679-004.patch, org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf Time Spent: 72h Remaining Estimate: 0h There's no need to write separate .main classes for every Yarn service, given that the startup mechanism should be identical: create, init, start, wait for stopped, with an interrupt handler to trigger a clean shutdown on a control-C interrupt. Provide one that takes any classname, and a list of config files/options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
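A minimal sketch of the entry-point idea against the {{org.apache.hadoop.service.Service}} API, with argument parsing simplified away (this is an illustration, not the attached patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.Service;

// Sketch: instantiate any Service by class name, run the standard
// init/start lifecycle, and stop cleanly on Ctrl-C via a shutdown hook.
public class ServiceMainSketch {
  public static void main(String[] args) throws Exception {
    Service service = (Service) Class.forName(args[0])
        .getDeclaredConstructor().newInstance();
    Runtime.getRuntime().addShutdownHook(
        new Thread(service::stop));        // clean shutdown on interrupt
    service.init(new Configuration());     // config files/options would be
    service.start();                       // parsed from the remaining args
    service.waitForServiceToStop(0);       // block until the service stops
  }
}
{code}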
[jira] [Updated] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brook Zhou updated YARN-3223: - Attachment: YARN-3223-v0.1.patch Contains tests and formatting changes. Resource update during NM graceful decommission --- Key: YARN-3223 URL: https://issues.apache.org/jira/browse/YARN-3223 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.1 Reporter: Junping Du Assignee: Brook Zhou Attachments: YARN-3223-v0.1.patch, YARN-3223-v0.patch During NM graceful decommission, we should handle resource updates properly, including: making RMNode keep track of the old resource for possible rollback, keeping the available resource at 0, and updating the used resource as containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
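A sketch of that bookkeeping with a hypothetical helper class (the real patch changes RMNode and the schedulers, not a standalone class like this):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Sketch: remember the original capacity for rollback, advertise zero
// available capacity while decommissioning, and shrink used capacity as
// containers finish.
public class DecommissioningNodeStateSketch {
  private Resource originalCapacity;   // kept for possible recommission
  private Resource used;

  public void startGracefulDecommission(Resource capacity, Resource currentUsed) {
    this.originalCapacity = capacity;
    this.used = currentUsed;
  }

  public Resource availableWhileDecommissioning() {
    return Resources.createResource(0, 0);      // nothing new is scheduled
  }

  public void containerFinished(Resource released) {
    used = Resources.subtract(used, released);  // only used shrinks
  }

  public Resource rollbackCapacity() {
    return originalCapacity;                    // restored on recommission
  }
}
{code}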
[jira] [Commented] (YARN-4057) If ContainersMonitor is not enabled, only print related log info one time
[ https://issues.apache.org/jira/browse/YARN-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701971#comment-14701971 ] Hudson commented on YARN-4057: -- FAILURE: Integrated in Hadoop-trunk-Commit #8318 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8318/]) YARN-4057. If ContainersMonitor is not enabled, only print related log info one time. Contributed by Jun Gong. (zxu: rev 14215c8ef83d58b8443c52a3cb93e6d44fc87065) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java If ContainersMonitor is not enabled, only print related log info one time - Key: YARN-4057 URL: https://issues.apache.org/jira/browse/YARN-4057 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Priority: Minor Fix For: 2.8.0 Attachments: YARN-4057.01.patch ContainersMonitorImpl will check whether it is enabled when handling every event, and it will print following messages again and again if not enabled: {quote} 2015-08-17 13:20:13,792 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory is needed. Not running the monitor-thread {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2005: Attachment: YARN-2005.006.patch Fixed YarnConfiguration unit test. Other failure is not happening locally for me. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, YARN-2005.006.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
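As a sketch of the blacklisting idea in the issue description, counting AM launch failures per node against a configurable threshold (a generic illustration, not the scheduler-side implementation in the patch):
{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: a node is blacklisted for AM placement once it has failed a
// configurable number of AM attempts for the application.
public class AmLaunchBlacklistSketch {
  private final int maxFailuresPerNode;
  private final Map<String, Integer> failures = new HashMap<>();

  public AmLaunchBlacklistSketch(int maxFailuresPerNode) {
    this.maxFailuresPerNode = maxFailuresPerNode;
  }

  public void recordAmFailure(String nodeId) {
    failures.merge(nodeId, 1, Integer::sum);
  }

  public boolean isBlacklistedForAm(String nodeId) {
    return failures.getOrDefault(nodeId, 0) >= maxFailuresPerNode;
  }
}
{code}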
[jira] [Updated] (YARN-4059) Preemption should delay assignments back to the preempted queue
[ https://issues.apache.org/jira/browse/YARN-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4059: --- Issue Type: Improvement (was: Bug) Preemption should delay assignments back to the preempted queue --- Key: YARN-4059 URL: https://issues.apache.org/jira/browse/YARN-4059 Project: Hadoop YARN Issue Type: Improvement Reporter: Chang Li Assignee: Chang Li When preempting containers from a queue it can take a while for the other queues to fully consume the resources that were freed up, due to delays waiting for better locality, etc. Those delays can cause the resources to be assigned back to the preempted queue, and then the preemption cycle continues. We should consider adding a delay, either based on node heartbeat counts or time, to avoid granting containers to a queue that was recently preempted. The delay should be sufficient to cover the cycles of the preemption monitor, so we won't try to assign containers in-between preemption events for a queue. Worst-case scenario for assigning freed resources to other queues is when all the other queues want no locality. No locality means only one container is assigned per heartbeat, so we need to wait for the entire cluster to heartbeat in, multiplied by the number of containers that could run on a single node. So the penalty time for a queue should be the max of either the preemption monitor cycle time or the amount of time it takes to allocate the cluster with one container per heartbeat. Guessing this will be somewhere around 2 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701893#comment-14701893 ] Li Lu commented on YARN-3814: - Hi [~varun_saxena], thanks for the patch! I think the patch is mostly good, only with a few nits:
- In TimelineReaderManager, Configuration and YarnConfiguration appear to be unused.
- callerUGI is not used and not documented. What's our plan on that? How do we set the caller UGI for now?
- In TimelineReaderWebServices, can we have two constants for the default delimiters? Right now we're spreading them through the source code like:
{code}
parseKeyStrValuesStr(relatesTo, ",", ":"),
parseKeyStrValuesStr(isRelatedTo, ",", ":"),
parseKeyStrValueObj(infofilters, ",", ":"),
parseKeyStrValueStr(conffilters, ",", ":"),
parseValuesStr(metricfilters, ","),
parseValuesStr(eventfilters, ","),
parseFieldsStr(fields, ","),
callerUGI);
{code}
A similar problem also happens on line 280 after the patch. REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
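The suggested cleanup could look like the following sketch (the constant and class names are illustrative, not the ones in the committed patch):
{code}
// Sketch: name the delimiters once instead of passing raw "," and ":"
// literals at every call site.
public final class TimelineParseConstantsSketch {
  public static final String COMMA_DELIMITER = ",";
  public static final String COLON_DELIMITER = ":";

  private TimelineParseConstantsSketch() { }
}

// Usage at the call sites would then read:
//   parseKeyStrValuesStr(relatesTo,
//       TimelineParseConstantsSketch.COMMA_DELIMITER,
//       TimelineParseConstantsSketch.COLON_DELIMITER);
{code}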
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702337#comment-14702337 ] Hadoop QA commented on YARN-2005: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 3m 22s | trunk compilation may be broken. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:red}-1{color} | javac | 2m 25s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751130/YARN-2005.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7ecbfd4 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8881/console | This message was automatically generated. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, YARN-2005.006.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702370#comment-14702370 ] Varun Saxena commented on YARN-3814: Yes, we won't use it as of now. REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814-YARN-2928.06.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702397#comment-14702397 ] Hadoop QA commented on YARN-2884: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 21m 27s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 31s | The applied patch generated 1 new checkstyle issues (total was 237, now 237). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 54s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 1m 57s | Tests failed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 25s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 6m 14s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 53m 11s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 114m 10s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751175/YARN-2884-V9.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 30e342a | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8880/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8880/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8880/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8880/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8880/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8880/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8880/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8880/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8880/console | This message was automatically generated. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle mis-behaving AMs, and 3) mask access to a federation of RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
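For readers new to the RMProxy idea, a sketch of a pass-through proxy over the AM-facing {{ApplicationMasterProtocol}}, which is where throttling or local scheduling hooks would go (the class is illustrative, not the patch):
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.ApplicationMasterProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.FinishApplicationMasterResponse;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Sketch: a node-local service implements the same protocol the AM speaks
// and forwards to the real RM, gaining a place to intercept each call.
public class AmRmProxySketch implements ApplicationMasterProtocol {
  private final ApplicationMasterProtocol rmClient;  // real RM connection

  public AmRmProxySketch(ApplicationMasterProtocol rmClient) {
    this.rmClient = rmClient;
  }

  @Override
  public RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request)
      throws YarnException, IOException {
    return rmClient.registerApplicationMaster(request);
  }

  @Override
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    // hook point: throttle mis-behaving AMs, make local scheduling
    // decisions, or route to one RM of a federation
    return rmClient.allocate(request);
  }

  @Override
  public FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request)
      throws YarnException, IOException {
    return rmClient.finishApplicationMaster(request);
  }
}
{code}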
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702306#comment-14702306 ] Varun Saxena commented on YARN-3814: As I have now removed it, we can add it later when we do ACLs. Is that fine? REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814-YARN-2928.06.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702346#comment-14702346 ] Li Lu commented on YARN-3814: - OK... If we're sure we will not use that method elsewhere, that LGTM. REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814-YARN-2928.06.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4014) Support user cli interface in for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702421#comment-14702421 ] Rohith Sharma K S commented on YARN-4014: - The test failures are unrelated to this patch. Support user cli interface in for Application Priority -- Key: YARN-4014 URL: https://issues.apache.org/jira/browse/YARN-4014 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 0004-YARN-4014.patch Track the changes for the user-RM client protocol, i.e. ApplicationClientProtocol changes and discussions, in this jira. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4060) Revisit default retry config for connection with RM
[ https://issues.apache.org/jira/browse/YARN-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702308#comment-14702308 ] Jian He commented on YARN-4060: --- bq. Is it considered backwards compatible to change defaults? It should be fine, IMO. Revisit default retry config for connection with RM Key: YARN-4060 URL: https://issues.apache.org/jira/browse/YARN-4060 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He The 15-minute timeout for AM/NM connections with the RM in the non-HA scenario turns out to be too short in production environments. The suggestion is to increase it to 30 minutes. Also, the retry interval is set to 30 seconds, which appears too long. Should we reduce that to 10 seconds? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
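For reference, a sketch of the two settings under discussion, using the yarn-site keys {{yarn.resourcemanager.connect.max-wait.ms}} and {{yarn.resourcemanager.connect.retry-interval.ms}}; the values shown reflect the proposal, not the shipped defaults:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: overriding the connect/retry settings programmatically.
public class RetryDefaultsExample {
  public static Configuration proposedDefaults() {
    Configuration conf = new Configuration();
    conf.setLong("yarn.resourcemanager.connect.max-wait.ms",
        30L * 60 * 1000);   // 30 minutes instead of 15
    conf.setLong("yarn.resourcemanager.connect.retry-interval.ms",
        10L * 1000);        // 10 seconds instead of 30
    return conf;
  }
}
{code}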
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702426#comment-14702426 ] Hadoop QA commented on YARN-3814: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 34s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 10s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 16s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 14s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 35s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 40m 20s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751178/YARN-3814-YARN-2928.06.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 9a82008 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8882/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8882/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8882/console | This message was automatically generated. REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814-YARN-2928.06.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4060) Revisit default retry config for connection with RM
[ https://issues.apache.org/jira/browse/YARN-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702186#comment-14702186 ] Karthik Kambatla commented on YARN-4060: I am in favor of the change. Is it considered backwards compatible to change defaults? Revisit default retry config for connection with RM Key: YARN-4060 URL: https://issues.apache.org/jira/browse/YARN-4060 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He The 15-minute timeout for AM/NM connections with the RM in the non-HA scenario turns out to be too short in production environments. The suggestion is to increase it to 30 minutes. Also, the retry interval is set to 30 seconds, which appears too long. Should we reduce that to 10 seconds? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702212#comment-14702212 ] Hadoop QA commented on YARN-4025: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 58s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 18s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 52s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 29s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 41m 45s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751118/YARN-4025-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 9a82008 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8879/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8879/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8879/console | This message was automatically generated. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch, YARN-4025-YARN-2928.003.patch, YARN-4025-YARN-2928.004.patch Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl code. There seem to be some places in the code where there are conversions between Long to byte[] to String for easier argument passing between function calls. Then these values end up being converted back to byte[] while storing in hbase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some api changes (store function) as well in adding a few more function calls like getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String and the ColumnHelper to return a byte[] column name instead of a String one. Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702253#comment-14702253 ] Varun Saxena commented on YARN-3814: bq. Can we have two constants for default delimiters? OK. bq. callerUGI is not used and not documented. What's our plan on that? How do we set the caller UGI for now? callerUGI will be used for applying ACLs. It is currently set in TimelineReaderWebServices. It can be removed for now. bq. Configuration and YarnConfiguration appear to be unused. Will remove the imports. REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702266#comment-14702266 ] Li Lu commented on YARN-3814: - bq. callerUGI will be used for applying ACLs. It is currently set in TimelineReaderWebServices. It can be removed for now. Yes, I know it's used for applying ACLs. We do have a plan to support security in the (possibly near) future, and by then the UGI info will become useful. That's actually why I'm not suggesting removing it but instead documenting our current intentions on it. So I'm inclined not to remove it for now, but to make our current assumptions/requirements on it clear. REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814-YARN-2928.06.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4057) If ContainersMonitor is not enabled, only print related log info one time
[ https://issues.apache.org/jira/browse/YARN-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702247#comment-14702247 ] Jun Gong commented on YARN-4057: Thanks [~zxu] for the review and commit! If ContainersMonitor is not enabled, only print related log info one time - Key: YARN-4057 URL: https://issues.apache.org/jira/browse/YARN-4057 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Jun Gong Assignee: Jun Gong Priority: Minor Fix For: 2.8.0 Attachments: YARN-4057.01.patch ContainersMonitorImpl will check whether it is enabled when handling every event, and it will print following messages again and again if not enabled: {quote} 2015-08-17 13:20:13,792 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Neither virutal-memory nor physical-memory is needed. Not running the monitor-thread {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Attachment: YARN-3814-YARN-2928.06.patch Added constants, removed unused imports and unused callerUGI in TimelineReaderManager REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3814-YARN-2928.01.patch, YARN-3814-YARN-2928.02.patch, YARN-3814-YARN-2928.03.patch, YARN-3814-YARN-2928.04.patch, YARN-3814-YARN-2928.05.patch, YARN-3814-YARN-2928.06.patch, YARN-3814.reference.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702274#comment-14702274 ] Junping Du commented on YARN-4025: -- bq. We can treat that method as the default readResults() method, and the other one as the one for having raw (non-string) components. I added more javadoc to clarify that point. Sounds good. Thanks for addressing this and other review comments. +1 on latest (004) patch. Will commit it shortly if no further comments from others. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch, YARN-4025-YARN-2928.003.patch, YARN-4025-YARN-2928.004.patch Timestamps are being stored as Longs in hbase by the HBaseTimelineWriterImpl code. There seem to be some places in the code where there are conversions between Long to byte[] to String for easier argument passing between function calls. Then these values end up being converted back to byte[] while storing in hbase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some api changes (store function) as well in adding a few more function calls like getColumnQualifier which accepts a pre-encoded byte array. It will be in addition to the existing api which accepts a String and the ColumnHelper to return a byte[] column name instead of a String one. Filing jira to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2
Li Lu created YARN-4061: --- Summary: [Fault tolerance] Fault tolerant writer for timeline v2 Key: YARN-4061 URL: https://issues.apache.org/jira/browse/YARN-4061 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu We need to build a timeline writer that can be resistant to backend storage downtime and timeline collector failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4060) Revisit default retry config for connection with RM
Jian He created YARN-4060: - Summary: Revisit default retry config for connection with RM Key: YARN-4060 URL: https://issues.apache.org/jira/browse/YARN-4060 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He The 15-minute timeout for AM/NM connections with the RM in the non-HA scenario turns out to be too short in production environments. The suggestion is to increase it to 30 minutes. Also, the retry interval is set to 30 seconds, which appears too long. Should we reduce that to 10 seconds? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2884) Proxying all AM-RM communications
[ https://issues.apache.org/jira/browse/YARN-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishore Chaliparambil updated YARN-2884: Attachment: YARN-2884-V9.patch Thanks [~jianhe] for reviewing the patch. I have uploaded a new patch that addresses all your comments. Proxying all AM-RM communications - Key: YARN-2884 URL: https://issues.apache.org/jira/browse/YARN-2884 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Carlo Curino Assignee: Kishore Chaliparambil Attachments: YARN-2884-V1.patch, YARN-2884-V2.patch, YARN-2884-V3.patch, YARN-2884-V4.patch, YARN-2884-V5.patch, YARN-2884-V6.patch, YARN-2884-V7.patch, YARN-2884-V8.patch, YARN-2884-V9.patch We introduce the notion of an RMProxy, running on each node (or once per rack). Upon start the AM is forced (via tokens and configuration) to direct all its requests to a new service running on the NM that provides a proxy to the central RM. This gives us a place to: 1) perform distributed scheduling decisions, 2) throttle mis-behaving AMs, and 3) mask access to a federation of RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3422) relatedentities always return empty list when primary filter is set
[ https://issues.apache.org/jira/browse/YARN-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li resolved YARN-3422. Resolution: Won't Fix relatedentities always return empty list when primary filter is set --- Key: YARN-3422 URL: https://issues.apache.org/jira/browse/YARN-3422 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Chang Li Assignee: Chang Li Attachments: YARN-3422.1.patch When you curl for ATS entities with a primary filter, the relatedentities field always returns an empty list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4059) Preemption should delay assignments back to the preempted queue
[ https://issues.apache.org/jira/browse/YARN-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702258#comment-14702258 ] Hadoop QA commented on YARN-4059: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 31s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 10m 33s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 2s | The applied patch generated 4 new checkstyle issues (total was 184, now 188). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 13 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 37s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 53m 29s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 95m 47s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751097/YARN-4059.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 71aedfa | | javadoc | https://builds.apache.org/job/PreCommit-YARN-Build/8878/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8878/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8878/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8878/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8878/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8878/console | This message was automatically generated. Preemption should delay assignments back to the preempted queue --- Key: YARN-4059 URL: https://issues.apache.org/jira/browse/YARN-4059 Project: Hadoop YARN Issue Type: Improvement Reporter: Chang Li Assignee: Chang Li Attachments: YARN-4059.patch When preempting containers from a queue it can take a while for the other queues to fully consume the resources that were freed up, due to delays waiting for better locality, etc. Those delays can cause the resources to be assigned back to the preempted queue, and then the preemption cycle continues. 
We should consider adding a delay, either based on node heartbeat counts or time, to avoid granting containers to a queue that was recently preempted. The delay should be sufficient to cover the cycles of the preemption monitor, so we won't try to assign containers in-between preemption events for a queue. Worst-case scenario for assigning freed resources to other queues is when all the other queues want no locality. No locality means only one container is assigned per heartbeat, so we need to wait for the entire cluster to heartbeat in, multiplied by the number of containers that could run on a single node. So the penalty time for a queue should be the max of either the preemption monitor cycle time or the amount of time it takes to allocate the cluster with one container per heartbeat. Guessing this will be somewhere around 2 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702523#comment-14702523 ] Xuan Gong commented on YARN-221: +1. The last patch looks good to me. Let us wait for several days; if there are no other comments, I will commit this this weekend. [~mingma], in the meantime, could you open a related MR ticket and link it here, please? NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow the NM to not aggregate logs in some cases, and avoid connecting to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-v4.patch Thanks for your comments, [~leftnoteasy]. I didn't notice there are already such events. I updated the patch accordingly. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch Currently, the YARN RM NodesListManager will resolve the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
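For background only, a sketch of the general idea of keeping heartbeats off the DNS path by caching resolutions; note the v4 patch above reportedly takes an event-based approach instead, and the helper below is hypothetical:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: resolve each hostname once and invalidate out of band, so the
// heartbeat path never blocks on a slow DNS server.
public class CachedHostResolverSketch {
  private final ConcurrentMap<String, InetAddress> cache =
      new ConcurrentHashMap<>();

  public InetAddress resolve(String host) throws UnknownHostException {
    InetAddress cached = cache.get(host);
    if (cached != null) {
      return cached;          // heartbeat path: no DNS lookup
    }
    InetAddress resolved = InetAddress.getByName(host);
    cache.putIfAbsent(host, resolved);
    return resolved;
  }

  public void invalidate(String host) {
    cache.remove(host);       // e.g. on include/exclude list refresh
  }
}
{code}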
[jira] [Commented] (YARN-3986) getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead
[ https://issues.apache.org/jira/browse/YARN-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702485#comment-14702485 ] Varun Saxena commented on YARN-3986: There is a JIRA raised for the TestContainerAllocation failure. It's unrelated. getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead -- Key: YARN-3986 URL: https://issues.apache.org/jira/browse/YARN-3986 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3986.01.patch, YARN-3986.02.patch, YARN-3986.03.patch Currently getTransferredContainers is present in {{AbstractYarnScheduler}}. *But in ApplicationMasterService, while registering an AM, we are calling this method by typecasting to AbstractYarnScheduler, which is incorrect.* This method should be moved to YarnScheduler, because a custom scheduler would implement YarnScheduler, not AbstractYarnScheduler. As ApplicationMasterService calls getTransferredContainers by typecasting to AbstractYarnScheduler, it imposes an indirect dependency on AbstractYarnScheduler for any pluggable custom scheduler. We can move the method to YarnScheduler and leave the definition in AbstractYarnScheduler as it is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
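The proposed refactoring amounts to declaring the method on the interface so callers need no cast; a sketch with an illustrative interface name (the existing definition would stay in AbstractYarnScheduler):
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.Container;

// Sketch: once the method is on the interface, ApplicationMasterService
// can call it through YarnScheduler without casting to the abstract class.
public interface SchedulerWithRecoverySketch {
  List<Container> getTransferredContainers(ApplicationAttemptId appAttemptId);
}
{code}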
[jira] [Commented] (YARN-4028) AppBlock page key update and diagnostics value null on recovery
[ https://issues.apache.org/jira/browse/YARN-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702534#comment-14702534 ] Xuan Gong commented on YARN-4028: - +1 LGTM. Checking this in. AppBlock page key update and diagnostics value null on recovery --- Key: YARN-4028 URL: https://issues.apache.org/jira/browse/YARN-4028 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-4028.patch, 0002-YARN-4028.patch, Image.jpg All keys end with *:*; adding the same in *Log Aggregation Status* for consistency. Also, the Diagnostics value is shown as null on recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4028) AppBlock page key update and diagnostics value null on recovery
[ https://issues.apache.org/jira/browse/YARN-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702544#comment-14702544 ] Xuan Gong commented on YARN-4028: - Committed into trunk/branch-2. Thanks, Bibin A Chundatt. AppBlock page key update and diagnostics value null on recovery --- Key: YARN-4028 URL: https://issues.apache.org/jira/browse/YARN-4028 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-YARN-4028.patch, 0002-YARN-4028.patch, Image.jpg All keys end with *:*; adding the same to *Log Aggregation Status* for consistency. Also, the Diagnostics value is shown as null on recovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4014) Support user CLI interface for Application Priority
[ https://issues.apache.org/jira/browse/YARN-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-4014: Attachment: 0004-YARN-4014.patch Updating the same patch with the javadoc issues fixed. Kicking off Jenkins. Support user CLI interface for Application Priority -- Key: YARN-4014 URL: https://issues.apache.org/jira/browse/YARN-4014 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Rohith Sharma K S Assignee: Rohith Sharma K S Attachments: 0001-YARN-4014-V1.patch, 0001-YARN-4014.patch, 0002-YARN-4014.patch, 0003-YARN-4014.patch, 0004-YARN-4014.patch, 0004-YARN-4014.patch Track the changes to the user-RM client protocol, i.e. the ApplicationClientProtocol changes, and the related discussions in this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
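Since the JIRA tracks ApplicationClientProtocol changes, client code would presumably end up looking roughly like the sketch below. The {{updateApplicationPriority}} call is an assumption about the API under review, not a method guaranteed by the attached patches.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class UpdatePrioritySketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      ApplicationId appId = ConverterUtils.toApplicationId(args[0]);
      // Assumed method: a client-side surface for the protocol change this
      // JIRA tracks; the real name/signature follow whatever patch lands.
      client.updateApplicationPriority(appId, Priority.newInstance(10));
    } finally {
      client.stop();
    }
  }
}
{code}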
[jira] [Commented] (YARN-3250) Support admin CLI interface for Application Priority
[ https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701292#comment-14701292 ] Rohith Sharma K S commented on YARN-3250: - [~sunilg] [~jianhe] would you have a look at the patch, please? I will rebase the patch based on the review comments. Support admin CLI interface for Application Priority --- Key: YARN-3250 URL: https://issues.apache.org/jira/browse/YARN-3250 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Rohith Sharma K S Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch The current Application Priority Manager supports configuration only via file. To support runtime configuration through the admin CLI and REST, a common management interface has to be added which can be shared with NodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3652) A SchedulerMetrics may be needed for evaluating the scheduler's performance
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701291#comment-14701291 ] Xianyin Xin commented on YARN-3652: --- A brief introduction to the preview patch: SchedulerMetrics focuses on metrics related to the scheduler's performance. The following metrics are considered:
* number of waiting events in the scheduler dispatch queue
* number of events of each kind in the scheduler dispatch queue
* event handling rate
* node-update handling rate
* event adding rate
* node-update adding rate
* statistics on the number of waiting events
* statistics on the number of waiting node-update events
* container allocation rate
* scheduling method execution rate, i.e., the number of scheduling attempts per second
* app allocation call duration
* nodeUpdate call duration
* scheduling call duration
These metrics give rich information about the scheduler's performance, which can be used to diagnose scheduler anomalies. A SchedulerMetrics may be needed for evaluating the scheduler's performance - Key: YARN-3652 URL: https://issues.apache.org/jira/browse/YARN-3652 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Reporter: Xianyin Xin Attachments: YARN-3652-preview.patch As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating the scheduler's performance. The performance indexes include the number of events waiting to be handled by the scheduler, the throughput, the scheduling delay, and/or other indicators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
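For readers unfamiliar with how such counters are usually exposed in Hadoop, here is a minimal sketch using the metrics2 library; the class and metric names are illustrative and do not match YARN-3652-preview.patch.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;
import org.apache.hadoop.metrics2.lib.MutableRate;

/** Illustrative sketch only; names do not match the preview patch. */
@Metrics(context = "yarn")
public class SchedulerMetricsSketch {
  @Metric("Events waiting in the scheduler dispatch queue")
  MutableGaugeInt pendingEvents;

  @Metric("Time spent handling a nodeUpdate call")
  MutableRate nodeUpdateDuration;

  /** Record one nodeUpdate call: its duration plus the current queue depth. */
  public void recordNodeUpdate(long startNanos, int queueSize) {
    nodeUpdateDuration.add((System.nanoTime() - startNanos) / 1_000_000L); // ms
    pendingEvents.set(queueSize);
  }
}
{code}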
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701325#comment-14701325 ] Junping Du commented on YARN-4025: -- bq. One major change I did is that now TestHBaseTimelineWriterImpl verifies the timeline entities read by HBaseTimelineReaderImpl as well. This provides a nice benefit of verifying correctness of HBaseTimelineReaderImpl. It uncovered a bug in the process. Nice work! I forgot to mention that YARN-3049 should rename TestHBaseTimelineWriterImpl to something that includes Reader. Would you like to do it here? Thanks! Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch, YARN-4025-YARN-2928.003.patch Timestamps are stored as Longs in HBase by the HBaseTimelineWriterImpl code. There are some places in the code with conversions from Long to byte[] to String for easier argument passing between function calls, and these values then end up being converted back to byte[] while being stored in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as a few more function calls, such as a getColumnQualifier that accepts a pre-encoded byte array (in addition to the existing API which accepts a String) and a ColumnHelper that returns a byte[] column name instead of a String one. Filing this JIRA to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
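The gist of the change is to keep Longs in byte[] form end to end instead of round-tripping through String. A minimal sketch with the HBase {{Bytes}} utility follows; the class and method names here are illustrative, not the patch's.
{code}
import org.apache.hadoop.hbase.util.Bytes;

/** Sketch: encode Longs directly rather than via an intermediate String. */
public class TimestampBytesSketch {
  /** The pattern being removed: Long -> String -> byte[]. Wasteful, and the
      string form does not sort numerically. */
  static byte[] viaString(long timestamp) {
    return Bytes.toBytes(String.valueOf(timestamp));
  }

  /** Direct encoding: 8 bytes, preserves ordering for non-negative values. */
  static byte[] direct(long timestamp) {
    return Bytes.toBytes(timestamp);
  }

  static long decode(byte[] encoded) {
    return Bytes.toLong(encoded);
  }
}
{code}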
[jira] [Commented] (YARN-3652) A SchedulerMetrics may be needed for evaluating the scheduler's performance
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701267#comment-14701267 ] Xianyin Xin commented on YARN-3652: --- In the patch I used functions from HADOOP-12338. A SchedulerMetrics may be needed for evaluating the scheduler's performance - Key: YARN-3652 URL: https://issues.apache.org/jira/browse/YARN-3652 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Reporter: Xianyin Xin Attachments: YARN-3652-preview.patch As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating the scheduler's performance. The performance indexes include the number of events waiting to be handled by the scheduler, the throughput, the scheduling delay, and/or other indicators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3652) A SchedulerMetrics may be needed for evaluating the scheduler's performance
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701296#comment-14701296 ] Xianyin Xin commented on YARN-3652: --- Hi [~sunilg], [~vvasudev], would you please have a look? Any comments are welcome. A SchedulerMetrics may be needed for evaluating the scheduler's performance - Key: YARN-3652 URL: https://issues.apache.org/jira/browse/YARN-3652 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Reporter: Xianyin Xin Attachments: YARN-3652-preview.patch As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating the scheduler's performance. The performance indexes include the number of events waiting to be handled by the scheduler, the throughput, the scheduling delay, and/or other indicators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701302#comment-14701302 ] Naganarasimha G R commented on YARN-3367: - Thanks [~djp], assigning this JIRA to myself. Replace starting a separate thread for post entity with event loop in TimelineClient Key: YARN-3367 URL: https://issues.apache.org/jira/browse/YARN-3367 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Naganarasimha G R Since YARN-3039, we have a loop in TimelineClient that waits for the collectorServiceAddress to be ready before posting any entity. In consumers of TimelineClient (like the AM), we start a new thread for each call to get rid of a potential deadlock in the main thread. This approach has at least three major defects:
1. The consumer needs additional code to wrap a thread before calling putEntities() in TimelineClient.
2. It costs many thread resources, which is unnecessary.
3. The sequence of events could be out of order because each posting thread exits the waiting loop at a random time.
We should have something like an event loop on the TimelineClient side: putEntities() only puts the related entities into a queue, and a separate thread delivers the queued entities to the collector via REST calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
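The proposed shape is a classic single-consumer producer/consumer loop, sketched minimally below; the entity type and the REST call are stand-ins, not the eventual patch.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch of the proposed event loop; Object and postToCollector are stand-ins. */
public class TimelineClientLoopSketch {
  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
  private final Thread dispatcher = new Thread(() -> {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        Object entity = queue.take(); // blocks until an entity is queued
        postToCollector(entity);      // single consumer => order is preserved
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // shutting down
    }
  }, "TimelineClient-dispatcher");

  public TimelineClientLoopSketch() {
    dispatcher.start();
  }

  /** Cheap for callers: enqueue and return, no per-call thread required. */
  public void putEntities(Object entity) {
    queue.add(entity);
  }

  private void postToCollector(Object entity) {
    // placeholder for the REST call to the collector
  }
}
{code}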
[jira] [Commented] (YARN-4025) Deal with byte representations of Longs in writer code
[ https://issues.apache.org/jira/browse/YARN-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701309#comment-14701309 ] Junping Du commented on YARN-4025: -- Thanks [~sjlee0] for updating the patch. The 003 patch looks good to me in general. Some minor comments: In ColumnHelper.java,
{code}
/**
 ...
+ * @param columnPrefixBytes
+ *          optional prefix to limit columns. If null, all columns are
+ *          returned.
 ...
+ */
+ public Map<byte[][], Object> readResultsHavingCompoundColumnQualifiers(
{code}
Do we handle a null columnPrefixBytes here, as the javadoc says? I saw we handle this case explicitly in readResults(), but I didn't see it here. Let me know if I missed something. In addition, it looks like the previous readResults() only handles the case where the CQs are all Strings. I think we should rename that method to something like readResultsWithAllStringColumnQualifiers() to get rid of possible confusion. Last but not least, for the case where result is null, do we need to handle it by logging warning messages like the other cases in this patch do?
{code}
+ byte[][] columnQualifierParts = Separator.VALUES.split(
+     columnNameParts[1], -1);
{code}
Checking the javadoc in Separator and TimelineWriterUtils - a negative value indicates no limit on the number of segments - can we define a constant like NO_LIMIT to replace -1 here? Actually, from checking the implementation in TimelineWriterUtils, 0 also indicates the same thing (no limit). It also sounds like we don't have any tests in TestTimelineWriterUtils.java; we may want to improve that in the future. In ApplicationTable,
{code}
- * || e!eventId?timestamp?infoKey: | | |
+ * || e!eventId=timestamp=infoKey: |
{code}
I think we should do the same thing to the javadoc examples in EntityTable.java. The rest looks fine to me. Deal with byte representations of Longs in writer code -- Key: YARN-4025 URL: https://issues.apache.org/jira/browse/YARN-4025 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Vrushali C Assignee: Sangjin Lee Attachments: YARN-4025-YARN-2928.001.patch, YARN-4025-YARN-2928.002.patch, YARN-4025-YARN-2928.003.patch Timestamps are stored as Longs in HBase by the HBaseTimelineWriterImpl code. There are some places in the code with conversions from Long to byte[] to String for easier argument passing between function calls, and these values then end up being converted back to byte[] while being stored in HBase. It would be better to pass around byte[] or the Longs themselves as applicable. This may result in some API changes (the store function) as well as a few more function calls, such as a getColumnQualifier that accepts a pre-encoded byte array (in addition to the existing API which accepts a String) and a ColumnHelper that returns a byte[] column name instead of a String one. Filing this JIRA to track these changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
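The NO_LIMIT suggestion would be a small change; a sketch of what the reviewer is asking for is below, with the constant name as proposed in the comment and the call site shown as a comment since it depends on patch context.
{code}
/** Sketch of the reviewer's suggestion: name the magic sentinel value. */
public class SeparatorLimitSketch {
  /** Per the Separator/TimelineWriterUtils javadoc, a non-positive split
      limit means "no limit on the number of segments". */
  public static final int NO_LIMIT = -1;

  // The call site in ColumnHelper would then read, roughly:
  //   byte[][] columnQualifierParts =
  //       Separator.VALUES.split(columnNameParts[1], NO_LIMIT);
}
{code}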
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701242#comment-14701242 ] Jason Lowe commented on YARN-3942: -- [~rajesh] The initial exception looks like an issue with the HDFS client layer, and most HDFS clients would have similar problems trying to use HDFS. Normally HDFS operations are not retried at this level because there are already many retries in the HDFS client and server layers, so I don't think that exception is an issue to fix in the ATS but rather in the HDFS configuration and/or code. Also, the patch does not treat that logged exception as fatal; it just logs the fact that it couldn't complete a scan for that iteration and will try again in the next scan interval. The real problem is indicated by this line:
{noformat}
2015-08-18 01:03:35,600 [SIGTERM handler] ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM
{noformat}
Something outside of the ATS is killing the process with SIGTERM. Timeline store to read events from HDFS --- Key: YARN-3942 URL: https://issues.apache.org/jira/browse/YARN-3942 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3942.001.patch This adds a new timeline store plugin that is intended as a stop-gap measure to mitigate some of the issues we've seen with ATS v1 while waiting for ATS v2. The intent of this plugin is to provide a workable solution for running the Tez UI against the timeline server on large-scale clusters running many thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3652) A SchedulerMetrics may be needed for evaluating the scheduler's performance
[ https://issues.apache.org/jira/browse/YARN-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated YARN-3652: -- Attachment: YARN-3652-preview.patch A preview patch has been submitted. A SchedulerMetrics may be needed for evaluating the scheduler's performance - Key: YARN-3652 URL: https://issues.apache.org/jira/browse/YARN-3652 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager, scheduler Reporter: Xianyin Xin Attachments: YARN-3652-preview.patch As discussed in YARN-3630, a {{SchedulerMetrics}} may be needed for evaluating the scheduler's performance. The performance indexes include the number of events waiting to be handled by the scheduler, the throughput, the scheduling delay, and/or other indicators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701285#comment-14701285 ] Hadoop QA commented on YARN-4024: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 54s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 37s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 1m 55s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 57m 27s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 105m 51s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751009/YARN-4024-draft-v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 71566e2 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8875/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8875/console | This message was automatically generated. 
YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch Currently, the YARN RM NodesListManager resolves the IP address every time a node heartbeats. When the DNS server becomes slow, NM heartbeats are blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3986) getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead
[ https://issues.apache.org/jira/browse/YARN-3986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701334#comment-14701334 ] Rohith Sharma K S commented on YARN-3986: - +1 for the latest patch. getTransferredContainers in AbstractYarnScheduler should be present in YarnScheduler interface instead -- Key: YARN-3986 URL: https://issues.apache.org/jira/browse/YARN-3986 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3986.01.patch, YARN-3986.02.patch, YARN-3986.03.patch Currently getTransferredContainers is present in {{AbstractYarnScheduler}}. *But in ApplicationMasterService, while registering an AM, we call this method by typecasting the scheduler to AbstractYarnScheduler, which is incorrect.* This method should be moved to YarnScheduler, because a custom scheduler would implement YarnScheduler, not AbstractYarnScheduler. As ApplicationMasterService calls getTransferredContainers by typecasting to AbstractYarnScheduler, it imposes an indirect dependency on AbstractYarnScheduler for any pluggable custom scheduler. We can move the method to YarnScheduler and leave the implementation in AbstractYarnScheduler as it is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4059) Preemption should delay assignments back to the preempted queue
Chang Li created YARN-4059: -- Summary: Preemption should delay assignments back to the preempted queue Key: YARN-4059 URL: https://issues.apache.org/jira/browse/YARN-4059 Project: Hadoop YARN Issue Type: Bug Reporter: Chang Li Assignee: Chang Li When preempting containers from a queue, it can take a while for the other queues to fully consume the freed-up resources, due to delays waiting for better locality, etc. Those delays can cause the resources to be assigned back to the preempted queue, and then the preemption cycle continues. We should consider adding a delay, based either on node heartbeat counts or on time, to avoid granting containers to a queue that was recently preempted. The delay should be long enough to cover the cycles of the preemption monitor, so we won't try to assign containers in between preemption events for a queue. The worst case for assigning freed resources to other queues is when none of the other queues want locality: with no locality, only one container is assigned per node heartbeat, so we need to wait for the entire cluster to heartbeat in, multiplied by the number of containers that could run on a single node. So the penalty time for a queue should be the max of the preemption monitor cycle time and the time it takes to fill the cluster at one container per heartbeat; this will likely be somewhere around 2 minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
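To make that worst-case arithmetic concrete, here is a small sketch of the proposed penalty computation; every number is an illustrative example, not a value taken from the JIRA.
{code}
/** Illustrative arithmetic for the proposed delay; all values are examples. */
public class PreemptionDelaySketch {
  public static void main(String[] args) {
    long monitorCycleMs = 15_000L;     // example preemption monitor interval
    long heartbeatIntervalMs = 3_000L; // example NM heartbeat interval
    int containersPerNode = 40;        // e.g. 120 GB node / 3 GB containers

    // Worst case (no locality): one container granted per node heartbeat,
    // so filling every node takes containersPerNode heartbeat rounds.
    long fillClusterMs = containersPerNode * heartbeatIntervalMs; // 120,000 ms

    long penaltyMs = Math.max(monitorCycleMs, fillClusterMs);
    System.out.println("penalty = " + penaltyMs + " ms"); // ~2 minutes here
  }
}
{code}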
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refreshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701416#comment-14701416 ] Bibin A Chundatt commented on YARN-3893: Hi [~rohithsharma], thank you for your review comments. I will update the patch and upload it soon. Both RM in active state when Admin#transitionToActive failure from refreshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml Cases that can cause this:
# Capacity scheduler XML is wrongly configured during the switch
# Refresh ACL failure due to configuration
# Refresh user-group failure due to configuration
Both RMs then continuously try to become active:
{code}
dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1
15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
active
dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2
15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
active
{code}
# Both web UIs show active
# Status shown as active for both RMs
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1473) Exception from container-launch (Apache Hadoop 2.2.0)
[ https://issues.apache.org/jira/browse/YARN-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701424#comment-14701424 ] Maximiliano Mendez commented on YARN-1473: -- Same error here after an upgrade from 2.6.0 to 2.7.1. Exception from container-launch (Apache Hadoop 2.2.0) Key: YARN-1473 URL: https://issues.apache.org/jira/browse/YARN-1473 Project: Hadoop YARN Issue Type: Bug Environment: CentOS 5.8 and Apache Hadoop 2.2.0 Reporter: Joy Xu Attachments: yarn-site.xml Hello all, I have hit an exception from container-launch when running the built-in wordcount program, and the error message is as follows:
{code}
13/12/05 00:17:31 INFO mapreduce.Job: Job job_1386171829089_0003 failed with state FAILED due to: Application application_1386171829089_0003 failed 2 times due to AM Container for appattempt_1386171829089_0003_02 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
.Failing this attempt.. Failing the application.
13/12/05 00:17:31 INFO mapreduce.Job: Counters: 0
{code}
Hope someone can help. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)