[jira] [Commented] (YARN-422) Add AM-NM client library
[ https://issues.apache.org/jira/browse/YARN-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645307#comment-13645307 ] Vinod Kumar Vavilapalli commented on YARN-422: -- bq. 1. Semantically, it is a bit strange that RM uses "AM"NMClient. Agreed. Maybe we should just call it NMClient? bq. 2. Technically, hadoop-yarn-client has dependency on hadoop-yarn-server-resourcemanager in test scope. If we want to use AMNMClient in AMLauncher, hadoop-yarn-server-resourcemanager needs to add the dependency on hadoop-yarn-client, forming a circular dependency. The dependencies are per scope, so there is no circular dependency in either test scope or non-test scope. Is this patch ready for review? Or just a definition file? Doesn't seem so. In any case, I think we need to have either - separate call-backs for failures on startContainer() and failures on stopContainer() - or maybe just one call-back with the original event-type? > Add AM-NM client library > > > Key: YARN-422 > URL: https://issues.apache.org/jira/browse/YARN-422 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Zhijie Shen > Attachments: AMNMClient_Defination.txt, > AMNMClient_Definition_Updated_With_Tests.txt, proposal_v1.pdf > > > Create a simple wrapper over the AM-NM container protocol to hide the > details of the protocol implementation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
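[Editor's note] The two call-back options discussed above can be sketched as follows. All names here (CallbackSketch, SeparateHandler, SingleHandler, onContainerError, etc.) are illustrative assumptions, not the API that was eventually committed for YARN-422:

```java
// Sketch of the two call-back options discussed in the comment above.
// Every identifier here is hypothetical; this is not the final YARN-422 API.
public class CallbackSketch {

    // Option 1: separate call-backs for start and stop failures.
    interface SeparateHandler {
        void onStartContainerError(String containerId, Throwable t);
        void onStopContainerError(String containerId, Throwable t);
    }

    // Option 2: one call-back carrying the original event type.
    enum EventType { START_CONTAINER, STOP_CONTAINER }

    interface SingleHandler {
        void onContainerError(EventType origin, String containerId, Throwable t);
    }

    // A trivial implementation of option 2 that records which event failed,
    // just to show the caller can still distinguish the two cases.
    static class RecordingHandler implements SingleHandler {
        EventType lastFailed;
        public void onContainerError(EventType origin, String containerId, Throwable t) {
            lastFailed = origin;
        }
    }
}
```

Option 2 keeps the handler interface from growing with every new operation, at the cost of a switch on the event type inside the handler.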
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645300#comment-13645300 ] Hudson commented on YARN-599: - Integrated in Hadoop-trunk-Commit #3698 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3698/]) YARN-599. Refactoring submitApplication in ClientRMService and RMAppManager to separate out various validation checks depending on whether they rely on RM configuration or not. Contributed by Zhijie Shen. (Revision 1477478) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1477478 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManagerSubmitEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java > Refactoring submitApplication in ClientRMService and RMAppManager > - > > Key: YARN-599 > URL: https://issues.apache.org/jira/browse/YARN-599 > Project: 
Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.0.5-beta > > Attachments: YARN-599.1.patch, YARN-599.2.patch > > > Currently, ClientRMService#submitApplication calls RMAppManager#handle, and > consequently calls RMAppManager#submitApplication directly, though the code > looks like scheduling an APP_SUBMIT event. > In addition, the validation code before creating an RMApp instance is not > well organized. Ideally, the dynamic validation, which depends on the RM's > configuration, should be put in RMAppManager#submitApplication. > RMAppManager#submitApplication is called by > ClientRMService#submitApplication and RMAppManager#recover. Since the > configuration may be changed after the RM restarts, the validation needs to be > done again even in recovery mode. Therefore, resource request validation, > which is based on min/max resource limits, should be moved from > ClientRMService#submitApplication to RMAppManager#submitApplication. On the > other hand, the static validation, which is independent of the RM's > configuration, should be put in ClientRMService#submitApplication, because it > only needs to be done once, during the first submission. > Furthermore, the try-catch flow in RMAppManager#submitApplication has a flaw: > RMAppManager#submitApplication is not synchronized. If two > application submissions with the same application ID enter the function, and > one progresses to the completion of RMApp instantiation, and the other > progresses to the completion of putting the RMApp instance into rmContext, the > slower submission will cause an exception due to the duplicate application > ID. However, the exception will cause the RMApp instance already in rmContext > (belonging to the faster submission) to be rejected with the current code flow.
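[Editor's note] The duplicate-application-ID race described above is a check-then-act sequence that is not atomic. A minimal sketch of the fix direction, using an atomic putIfAbsent so the slower duplicate submission fails without evicting the RMApp already registered by the faster one. The class and method names are illustrative stand-ins, not the actual RMAppManager code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SubmitSketch {
    // Stand-in for rmContext's application map (appId -> RMApp).
    private final ConcurrentMap<String, String> rmContext = new ConcurrentHashMap<>();

    // putIfAbsent is atomic: a slower duplicate submission returns false here
    // without disturbing the entry stored by the faster submission, so the
    // already-registered RMApp is never rejected by mistake.
    public boolean submitApplication(String appId, String app) {
        return rmContext.putIfAbsent(appId, app) == null;
    }

    public String get(String appId) {
        return rmContext.get(appId);
    }
}
```

The point is that the existence check and the insert happen as one atomic step, rather than a separate contains-then-put that two threads can interleave.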
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645288#comment-13645288 ] Vinod Kumar Vavilapalli commented on YARN-599: -- Hm, it isn't straightforward to figure out that failures during RMAppManager.submitApplication() are properly put in Audit logs. But they are, I just verified. The latest patch looks good to me. +1, checking it in.. > Refactoring submitApplication in ClientRMService and RMAppManager > - > > Key: YARN-599 > URL: https://issues.apache.org/jira/browse/YARN-599 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-599.1.patch, YARN-599.2.patch
[jira] [Commented] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645248#comment-13645248 ] Hadoop QA commented on YARN-599: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581118/YARN-599.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/844//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/844//console This message is automatically generated. > Refactoring submitApplication in ClientRMService and RMAppManager > - > > Key: YARN-599 > URL: https://issues.apache.org/jira/browse/YARN-599 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-599.1.patch, YARN-599.2.patch
[jira] [Assigned] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-621: Assignee: Vinod Kumar Vavilapalli (was: Omkar Vinit Joshi) Allen, can you share your environment details? I am not able to reproduce this in my setup. > RM triggers web auth failure before first job > - > > Key: YARN-621 > URL: https://issues.apache.org/jira/browse/YARN-621 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.4-alpha >Reporter: Allen Wittenauer >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > > On a secure YARN setup, before the first job is executed, going to the web > interface of the resource manager triggers authentication errors.
[jira] [Commented] (YARN-578) NodeManager should use SecureIOUtils for serving logs and intermediate outputs
[ https://issues.apache.org/jira/browse/YARN-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645240#comment-13645240 ] Vinod Kumar Vavilapalli commented on YARN-578: -- Can you use this only for YARN changes, i.e. serving logs, and open a separate MAPREDUCE ticket for ShuffleHandler? For the YARN changes: - Remove the comment above the code which talks about SecureIOUtils ;) - I think we should separate the exception message to clearly say whether this was a permission issue or something else. > NodeManager should use SecureIOUtils for serving logs and intermediate outputs > -- > > Key: YARN-578 > URL: https://issues.apache.org/jira/browse/YARN-578 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Omkar Vinit Joshi > Attachments: yarn-578-20130426.patch > > > Log servlets for serving logs and the ShuffleService for serving intermediate > outputs both should use SecureIOUtils for avoiding symlink attacks.
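[Editor's note] For context on the symlink-attack concern: the essential check is that a path being served is a real file, not a symlink planted by another user to read files it should not reach. A minimal stdlib-only sketch of that one check (Hadoop's SecureIOUtils additionally verifies the file's expected owner and group after opening it, which this sketch omits):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeServe {
    // Refuse to serve a log path whose final component is a symlink.
    // This is only the symlink half of the protection; SecureIOUtils also
    // checks that the opened file is owned by the expected user/group.
    static boolean isSafeToServe(Path p) {
        return Files.exists(p) && !Files.isSymbolicLink(p);
    }
}
```

Checking the final component is not sufficient on its own (intermediate directories can also be symlinks), which is why the real protection combines path checks with post-open ownership verification on the file descriptor.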
[jira] [Updated] (YARN-599) Refactoring submitApplication in ClientRMService and RMAppManager
[ https://issues.apache.org/jira/browse/YARN-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-599: - Attachment: YARN-599.2.patch In the newer patch, I've updated the comments in ClientRMService and RMAppManager, and added audit logging for user and duplicate-ID exceptions. > Refactoring submitApplication in ClientRMService and RMAppManager > - > > Key: YARN-599 > URL: https://issues.apache.org/jira/browse/YARN-599 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-599.1.patch, YARN-599.2.patch
[jira] [Commented] (YARN-142) Change YARN APIs to throw IOException
[ https://issues.apache.org/jira/browse/YARN-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645060#comment-13645060 ] Siddharth Seth commented on YARN-142: - After HADOOP-9343, it should be possible for YarnException to not be rooted at IOException. So all methods can declare IOException and YarnException - and have the specializations of YarnException listed in the Javadoc. > Change YARN APIs to throw IOException > - > > Key: YARN-142 > URL: https://issues.apache.org/jira/browse/YARN-142 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 0.23.3, 2.0.0-alpha >Reporter: Siddharth Seth >Assignee: Xuan Gong >Priority: Blocker > Attachments: YARN-142.1.patch, YARN-142.2.patch, YARN-142.3.patch, > YARN-142.4.patch > > > Ref: MAPREDUCE-4067 > All YARN APIs currently throw YarnRemoteException. > 1) This cannot be extended in its current form. > 2) The RPC layer can throw IOExceptions. These end up showing up as > UndeclaredThrowableExceptions.
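[Editor's note] The proposal above can be illustrated with a tiny sketch: once the YARN base exception is no longer rooted at IOException, methods can declare both, and callers can tell RPC-layer IO failures apart from YARN-level failures instead of seeing UndeclaredThrowableException. The class names below are stand-ins, not the real YarnException hierarchy:

```java
import java.io.IOException;

public class ThrowsSketch {
    // Stand-in for a YarnException that is no longer rooted at IOException.
    static class FakeYarnException extends Exception {
        FakeYarnException(String msg) { super(msg); }
    }

    // A specialization that would be listed in the Javadoc rather than
    // added to the throws clause.
    static class FakeInvalidRequestException extends FakeYarnException {
        FakeInvalidRequestException(String msg) { super(msg); }
    }

    // The method declares both roots, so RPC-layer IOExceptions surface
    // as themselves instead of UndeclaredThrowableException.
    static void submit(boolean transportFailure) throws FakeYarnException, IOException {
        if (transportFailure) throw new IOException("rpc transport error");
        throw new FakeInvalidRequestException("bad request");
    }
}
```

Callers then catch the two hierarchies separately, which is exactly what a YarnException-extends-IOException design would make impossible.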
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645042#comment-13645042 ] Hadoop QA commented on YARN-513: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581065/YARN-513.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/843//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/843//console This message is automatically generated. 
> Verify all clients will wait for RM to restart > -- > > Key: YARN-513 > URL: https://issues.apache.org/jira/browse/YARN-513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, > YARN-513.4.patch > > > When the RM is restarting, the NM, AM and Clients should wait for some time > for the RM to come back up.
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645023#comment-13645023 ] Hudson commented on YARN-506: - Integrated in Hadoop-trunk-Commit #3695 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3695/]) YARN-506. Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute. Contributed by Ivan Mitic. (Revision 1477408) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1477408 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/util/CgroupsLCEResourcesHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java > Move to common utils FileUtil#setReadable/Writable/Executable and > FileUtil#canRead/Write/Execute > > > Key: YARN-506 > URL: https://issues.apache.org/jira/browse/YARN-506 > Project: Hadoop YARN > 
Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Fix For: 3.0.0 > > Attachments: YARN-506.commonfileutils.2.patch, > YARN-506.commonfileutils.patch > > > Move to common utils described in HADOOP-9413 that work well cross-platform.
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645015#comment-13645015 ] Hadoop QA commented on YARN-326: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581061/YARN-326-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/842//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/842//console This message is automatically generated. 
> Add multi-resource scheduling to the fair scheduler > --- > > Key: YARN-326 > URL: https://issues.apache.org/jira/browse/YARN-326 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: FairSchedulerDRFDesignDoc-1.pdf, > FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, > YARN-326.patch, YARN-326.patch > > > With YARN-2 in, the capacity scheduler has the ability to schedule based on > multiple resources, using dominant resource fairness. The fair scheduler > should be able to do multiple resource scheduling as well, also using > dominant resource fairness. > More details to come on how the corner cases with fair scheduler configs such > as min and max resources will be handled.
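[Editor's note] For readers unfamiliar with dominant resource fairness: each application's dominant share is its largest share across resource types, and the scheduler favors the application with the smallest dominant share. A minimal sketch of that computation (illustrative only, not the fair scheduler's actual code):

```java
public class DrfSketch {
    // Dominant share = max over resource types of used/capacity.
    // Resource vectors here are e.g. {memoryGB, vcores}.
    static double dominantShare(double[] used, double[] capacity) {
        double max = 0.0;
        for (int i = 0; i < used.length; i++) {
            max = Math.max(max, used[i] / capacity[i]);
        }
        return max;
    }

    // DRF prefers the app with the smaller dominant share (0 = first app).
    static int preferred(double[] a, double[] b, double[] capacity) {
        return dominantShare(a, capacity) <= dominantShare(b, capacity) ? 0 : 1;
    }
}
```

With a cluster of {100 GB, 40 vcores}, an app using {20 GB, 4 vcores} has dominant share 0.2 (memory-dominant), while one using {10 GB, 12 vcores} has 0.3 (cpu-dominant), so DRF would allocate to the first.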
[jira] [Updated] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-513: --- Attachment: YARN-513.4.patch Fix -1 on javadoc warning > Verify all clients will wait for RM to restart > -- > > Key: YARN-513 > URL: https://issues.apache.org/jira/browse/YARN-513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, > YARN-513.4.patch > > > When the RM is restarting, the NM, AM and Clients should wait for some time > for the RM to come back up.
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644996#comment-13644996 ] Hadoop QA commented on YARN-506: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12580366/YARN-506.commonfileutils.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/841//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/841//console This message is automatically generated. > Move to common utils FileUtil#setReadable/Writable/Executable and > FileUtil#canRead/Write/Execute > > > Key: YARN-506 > URL: https://issues.apache.org/jira/browse/YARN-506 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Attachments: YARN-506.commonfileutils.2.patch, > YARN-506.commonfileutils.patch > > > Move to common utils described in HADOOP-9413 that work well cross-platform. 
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644997#comment-13644997 ] Hadoop QA commented on YARN-618: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581060/YARN-618.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/840//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/840//console This message is automatically generated. > Modify RM_INVALID_IDENTIFIER to a -ve number > - > > Key: YARN-618 > URL: https://issues.apache.org/jira/browse/YARN-618 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-618.1.patch, YARN-618.patch > > > RM_INVALID_IDENTIFIER set to 0 doesn't sound right as many tests set it to 0. > Probably a -ve number is what we want.
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-326: Attachment: YARN-326-1.patch > Add multi-resource scheduling to the fair scheduler > --- > > Key: YARN-326 > URL: https://issues.apache.org/jira/browse/YARN-326 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: FairSchedulerDRFDesignDoc-1.pdf, > FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326-1.patch, > YARN-326.patch, YARN-326.patch > > > With YARN-2 in, the capacity scheduler has the ability to schedule based on > multiple resources, using dominant resource fairness. The fair scheduler > should be able to do multiple resource scheduling as well, also using > dominant resource fairness. > More details to come on how the corner cases with fair scheduler configs such > as min and max resources will be handled.
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-618: - Attachment: YARN-618.1.patch Fixed the test failure.
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644970#comment-13644970 ] Siddharth Seth commented on YARN-528: - bq. Thanks for doing this Sid. I started pulling on the string and there was just too much involved, so I had to stop. Any thoughts on the approach used in the patch? Making IDs immutable should be reasonably fast using this - changing the PB mechanisms for other classes is a different beast though. > Make IDs read only > -- > > Key: YARN-528 > URL: https://issues.apache.org/jira/browse/YARN-528 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: y528_AppIdPart_01_Refactor.txt, > y528_AppIdPart_02_AppIdChanges.txt, y528_AppIdPart_03_fixUsage.txt, > y528_ApplicationIdComplete_WIP.txt, YARN-528.txt, YARN-528.txt > > > I really would like to rip out most if not all of the abstraction layer that > sits in-between Protocol Buffers, the RPC, and the actual user code. We have > no plans to support any other serialization type, and the abstraction layer > just makes it more difficult to change protocols, makes changing them more > error prone, and slows down the objects themselves. > Completely doing that is a lot of work. This JIRA is a first step towards > that. It makes the various ID objects immutable. If this patch is well > received I will try to go through other objects/classes of objects and update > them in a similar way. > This is probably the last time we will be able to make a change like this > before 2.0 stabilizes and YARN APIs will not be able to be changed.
[jira] [Commented] (YARN-575) ContainerManager APIs should be user accessible
[ https://issues.apache.org/jira/browse/YARN-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644964#comment-13644964 ] Siddharth Seth commented on YARN-575: - I'm fine going the route of getting container status from the RM when required, assuming we keep the NM equivalent for AMs to use. Will the AppTokens be used for authentication as well as authorization for getContainerStatus calls? > ContainerManager APIs should be user accessible > --- > > Key: YARN-575 > URL: https://issues.apache.org/jira/browse/YARN-575 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.0.4-alpha >Reporter: Siddharth Seth >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > > Auth for ContainerManager is based on the containerId being accessed - since > this is what is used to launch containers (There's likely another jira > somewhere to change this to not be containerId based). > What this also means is the API is effectively not usable with kerberos > credentials. > Also, it should be possible to use this API with some generic tokens > (RMDelegation?), instead of with Container specific tokens.
[jira] [Commented] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644954#comment-13644954 ] Hadoop QA commented on YARN-618: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581054/YARN-618.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/839//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/839//console This message is automatically generated.
[jira] [Updated] (YARN-618) Modify RM_INVALID_IDENTIFIER to a -ve number
[ https://issues.apache.org/jira/browse/YARN-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-618: - Attachment: YARN-618.patch This patch changes RM_INVALID_IDENTIFIER to a -ve number and updates the tests accordingly.
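The collision described in this issue can be illustrated with a small sketch (illustrative class and method names, not the actual Hadoop declarations): when the "invalid" sentinel is 0, it is indistinguishable from the 0 that many tests assign as a placeholder identifier, whereas a negative sentinel can never match a real, non-negative RM identifier.

```java
// Hedged sketch of the sentinel change, assuming a long-valued identifier;
// the class and method names here are hypothetical, not YARN code.
public class InvalidIdSketch {
    // Before the patch the sentinel was 0, which collided with the 0 that
    // many tests assigned as a default identifier value.
    static final long RM_INVALID_IDENTIFIER = -1L;

    static boolean isValid(long rmIdentifier) {
        return rmIdentifier != RM_INVALID_IDENTIFIER;
    }

    public static void main(String[] args) {
        System.out.println(isValid(0L));                    // true: 0 no longer means "invalid"
        System.out.println(isValid(RM_INVALID_IDENTIFIER)); // false
    }
}
```

With a negative sentinel, tests that initialize identifiers to 0 stop being accidentally treated as invalid.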
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644891#comment-13644891 ] Hadoop QA commented on YARN-513: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581001/YARN-513.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1366 javac compiler warnings (more than the trunk's current 1365 warnings). {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/838//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/838//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/838//console This message is automatically generated. 
> Verify all clients will wait for RM to restart > -- > > Key: YARN-513 > URL: https://issues.apache.org/jira/browse/YARN-513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch > > > When the RM is restarting, the NM, AM and Clients should wait for some time > for the RM to come back up.
[jira] [Commented] (YARN-579) Make ApplicationToken part of Container's token list to help RM-restart
[ https://issues.apache.org/jira/browse/YARN-579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644859#comment-13644859 ] Vinod Kumar Vavilapalli commented on YARN-579: -- I validated this on trunk; I can run it successfully on trunk even now. It seems like it is failing on branch-2. Something at the RPC level, I suppose; digging through... > Make ApplicationToken part of Container's token list to help RM-restart > --- > > Key: YARN-579 > URL: https://issues.apache.org/jira/browse/YARN-579 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.0.4-alpha >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Fix For: 2.0.5-beta > > Attachments: YARN-579-20130422.1.txt, > YARN-579-20130422.1_YARNChanges.txt > > > Container is already persisted for helping RM restart. Instead of explicitly > setting ApplicationToken in AM's env, if we change it to be in Container, we > can avoid env and can also help restart.
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644845#comment-13644845 ] Bikas Saha commented on YARN-582: - The RMStore stores applications and their attempts, and it is used to restore applications and their attempts from the data that they had earlier stored. This allows the recovery code to follow existing code paths to the fullest extent and prevents recovery logic from diverging from the "normal" code path. So I would like to avoid storing tokens separately from apps/attempts and then having to manage their relationship later on during recovery. As far as saving appToken and clientToken, I agree it would be nice to have a single object store all attempt tokens in one place. The ApplicationSubmissionContext does that for app tokens. > Restore appToken and clientToken for app attempt after RM restart > - > > Key: YARN-582 > URL: https://issues.apache.org/jira/browse/YARN-582 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Jian He > Attachments: YARN-582.1.patch > > > These need to be saved and restored on a per app attempt basis. This is > required only when work preserving restart is implemented for secure > clusters. In non-preserving restart app attempts are killed and so this does > not matter.
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644828#comment-13644828 ] Jian He commented on YARN-582: -- Yes, the application-level token is stored along with the ApplicationSubmissionContext, so no additional handling is needed for that.
[jira] [Created] (YARN-624) Support gang scheduling in the AM RM protocol
Sandy Ryza created YARN-624: --- Summary: Support gang scheduling in the AM RM protocol Key: YARN-624 URL: https://issues.apache.org/jira/browse/YARN-624 Project: Hadoop YARN Issue Type: Sub-task Components: api, scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on YARN-392 and elsewhere, gang scheduling, in which a scheduler runs a set of tasks when they can all be run at the same time, would be a useful feature for YARN schedulers to support. Currently, AMs can approximate this by holding on to containers until they get all the ones they need. However, this lends itself to deadlocks when different AMs are waiting on the same containers.
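The all-or-nothing semantics described above can be sketched as follows (the helper name and shape are assumptions for illustration, not a YARN scheduler API): a gang request is granted only when every container in the set fits at once, so no AM is left holding a partial allocation that blocks others.

```java
import java.util.List;

// Hypothetical illustration of the all-or-nothing ("gang") constraint;
// not an actual YARN scheduler interface.
public class GangSketch {
    static boolean canGrantGang(int availableContainers, List<Integer> gangSizes) {
        // Grant only if the entire gang fits at once; partial grants are
        // what produce the hold-and-wait deadlock described in the issue.
        int needed = gangSizes.stream().mapToInt(Integer::intValue).sum();
        return needed <= availableContainers;
    }

    public static void main(String[] args) {
        System.out.println(canGrantGang(10, List.of(4, 3, 2))); // true: whole gang fits
        System.out.println(canGrantGang(5, List.of(4, 3)));     // false: would be a partial grant
    }
}
```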
[jira] [Commented] (YARN-620) TestContainerLocalizer.testContainerLocalizerMain failed on branch-2
[ https://issues.apache.org/jira/browse/YARN-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644826#comment-13644826 ] Jian He commented on YARN-620: -- checked, it works fine now > TestContainerLocalizer.testContainerLocalizerMain failed on branch-2 > - > > Key: YARN-620 > URL: https://issues.apache.org/jira/browse/YARN-620 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-620.1.patch > > > Argument(s) are different! Wanted: > localFs.mkdir( > > /Users/jhe/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer/0/usercache/yak/filecache, > isA(org.apache.hadoop.fs.permission.FsPermission), > false > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:170) > Actual invocation has different arguments: > localFs.mkdir( > > file:/Users/jhe/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer/0/usercache/yak/filecache, > rwxr-xr-x, > false > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:162) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:170) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
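The mismatch in the trace above, an expected path without a scheme compared against an actual `file:`-qualified path, can be reproduced with plain `java.net.URI` (paths shortened to placeholders; this illustrates the comparison only, not the Mockito verification itself):

```java
import java.net.URI;

// Sketch of why the verification reported "Argument(s) are different!":
// the code under test qualified the path with a "file:" scheme, so an
// unqualified expected path no longer compares equal. The paths below are
// shortened placeholders, not the real test directories.
public class SchemeMismatchSketch {
    public static void main(String[] args) {
        URI expected = URI.create("/target/usercache/yak/filecache");
        URI actual = URI.create("file:/target/usercache/yak/filecache");
        System.out.println(expected.equals(actual)); // false: schemes differ
        // Qualifying the expected path the same way makes them comparable:
        System.out.println(URI.create("file:" + expected.getPath()).equals(actual)); // true
    }
}
```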
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644820#comment-13644820 ] Vinod Kumar Vavilapalli commented on YARN-582: -- bq. Is it feasible to handle all tokens in an opaque credentials within the store? Agreed. But because there are two types of tokens - application level and application-attempt level, we should have two credential fields.
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644816#comment-13644816 ] Vinod Kumar Vavilapalli commented on YARN-613: -- bq. Question: How do you plan for NMs to authenticate the AM tokens? I thought I covered it but missed stating that - RM will share the underlying secret key corresponding to AM tokens as part of node-registration just like the one corresponding to ContainerTokens. > Create NM proxy per NM instead of per container > --- > > Key: YARN-613 > URL: https://issues.apache.org/jira/browse/YARN-613 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Vinod Kumar Vavilapalli > > Currently a new NM proxy has to be created per container since the secure > authentication is using a containertoken from the container.
[jira] [Commented] (YARN-617) In unsercure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644814#comment-13644814 ] Vinod Kumar Vavilapalli commented on YARN-617: -- bq. Does there really need to be different NM behavior? Ie. Why can't the NM always require container tokens regardless of security setting? That is what I meant in my points above. ContainerTokens will always be sent irrespective of security and are used for *authorization*. I just put them as separate points to highlight that in secure mode, we also use ContainerTokens for *authentication*. > In unsercure mode, AM can fake resource requirements > - > > Key: YARN-617 > URL: https://issues.apache.org/jira/browse/YARN-617 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Minor > > Without security, it is impossible to completely avoid AMs faking resources. > We can at the least make it as difficult as possible by using the same > container tokens and the RM-NM shared key mechanism over unauthenticated > RM-NM channel. > In the minimum, this will avoid accidental bugs in AMs in unsecure mode.
[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container
[ https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644809#comment-13644809 ] Daryn Sharp commented on YARN-613: -- Question: How do you plan for NMs to authenticate the AM tokens?
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644805#comment-13644805 ] Daryn Sharp commented on YARN-582: -- I've only glanced over the patch, but do these tokens actually need to be handled specially? Is it feasible to handle all tokens in an opaque credentials within the store? I think that may reduce the copy-n-paste code throughout the stores for restoring these tokens.
[jira] [Commented] (YARN-617) In unsercure mode, AM can fake resource requirements
[ https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644795#comment-13644795 ] Daryn Sharp commented on YARN-617: -- Does there really need to be different NM behavior? Ie. Why can't the NM always require container tokens regardless of security setting?
[jira] [Commented] (YARN-575) ContainerManager APIs should be user accessible
[ https://issues.apache.org/jira/browse/YARN-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644793#comment-13644793 ] Daryn Sharp commented on YARN-575: -- I agree with your 2nd point; I think allowing users to directly stop containers will lead to problems.
[jira] [Reopened] (YARN-579) Make ApplicationToken part of Container's token list to help RM-restart
[ https://issues.apache.org/jira/browse/YARN-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reopened YARN-579: -- This has broken secure clusters. The AM is unable to find the token to register with the RM. I've debugged it far enough to see that localization has put the token in the nm-private dir, so it looks like the AM has amnesia when it connects to the RM. {noformat} 2013-04-29 17:47:02,666 DEBUG [IPC Client (4914628) connection to $RM:8030 from $USER] org.apache.hadoop.ipc.Client: IPC Client (4914628) connection to $RM:8030 from $USER: stopped, remaining connections 1 2013-04-29 17:47:02,667 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Exception while registering java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:103) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:153) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.start(RMCommunicator.java:112) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.start(RMContainerAllocator.java:211) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.start(MRAppMaster.java:797) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1014) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1369) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1365) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1318) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[KERBEROS, DIGEST] at org.apache.hadoop.ipc.Client.call(Client.java:1229) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:100) ... 12 more 2013-04-29 17:47:02,668 ERROR [main] org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.mapreduce.v2.app.MRAppMaster org.apache.hadoop.yarn.YarnException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:166) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.start(RMCommunicator.java:112) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.start(RMContainerAllocator.java:211) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.start(MRAppMaster.java:797) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1014) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1369) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1365) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1318) Caused by: java.lang.reflect.UndeclaredThrowableException at 
org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:103) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:153) ... 11 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): SIMPLE authentication is not enabled. Available:[KERBEROS, DIGEST] at org.apache.hadoop.ipc.Client.call(Client.java:1229) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source)
[jira] [Commented] (YARN-528) Make IDs read only
[ https://issues.apache.org/jira/browse/YARN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644784#comment-13644784 ] Robert Joseph Evans commented on YARN-528: -- Thanks for doing this Sid. I started pulling on the string and there was just too much involved, so I had to stop.
[jira] [Commented] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644770#comment-13644770 ] Hadoop QA commented on YARN-582: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581012/YARN-582.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/836//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/836//console This message is automatically generated. > Restore appToken and clientToken for app attempt after RM restart > - > > Key: YARN-582 > URL: https://issues.apache.org/jira/browse/YARN-582 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Jian He > Attachments: YARN-582.1.patch > > > These need to be saved and restored on a per app attempt basis. This is > required only when work preserving restart is implemented for secure > clusters. 
In non-preserving restart app attempts are killed and so this does > not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-621: -- Assignee: Omkar Vinit Joshi > RM triggers web auth failure before first job > - > > Key: YARN-621 > URL: https://issues.apache.org/jira/browse/YARN-621 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.4-alpha >Reporter: Allen Wittenauer >Assignee: Omkar Vinit Joshi >Priority: Critical > > On a secure YARN setup, before the first job is executed, going to the web > interface of the resource manager triggers authentication errors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644759#comment-13644759 ] Hadoop QA commented on YARN-326: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581014/YARN-326-1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/837//console This message is automatically generated. > Add multi-resource scheduling to the fair scheduler > --- > > Key: YARN-326 > URL: https://issues.apache.org/jira/browse/YARN-326 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: FairSchedulerDRFDesignDoc-1.pdf, > FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326.patch, > YARN-326.patch > > > With YARN-2 in, the capacity scheduler has the ability to schedule based on > multiple resources, using dominant resource fairness. The fair scheduler > should be able to do multiple resource scheduling as well, also using > dominant resource fairness. > More details to come on how the corner cases with fair scheduler configs such > as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-326: Attachment: YARN-326-1.patch > Add multi-resource scheduling to the fair scheduler > --- > > Key: YARN-326 > URL: https://issues.apache.org/jira/browse/YARN-326 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: FairSchedulerDRFDesignDoc-1.pdf, > FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326.patch, > YARN-326.patch > > > With YARN-2 in, the capacity scheduler has the ability to schedule based on > multiple resources, using dominant resource fairness. The fair scheduler > should be able to do multiple resource scheduling as well, also using > dominant resource fairness. > More details to come on how the corner cases with fair scheduler configs such > as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644753#comment-13644753 ] Sandy Ryza commented on YARN-326: - Uploaded new patch that reflects design changes > Add multi-resource scheduling to the fair scheduler > --- > > Key: YARN-326 > URL: https://issues.apache.org/jira/browse/YARN-326 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: FairSchedulerDRFDesignDoc-1.pdf, > FairSchedulerDRFDesignDoc.pdf, YARN-326-1.patch, YARN-326.patch, > YARN-326.patch > > > With YARN-2 in, the capacity scheduler has the ability to schedule based on > multiple resources, using dominant resource fairness. The fair scheduler > should be able to do multiple resource scheduling as well, also using > dominant resource fairness. > More details to come on how the corner cases with fair scheduler configs such > as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-582) Restore appToken and clientToken for app attempt after RM restart
[ https://issues.apache.org/jira/browse/YARN-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-582: - Attachment: YARN-582.1.patch This patch restores the application token on restart, and adds tests on both memoryStore and FileSystemStore. > Restore appToken and clientToken for app attempt after RM restart > - > > Key: YARN-582 > URL: https://issues.apache.org/jira/browse/YARN-582 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Jian He > Attachments: YARN-582.1.patch > > > These need to be saved and restored on a per app attempt basis. This is > required only when work preserving restart is implemented for secure > clusters. In non-preserving restart app attempts are killed and so this does > not matter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644734#comment-13644734 ] Karthik Kambatla commented on YARN-326: --- Sandy - thanks for updating the doc. The approach is clear and fairly straight-forward. Nit: might want to add other DRF-followup papers to references. > Add multi-resource scheduling to the fair scheduler > --- > > Key: YARN-326 > URL: https://issues.apache.org/jira/browse/YARN-326 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: FairSchedulerDRFDesignDoc-1.pdf, > FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, YARN-326.patch > > > With YARN-2 in, the capacity scheduler has the ability to schedule based on > multiple resources, using dominant resource fairness. The fair scheduler > should be able to do multiple resource scheduling as well, also using > dominant resource fairness. > More details to come on how the corner cases with fair scheduler configs such > as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644720#comment-13644720 ] Hadoop QA commented on YARN-326: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581004/FairSchedulerDRFDesignDoc-1.pdf against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/835//console This message is automatically generated. > Add multi-resource scheduling to the fair scheduler > --- > > Key: YARN-326 > URL: https://issues.apache.org/jira/browse/YARN-326 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: FairSchedulerDRFDesignDoc-1.pdf, > FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, YARN-326.patch > > > With YARN-2 in, the capacity scheduler has the ability to schedule based on > multiple resources, using dominant resource fairness. The fair scheduler > should be able to do multiple resource scheduling as well, also using > dominant resource fairness. > More details to come on how the corner cases with fair scheduler configs such > as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-326) Add multi-resource scheduling to the fair scheduler
[ https://issues.apache.org/jira/browse/YARN-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-326: Attachment: FairSchedulerDRFDesignDoc-1.pdf Uploading a new design doc to reflect the discussion > Add multi-resource scheduling to the fair scheduler > --- > > Key: YARN-326 > URL: https://issues.apache.org/jira/browse/YARN-326 > Project: Hadoop YARN > Issue Type: New Feature > Components: scheduler >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: FairSchedulerDRFDesignDoc-1.pdf, > FairSchedulerDRFDesignDoc.pdf, YARN-326.patch, YARN-326.patch > > > With YARN-2 in, the capacity scheduler has the ability to schedule based on > multiple resources, using dominant resource fairness. The fair scheduler > should be able to do multiple resource scheduling as well, also using > dominant resource fairness. > More details to come on how the corner cases with fair scheduler configs such > as min and max resources will be handled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-513: --- Attachment: YARN-513.3.patch > Verify all clients will wait for RM to restart > -- > > Key: YARN-513 > URL: https://issues.apache.org/jira/browse/YARN-513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch > > > When the RM is restarting, the NM, AM and Clients should wait for some time > for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644695#comment-13644695 ] Xuan Gong commented on YARN-513: bq. We could try to reuse existing RetryPolicy etc inside RMClient as long as we maintain the RMClient abstraction. Reused the RetryPolicy in the new patch. The RetryInvocationHandler provides the retry logic in its invoke method, so we can reuse that. bq. Are we not missing an RMClient.disconnect()? This one would internally stop the proxy? Yes, we need that. Added the disconnect code in the new patch. bq. Looks like NMStatusUpdater.getRMClient() can be removed because createRMClient() is being overridden by all tests. Removed from the new patch. bq. Why are we throwing YARNException? The original code throws YarnException, and I want to keep it consistent for now. I think we will change the exception through YARN-142. bq. Is any test explicitly testing the new code with a real RM? How about manually doing it? Tested the new code on a single-node cluster. > Verify all clients will wait for RM to restart > -- > > Key: YARN-513 > URL: https://issues.apache.org/jira/browse/YARN-513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Xuan Gong > Attachments: YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch > > > When the RM is restarting, the NM, AM and Clients should wait for some time > for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
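The "RetryInvocationHandler provides the retry logic in its invoke method" idea can be sketched in miniature with a plain `java.lang.reflect.Proxy`. This is an illustrative stand-in for the pattern only, not Hadoop's actual RetryInvocationHandler or RMClient; all class and method names here are made up:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Minimal sketch of the retry-proxy pattern: every call through the proxy
// is retried until it succeeds or the retry budget is exhausted. Names
// (RMClient, retryProxy) are illustrative, not the real Hadoop classes.
public class RetrySketch {
  interface RMClient {
    String register();
  }

  static RMClient retryProxy(final RMClient delegate, final int maxRetries) {
    InvocationHandler handler = new InvocationHandler() {
      public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Throwable last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
          try {
            return method.invoke(delegate, args);
          } catch (InvocationTargetException e) {
            // e.g. connection refused while the RM is restarting; try again
            last = e.getCause();
          }
        }
        throw last; // budget exhausted, surface the last failure
      }
    };
    return (RMClient) Proxy.newProxyInstance(
        RMClient.class.getClassLoader(), new Class<?>[] {RMClient.class}, handler);
  }

  public static void main(String[] args) {
    // A fake client that fails twice before the "RM" comes back up.
    final int[] calls = {0};
    RMClient flaky = new RMClient() {
      public String register() {
        if (calls[0]++ < 2) throw new RuntimeException("connection refused");
        return "registered";
      }
    };
    RMClient client = retryProxy(flaky, 5);
    System.out.println(client.register()); // succeeds on the third attempt
  }
}
```

In the real patch the policy would come from Hadoop's RetryPolicy machinery rather than a fixed loop, but the control flow is the same: the proxy absorbs transient failures so callers of the RM never see them.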
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644675#comment-13644675 ] Chris Douglas commented on YARN-45: --- bq. we could express the ResourceRequest as a multiple of the minimum allocation +1 This is better > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch, YARN-45.patch, > YARN-45.patch, YARN-45.patch, YARN-45_summary_of_alternatives.pdf > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644664#comment-13644664 ] Carlo Curino commented on YARN-45: -- [~acmurthy] I see your point, which was in fact reflected more clearly in our initial proposal. The only caveat is not to make this a capacity-only protocol (which you are not, but I wanted to reiterate that there are other use cases). I like [~bikassaha]'s and [~chris.douglas]'s spin on it (i.e., using ResourceRequest), as it gives us the immediate "capacity angle" but will eventually allow us to evolve the implementations towards something richer (e.g., the preempt-on-behalf-of-a-specific-request that Bikas considered before) without impact on the protocols. I think there is a slightly cleaner version of Chris's proposal: use ResourceRequest, and to represent a request that only cares about overall capacity, express the ResourceRequest as a multiple of the minimum allocation (i.e., if we want 100GB of RAM back and the min container size is 1GB, we ask for 100 x 1GB containers). This achieves Chris's proposal with a slightly prettier use of ResourceRequest. Note that there are size-matching issues (e.g., you have 1.5GB containers and I ask for 1x1GB containers), but we have very similar problems with Resource. I would say that, as Chris pointed out, [these semantics | https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950] plus the use of ResourceRequest I propose here as a minor variation on Chris's take should cover Arun's and Bikas's comments (and I believe also the prior 45+ messages). Thoughts? 
> Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch, YARN-45.patch, > YARN-45.patch, YARN-45.patch, YARN-45_summary_of_alternatives.pdf > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
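The arithmetic in Carlo's comment above (phrase a capacity-only preemption ask as N minimum-size containers) can be sketched as follows; `askInMinContainers` is a made-up helper for illustration, not YARN API:

```java
// Sketch of expressing "give me R resources back" as a ResourceRequest for
// ceil(R / minAllocation) containers of the minimum allocation size.
// The helper name and units (MB) are illustrative assumptions.
public class CapacityAsk {
  static int askInMinContainers(long wantedMb, long minAllocationMb) {
    // round up so the RM never asks for less than it needs back
    return (int) ((wantedMb + minAllocationMb - 1) / minAllocationMb);
  }

  public static void main(String[] args) {
    // want 100 GB back with a 1 GB minimum container => ask for 100 x 1 GB
    System.out.println(askInMinContainers(100 * 1024, 1024)); // 100
    // the size-matching caveat from the comment: a running 1.5 GB container
    // does not divide evenly into 1 GB asks, so the ask rounds up
    System.out.println(askInMinContainers(1536, 1024)); // 2
  }
}
```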
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644626#comment-13644626 ] Chris Douglas commented on YARN-45: --- I'm also a fan of {{ResourceRequest}}, but we're not really using all its features, yet. Similarly, {{Resource}} bakes in the fungibility of resources, which could be awkward as the RM accommodates richer requests (as in YARN-392). We could use {{ResourceRequest}}- so the API is there for extensions- but only populate the capability as an aggregate. With the convention that "\-1 containers" can mean "packed as you see fit," it expresses {{Resource}} (which we need in practice, since the priorities for requests don't always [match the preemption order|https://issues.apache.org/jira/browse/YARN-569?focusedCommentId=13638825&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13638825]), which is sufficient for the current schedulers. If we're adding the contract back with the set of containers, the [semantics|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950] we discussed earlier still seem OK. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch, YARN-45.patch, > YARN-45.patch, YARN-45.patch, YARN-45_summary_of_alternatives.pdf > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. 
In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
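Chris's "-1 containers means packed as you see fit" convention above can be sketched with a ResourceRequest-shaped value object. The field names and the `totalMb` helper are illustrative assumptions, not the real YARN ResourceRequest API:

```java
// Sketch of the convention: reuse a ResourceRequest-shaped message, but let
// numContainers == -1 mean "this is an aggregate amount; pack it into
// containers as you see fit". Field names are illustrative.
public class PreemptionAsk {
  final int numContainers;   // -1 => aggregate, otherwise a container count
  final long capabilityMb;   // per-container size, or the total when aggregate

  PreemptionAsk(int numContainers, long capabilityMb) {
    this.numContainers = numContainers;
    this.capabilityMb = capabilityMb;
  }

  long totalMb() {
    return numContainers == -1 ? capabilityMb : numContainers * capabilityMb;
  }

  public static void main(String[] args) {
    PreemptionAsk shaped = new PreemptionAsk(10, 5120);      // 10 x 5 GB
    PreemptionAsk aggregate = new PreemptionAsk(-1, 51200);  // "any 50 GB"
    System.out.println(shaped.totalMb());    // 51200
    System.out.println(aggregate.totalMb()); // 51200
  }
}
```

Both asks total the same amount; the sentinel keeps the API extensible (the full ResourceRequest is there) while only the aggregate capability is populated for the current schedulers.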
[jira] [Commented] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644596#comment-13644596 ] Chris Nauroth commented on YARN-506: I should have mentioned that my +1 is dependent on commit of HADOOP-9413 first, followed by +1 from Jenkins here on YARN-506. Thanks again! > Move to common utils FileUtil#setReadable/Writable/Executable and > FileUtil#canRead/Write/Execute > > > Key: YARN-506 > URL: https://issues.apache.org/jira/browse/YARN-506 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > Attachments: YARN-506.commonfileutils.2.patch, > YARN-506.commonfileutils.patch > > > Move to common utils described in HADOOP-9413 that work well cross-platform. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-576) RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations
[ https://issues.apache.org/jira/browse/YARN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644498#comment-13644498 ] Hudson commented on YARN-576: - Integrated in Hadoop-Mapreduce-trunk #1414 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1414/]) YARN-576. Modified ResourceManager to reject NodeManagers that don't satisy minimum resource requirements. Contributed by Kenji Kikushima. (Revision 1476824) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476824 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > RM should not allow registrations from NMs that do not satisfy minimum > scheduler allocations > > > Key: YARN-576 > URL: https://issues.apache.org/jira/browse/YARN-576 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Kenji Kikushima > Labels: newbie > Fix For: 2.0.5-beta > > Attachments: YARN-576-2.patch, YARN-576-3.patch, YARN-576-4.patch, > 
YARN-576.patch > > > If the minimum resource allocation configured for the RM scheduler is 1 GB, > the RM should drop all NMs that register with a total capacity of less than 1 > GB. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-576) RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations
[ https://issues.apache.org/jira/browse/YARN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644462#comment-13644462 ] Hudson commented on YARN-576: - Integrated in Hadoop-Hdfs-trunk #1387 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1387/]) YARN-576. Modified ResourceManager to reject NodeManagers that don't satisy minimum resource requirements. Contributed by Kenji Kikushima. (Revision 1476824) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476824 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > RM should not allow registrations from NMs that do not satisfy minimum > scheduler allocations > > > Key: YARN-576 > URL: https://issues.apache.org/jira/browse/YARN-576 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Kenji Kikushima > Labels: newbie > Fix For: 2.0.5-beta > > Attachments: YARN-576-2.patch, YARN-576-3.patch, YARN-576-4.patch, > YARN-576.patch 
> > > If the minimum resource allocation configured for the RM scheduler is 1 GB, > the RM should drop all NMs that register with a total capacity of less than 1 > GB. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-576) RM should not allow registrations from NMs that do not satisfy minimum scheduler allocations
[ https://issues.apache.org/jira/browse/YARN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644408#comment-13644408 ] Hudson commented on YARN-576: - Integrated in Hadoop-Yarn-trunk #198 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/198/]) YARN-576. Modified ResourceManager to reject NodeManagers that don't satisy minimum resource requirements. Contributed by Kenji Kikushima. (Revision 1476824) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1476824 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMExpiry.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestRMNMRPCResponseId.java > RM should not allow registrations from NMs that do not satisfy minimum > scheduler allocations > > > Key: YARN-576 > URL: https://issues.apache.org/jira/browse/YARN-576 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Kenji Kikushima > Labels: newbie > Fix For: 2.0.5-beta > > Attachments: YARN-576-2.patch, YARN-576-3.patch, YARN-576-4.patch, > YARN-576.patch > 
> > If the minimum resource allocation configured for the RM scheduler is 1 GB, > the RM should drop all NMs that register with a total capacity of less than 1 > GB. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644327#comment-13644327 ] Bikas Saha commented on YARN-45: My understanding is that the containers being presented in PreemptionMessage are going to be preempted by the RM some time in the near future if the RM cannot find free resources elsewhere. The AMs are not supposed to preempt the containers, but they are encouraged to checkpoint and save work. The RM can always choose not to preempt these containers, so it would be sub-optimal for the AM to kill them. If we want to add additional information besides the set of containers-to-be-preempted, then I would prefer ResourceRequest (like it was in the original patch) and not Resource. Not only is that symmetric, but it also allows the RM to provide additional information about where to free containers. A smarter RM could potentially ask for resources to be preempted where the under-allocated job wants them, and a smart AM could help out by choosing containers close to the desired locations. Secondly, Resource is too amorphous by itself. Asking an AM to free 50GB does not tell it whether the RM needs 10 x 5GB or 50 x 1GB. Without that information the AM can end up freeing containers in a manner that does not help the RM meet the request of the under-allocated job, thus failing to meet quota and wasting work at the same time. > Scheduler feedback to AM to release containers > -- > > Key: YARN-45 > URL: https://issues.apache.org/jira/browse/YARN-45 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Chris Douglas >Assignee: Carlo Curino > Attachments: YARN-45.patch, YARN-45.patch, YARN-45.patch, > YARN-45.patch, YARN-45.patch, YARN-45_summary_of_alternatives.pdf > > > The ResourceManager strikes a balance between cluster utilization and strict > enforcement of resource invariants in the cluster. 
Individual allocations of > containers must be reclaimed- or reserved- to restore the global invariants > when cluster load shifts. In some cases, the ApplicationMaster can respond to > fluctuations in resource availability without losing the work already > completed by that task (MAPREDUCE-4584). Supplying it with this information > would be helpful for overall cluster utilization [1]. To this end, we want to > establish a protocol for the RM to ask the AM to release containers. > [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
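Bikas's "Resource is too amorphous" point above can be made concrete with a toy model. Both release plans below free the same 50 GB total, but only one of them lets the RM place ten 5 GB containers for the under-allocated job. Each freed chunk stands in for one released container; this is an illustration, not YARN code:

```java
import java.util.Collections;
import java.util.List;

// Toy model: how many containers of a given size fit into the memory
// chunks an AM freed. Two plans with equal totals differ in usefulness.
public class AmbiguousAsk {
  static int placeable(List<Integer> freedChunksMb, int sizeMb) {
    int count = 0;
    for (int chunk : freedChunksMb) {
      count += chunk / sizeMb;
    }
    return count;
  }

  public static void main(String[] args) {
    List<Integer> tenByFiveGb = Collections.nCopies(10, 5120);  // 10 x 5 GB
    List<Integer> fiftyByOneGb = Collections.nCopies(50, 1024); // 50 x 1 GB
    // Same 50 GB total freed...
    System.out.println(10 * 5120 == 50 * 1024); // true
    // ...but only the first plan satisfies a need for ten 5 GB containers.
    System.out.println(placeable(tenByFiveGb, 5120));  // 10
    System.out.println(placeable(fiftyByOneGb, 5120)); // 0
  }
}
```

This is why a shaped ResourceRequest (count plus per-container capability) carries strictly more information than a bare Resource total.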