[jira] [Created] (MAPREDUCE-6650) Surface error histograms from the AM
Bikas Saha created MAPREDUCE-6650:
Summary: Surface error histograms from the AM
Key: MAPREDUCE-6650
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6650
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Bikas Saha

Job tasks are constantly probing the cluster, so if there are issues in the cluster, jobs will be the first to notice them. If we can surface these observations to the user, we can quickly identify cluster issues. Let's say a set of bad machines gets added to the cluster and tasks start seeing shuffle errors from those machines. This can slow down or hang the job. If the AM can surface increased error counts from source and destination machines, that could pinpoint the bad machines instead of having to find them from first principles and log searching.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
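One way an AM could build the per-machine error counts proposed in MAPREDUCE-6650 is sketched below. `ShuffleErrorHistogram` and its methods are hypothetical, not an existing Hadoop API. The sketch counts fetch failures per source and per destination host and ranks the hosts with the most errors.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical AM-side tracker: counts shuffle fetch failures per host so that
// hosts with unusually many errors can be surfaced to the user.
class ShuffleErrorHistogram {
    private final Map<String, Integer> errorsBySource = new HashMap<>();
    private final Map<String, Integer> errorsByDestination = new HashMap<>();

    // Record one failed fetch of map output from sourceHost by destinationHost.
    void recordFetchFailure(String sourceHost, String destinationHost) {
        errorsBySource.merge(sourceHost, 1, Integer::sum);
        errorsByDestination.merge(destinationHost, 1, Integer::sum);
    }

    // Source hosts with at least minErrors failures, highest count first.
    List<String> suspectSources(int minErrors) {
        return rank(errorsBySource, minErrors);
    }

    // Destination hosts with at least minErrors failures, highest count first.
    List<String> suspectDestinations(int minErrors) {
        return rank(errorsByDestination, minErrors);
    }

    // Sort by count (descending), breaking ties by host name, and apply the threshold.
    private static List<String> rank(Map<String, Integer> counts, int minErrors) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> hosts = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (entry.getValue() >= minErrors) {
                hosts.add(entry.getKey());
            }
        }
        return hosts;
    }
}
```

The AM would call `recordFetchFailure` for each reported fetch failure and periodically show the user `suspectSources(threshold)`, which gives a short list of machines to inspect first. This sketch uses raw counts. A fuller version would probably normalize by how much traffic each host served.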
[jira] [Resolved] (MAPREDUCE-3755) Add the equivalent of JobStatus to end of JobHistory file
[ https://issues.apache.org/jira/browse/MAPREDUCE-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha resolved MAPREDUCE-3755.
Resolution: Won't Fix
Target Version/s: 2.0.0-alpha, 0.23.3, 3.0.0 (was: 0.23.3, 2.0.0-alpha, 3.0.0)

Add the equivalent of JobStatus to end of JobHistory file
Key: MAPREDUCE-3755
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3755
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0
Reporter: Arun C Murthy
Assignee: Bikas Saha
Fix For: 0.23.2

In MR1 we have the notion of a CompletedJobStatus store to aid fast responses to job.getStatus. We need the equivalent for MR2. One option is to add the jobStatus to the end of the JobHistory file, so that the JHS can jump ahead to it and serve the query. The JHS should also cache this for a fair number of recently completed jobs.

-- This message was sent by Atlassian JIRA (v6.2#6252)
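The "jump ahead" idea in MAPREDUCE-3755 can be illustrated with a fixed-size trailer. The status bytes go at the end of the file, followed by their length, so a reader can seek straight to them. This is a minimal sketch that assumes a plain local file and a UTF-8 string status. The class names, the 4-byte length trailer and the small LRU cache are illustrative choices, not the JobHistory file format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical layout: [history events ...][status bytes][4-byte status length]
class StatusTrailer {

    // Append the final job status followed by its length.
    static void append(Path file, String status) {
        byte[] bytes = status.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(raf.length());
            raf.write(bytes);
            raf.writeInt(bytes.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the length from the last 4 bytes, then seek back over the status
    // bytes. None of the preceding history events are parsed.
    static String read(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long end = raf.length();
            raf.seek(end - Integer.BYTES);
            int length = raf.readInt();
            byte[] bytes = new byte[length];
            raf.seek(end - Integer.BYTES - length);
            raf.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Keeps the statuses of the most recently accessed completed jobs.
class RecentStatusCache extends LinkedHashMap<String, String> {
    private final int capacity;

    RecentStatusCache(int capacity) {
        super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity;
    }
}
```

A server built this way would consult `RecentStatusCache` first and fall back to `StatusTrailer.read` on a miss. The cost of a miss is then two seeks, regardless of how many events the history file contains.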
[jira] [Created] (MAPREDUCE-5398) MR changes for YARN-513
Bikas Saha created MAPREDUCE-5398:
Summary: MR changes for YARN-513
Key: MAPREDUCE-5398
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5398
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bikas Saha
Assignee: Jian He

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5398) MR changes for YARN-513
[ https://issues.apache.org/jira/browse/MAPREDUCE-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha resolved MAPREDUCE-5398.
Resolution: Fixed
Fix Version/s: 2.1.0-beta
Target Version/s: 2.1.0-beta
Hadoop Flags: Reviewed

Committed to trunk, branch-2 and branch-2.1-beta. Thanks Jian!

MR changes for YARN-513
Key: MAPREDUCE-5398
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5398
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Jian He
Fix For: 2.1.0-beta
Attachments: MAPREDUCE-5398.patch
[jira] [Created] (MAPREDUCE-5314) Fix build break by YARN-686 (flatten node report)
Bikas Saha created MAPREDUCE-5314:
Summary: Fix build break by YARN-686 (flatten node report)
Key: MAPREDUCE-5314
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5314
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Attachments: TEZ-197.1.patch
[jira] [Resolved] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to classpath problems and unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha resolved MAPREDUCE-4987.
Resolution: Fixed
Fix Version/s: 3.0.0

+1. Committed to trunk.

TestMRJobs#testDistributedCache fails on Windows due to classpath problems and unexpected behavior of symlinks
Key: MAPREDUCE-4987
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: distributed-cache, nodemanager
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Fix For: 3.0.0
Attachments: MAPREDUCE-4987.1.patch, MAPREDUCE-4987.2.patch, MAPREDUCE-4987.3.patch, MAPREDUCE-4987.4.patch, MAPREDUCE-4987.5.patch, MAPREDUCE-4987.6.patch

On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while checking the length of a symlink. It expects to see the length of the target of the symlink, but Java 6 on Windows always reports that a symlink has length 0.
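For comparison, the NIO.2 API added in Java 7 lets a test resolve a link before measuring it. That removes the assertion's dependence on how the platform reports a link's own length. This API was not available to the Java 6 runtime mentioned in MAPREDUCE-4987. The sketch shows only the idea and is not the change committed for this issue. Creating symlinks on Windows may also require elevated privileges. `SymlinkLength` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SymlinkLength {
    // Length of the file that a (possibly symbolic) link ultimately points to.
    static long targetLength(Path maybeLink) {
        try {
            Path target = maybeLink.toRealPath(); // follows symbolic links
            return Files.size(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```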
[jira] [Resolved] (MAPREDUCE-5161) Merge MAPREDUCE-1806 from branch-1 to branch-1-win. CombineFileInputFormat fix for paths not on default FS
[ https://issues.apache.org/jira/browse/MAPREDUCE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha resolved MAPREDUCE-5161.
Resolution: Fixed
Fix Version/s: 1-win

+1. Committed to branch-1-win.

Merge MAPREDUCE-1806 from branch-1 to branch-1-win. CombineFileInputFormat fix for paths not on default FS
Key: MAPREDUCE-5161
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5161
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1
Affects Versions: 1-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Fix For: 1-win
Attachments: MAPREDUCE-5161-branch-1-win.1.patch

MAPREDUCE-1806 fixed a bug related to use of {{CombineFileInputFormat}} with paths that are not on the default file system. This jira will merge the branch-1 fix to branch-1-win.
[jira] [Resolved] (MAPREDUCE-5140) MR part of YARN-514
[ https://issues.apache.org/jira/browse/MAPREDUCE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha resolved MAPREDUCE-5140.
Resolution: Fixed

Committed to trunk and branch-2.

MR part of YARN-514
Key: MAPREDUCE-5140
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5140
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Attachments: MAPREDUCE-5140.1.patch

In YARN-514, storing the application needs to be delayed to unblock application submission, so a new state of MRApp needs to be created. On the MapReduce side, there is a function that maps YARN states to MapReduce states. This mapping needs to be updated for the newly added state.
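The mapping change described in MAPREDUCE-5140 can be pictured as a switch over simplified enums. The enum names and values are assumptions for illustration. So is `NEW_SAVING` as the name of the newly added state. The real conversion in Hadoop's MapReduce client may also depend on the application's final status, which this sketch omits.

```java
// Simplified stand-ins for the YARN application states and MapReduce job states.
enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

enum JobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

class AppStateMapping {
    static JobState toJobState(AppState state) {
        switch (state) {
            case NEW:
            case NEW_SAVING: // the added state: the application is still being stored
            case SUBMITTED:
            case ACCEPTED:
                return JobState.PREP;
            case RUNNING:
                return JobState.RUNNING;
            case FINISHED:
                return JobState.SUCCEEDED; // simplification: ignores the final status
            case FAILED:
                return JobState.FAILED;
            case KILLED:
                return JobState.KILLED;
            default:
                // A state added on the YARN side but not handled here ends up in this branch.
                throw new IllegalArgumentException("Unrecognized application state: " + state);
        }
    }
}
```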
[jira] [Created] (MAPREDUCE-4892) CombineFileInputFormat node input split can be skewed on small clusters
Bikas Saha created MAPREDUCE-4892:
Summary: CombineFileInputFormat node input split can be skewed on small clusters
Key: MAPREDUCE-4892
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4892
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Fix For: 3.0.0

The CombineFileInputFormat split generation logic tries to group blocks by node in order to create splits. It iterates through the nodes and creates splits on each one until there aren't enough blocks left on that node to group into a valid split. If the first few nodes have many blocks on them, they can end up with a disproportionately large share of the total number of splits created. This can result in poor locality of maps. The problem is likely to happen on small clusters, where it's easier to create a skew in the distribution of blocks across nodes.
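A toy model can reproduce the skew described in MAPREDUCE-4892. The sketch makes three simplifying assumptions:

- Each node lists the ids of the blocks it holds a replica of.
- Every split takes exactly `blocksPerSplit` blocks.
- Leftover blocks are ignored.

It is not the CombineFileInputFormat code. `greedy` mimics the behavior described above. `roundRobin` is one possible mitigation that forms at most one split per node per pass.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NodeSplitSkew {

    // Greedy: exhaust each node in turn before moving on to the next one.
    static Map<String, Integer> greedy(Map<String, List<Integer>> blocksByNode, int blocksPerSplit) {
        Set<Integer> assigned = new HashSet<>();
        Map<String, Integer> splits = new LinkedHashMap<>();
        for (String node : blocksByNode.keySet()) {
            int count = 0;
            while (takeSplit(blocksByNode.get(node), assigned, blocksPerSplit)) {
                count++;
            }
            splits.put(node, count);
        }
        return splits;
    }

    // Round-robin: at most one split per node per pass, until no node can form one.
    static Map<String, Integer> roundRobin(Map<String, List<Integer>> blocksByNode, int blocksPerSplit) {
        Set<Integer> assigned = new HashSet<>();
        Map<String, Integer> splits = new LinkedHashMap<>();
        blocksByNode.keySet().forEach(node -> splits.put(node, 0));
        boolean progress = true;
        while (progress) {
            progress = false;
            for (String node : blocksByNode.keySet()) {
                if (takeSplit(blocksByNode.get(node), assigned, blocksPerSplit)) {
                    splits.merge(node, 1, Integer::sum);
                    progress = true;
                }
            }
        }
        return splits;
    }

    // Claim blocksPerSplit not-yet-assigned local blocks, if the node has enough.
    private static boolean takeSplit(List<Integer> localBlocks, Set<Integer> assigned, int blocksPerSplit) {
        List<Integer> free = new ArrayList<>();
        for (int block : localBlocks) {
            if (!assigned.contains(block)) {
                free.add(block);
                if (free.size() == blocksPerSplit) {
                    assigned.addAll(free);
                    return true;
                }
            }
        }
        return false;
    }
}
```

Take an example where n1 holds blocks 0-5, n2 holds blocks 0-3, n3 holds blocks 2-5, and each split has two blocks. `greedy` gives n1 all three splits, while `roundRobin` gives one split to each node.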
[jira] [Created] (MAPREDUCE-4893) MR AppMaster can do sub-optimal assignment of containers to map tasks leading to poor node locality
Bikas Saha created MAPREDUCE-4893:
Summary: MR AppMaster can do sub-optimal assignment of containers to map tasks leading to poor node locality
Key: MAPREDUCE-4893
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4893
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
Fix For: 3.0.0

Say the MR AppMaster asks the RM for 3 containers on nodes n1, n2 and n3. There are 10 nodes, n1-n10, in the same rack. The RM can return the allocated containers in the list order n5, n2, n1. Because of the way the AM assigns containers to maps, it will first try to assign node-local maps to n5 and, failing that, will assign rack-local maps to n5. These rack-local maps could be node-local on n2 and n1, and would have been assigned to the containers on n1 and n2 if the AM had not made an early rack-local match for them on n5. This can lead to poor locality.
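A small model can reproduce the ordering effect in MAPREDUCE-4893. In this sketch every map task prefers a single host, all hosts are on the same rack, and each container runs one map. `perContainer` follows the assignment order described above. `nodeLocalFirst` is one hypothetical alternative: it matches node-local maps across all allocated containers before falling back to rack-local ones. The map names and methods are illustrative and are not the MR AppMaster's allocator code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ContainerAssignment {

    // Per-container matching as described: a node-local map if possible,
    // otherwise the first pending (rack-local) map, before looking at the next container.
    static Map<String, String> perContainer(List<String> containerHosts, Map<String, String> preferredHostByMap) {
        Map<String, String> pending = new LinkedHashMap<>(preferredHostByMap);
        Map<String, String> hostByMap = new LinkedHashMap<>();
        for (String host : containerHosts) {
            String map = findNodeLocal(pending, host);
            if (map == null && !pending.isEmpty()) {
                map = pending.keySet().iterator().next(); // rack-local fallback
            }
            if (map != null) {
                pending.remove(map);
                hostByMap.put(map, host);
            }
        }
        return hostByMap;
    }

    // Two passes: node-local matches across all containers first, then give
    // the remaining containers rack-local maps.
    static Map<String, String> nodeLocalFirst(List<String> containerHosts, Map<String, String> preferredHostByMap) {
        Map<String, String> pending = new LinkedHashMap<>(preferredHostByMap);
        Map<String, String> hostByMap = new LinkedHashMap<>();
        List<String> unused = new ArrayList<>();
        for (String host : containerHosts) {
            String map = findNodeLocal(pending, host);
            if (map != null) {
                pending.remove(map);
                hostByMap.put(map, host);
            } else {
                unused.add(host);
            }
        }
        for (String host : unused) {
            if (pending.isEmpty()) {
                break;
            }
            String map = pending.keySet().iterator().next();
            pending.remove(map);
            hostByMap.put(map, host);
        }
        return hostByMap;
    }

    // Number of maps that ended up on their preferred host.
    static int nodeLocalCount(Map<String, String> hostByMap, Map<String, String> preferredHostByMap) {
        int count = 0;
        for (Map.Entry<String, String> e : hostByMap.entrySet()) {
            if (e.getValue().equals(preferredHostByMap.get(e.getKey()))) {
                count++;
            }
        }
        return count;
    }

    private static String findNodeLocal(Map<String, String> pending, String host) {
        for (Map.Entry<String, String> e : pending.entrySet()) {
            if (e.getValue().equals(host)) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

Take containers allocated on n5, n2 and n1 (in that order), with maps m1, m2 and m3 preferring n1, n2 and n3. `perContainer` places m1 on n5 and ends with one node-local map. `nodeLocalFirst` ends with two.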
[jira] [Created] (MAPREDUCE-4635) MR side of YARN-83. Changing package of YarnClient
Bikas Saha created MAPREDUCE-4635:
Summary: MR side of YARN-83. Changing package of YarnClient
Key: MAPREDUCE-4635
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4635
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
[jira] [Created] (MAPREDUCE-4561) Support for node health scripts on Windows
Bikas Saha created MAPREDUCE-4561:
Summary: Support for node health scripts on Windows
Key: MAPREDUCE-4561
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4561
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bikas Saha

TestNodeHealthService fails because NodeHealthServiceChecker tries to run a shell script directly. That won't work on Windows. The script needs to be launched via cmd or winutils.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
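Here is a minimal sketch of the launch change proposed in MAPREDUCE-4561. On Windows, the script is routed through the command interpreter with `cmd /c` instead of being executed directly. The class name and the `os.name` prefix check are assumptions for illustration. The sketch does not show the winutils alternative mentioned in the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HealthScriptCommand {
    // Builds the command used to start a node health script. On Windows the
    // script is handed to cmd.exe via "cmd /c" instead of being executed
    // directly; on other platforms it is run as-is.
    static List<String> build(String osName, String scriptPath, String... args) {
        List<String> command = new ArrayList<>();
        if (osName.startsWith("Windows")) {
            command.add("cmd");
            command.add("/c");
        }
        command.add(scriptPath);
        command.addAll(Arrays.asList(args));
        return command;
    }
}
```

The caller would then start the script with `new ProcessBuilder(HealthScriptCommand.build(System.getProperty("os.name"), scriptPath))`, which works on either platform.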
[jira] [Created] (MAPREDUCE-4472) Support secure AM launch for unmanaged AM's
Bikas Saha created MAPREDUCE-4472:
Summary: Support secure AM launch for unmanaged AM's
Key: MAPREDUCE-4472
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4472
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha

Currently, an unmanaged AM launch does not get security tokens, because tokens are passed by the RM to the AM via the NM during AM container launch. For unmanaged AMs, the RM can send tokens in the SubmitApplicationResponse to the secure client. The client can then pass these on to the AM in a manner similar to the NM.
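On the client side, MAPREDUCE-4472 proposes passing the tokens on to the AM "in a manner similar to the NM". One way to picture that is to write the serialized tokens to a file before starting the AM process. The `HADOOP_TOKEN_FILE_LOCATION` environment variable then points at that file. Hadoop's `UserGroupInformation` reads this variable at login. The byte array stands in for the tokens the client would receive from the RM. The class, method and file name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class UnmanagedAmLauncher {
    // Environment variable Hadoop's UserGroupInformation checks at login for a token file.
    static final String TOKEN_FILE_ENV = "HADOOP_TOKEN_FILE_LOCATION";

    // tokenBytes stands in for the serialized tokens the client received from
    // the RM. The returned ProcessBuilder has not been started yet.
    static ProcessBuilder prepare(List<String> amCommand, byte[] tokenBytes, Path workDir) {
        try {
            // In practice this file should be readable only by the launching user.
            Path tokenFile = workDir.resolve("am_tokens");
            Files.write(tokenFile, tokenBytes);
            ProcessBuilder builder = new ProcessBuilder(amCommand);
            builder.directory(workDir.toFile());
            builder.environment().put(TOKEN_FILE_ENV, tokenFile.toString());
            return builder;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```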
[jira] [Created] (MAPREDUCE-4436) AppRejectedTransition does not unregister app from master service and scheduler
Bikas Saha created MAPREDUCE-4436:
Summary: AppRejectedTransition does not unregister app from master service and scheduler
Key: MAPREDUCE-4436
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4436
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.0.0-alpha, 0.23.1, 3.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha

AttemptStartedTransition() adds the app to the ApplicationMasterService and scheduler. When the scheduler rejects the app, AppRejectedTransition() forgets to unregister it from the ApplicationMasterService.
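The fix implied by MAPREDUCE-4436 is symmetry: whatever the start transition registers, the rejection path must unregister. In this sketch, two sets stand in for the ApplicationMasterService and scheduler registrations. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-ins for the ApplicationMasterService and scheduler registrations.
class AttemptRegistry {
    final Set<String> masterService = new HashSet<>();
    final Set<String> scheduler = new HashSet<>();

    // Mirrors AttemptStartedTransition(): register with both services.
    void onAttemptStarted(String attemptId) {
        masterService.add(attemptId);
        scheduler.add(attemptId);
    }

    // Rejected by the scheduler: undo every registration made on start, so
    // nothing is left behind in the master service.
    void onAppRejected(String attemptId) {
        scheduler.remove(attemptId);
        masterService.remove(attemptId);
    }
}
```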
[jira] [Created] (MAPREDUCE-4438) Add client side for UnmanagedRM
Bikas Saha created MAPREDUCE-4438:
Summary: Add client side for UnmanagedRM
Key: MAPREDUCE-4438
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4438
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Bikas Saha
[jira] [Created] (MAPREDUCE-4427) Enable the RM to work with AM's that are not managed by it
Bikas Saha created MAPREDUCE-4427:
Summary: Enable the RM to work with AM's that are not managed by it
Key: MAPREDUCE-4427
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4427
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha

Currently, the RM itself manages the AM: it allocates a container for the AM, negotiates the launch with the NodeManager, and manages the AM lifecycle. Thereafter, the AM negotiates resources with the RM and launches tasks to do the real work. It would be a useful improvement to enhance this model by allowing the client to launch the AM independently, without requiring the RM to do so. These AMs would be launched on a gateway machine that can talk to the cluster. This would open up new use cases, such as the following:
1) Easy debugging of the AM, especially during initial development. Having the AM launched on an arbitrary cluster node makes it hard to look at logs or attach a debugger to the AM. If it can be launched locally, these tasks become easier.
2) Running AMs that need special privileges that may not be available on machines managed by the NodeManager.
[jira] [Created] (MAPREDUCE-4401) Enhancements to HDFS for Windows Server and Windows Azure development and runtime environments
Bikas Saha created MAPREDUCE-4401:
Summary: Enhancements to HDFS for Windows Server and Windows Azure development and runtime environments
Key: MAPREDUCE-4401
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4401
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Bikas Saha

This JIRA tracks the work that needs to be done on trunk to enable Hadoop to run on Windows Server and Azure environments. This incorporates porting relevant work from the similar effort on branch 1 tracked via HADOOP-8079.
[jira] [Resolved] (MAPREDUCE-2713) Recovery of ResourceManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha resolved MAPREDUCE-2713.
Resolution: Duplicate

Resolving as dup of MAPREDUCE-4326

Recovery of ResourceManager
Key: MAPREDUCE-2713
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2713
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mrv2
Affects Versions: 0.23.1
Reporter: Sharad Agarwal
Assignee: Mahadev konar

The ResourceManager needs to recover from crashes to the state where it left off. All running applications should be able to rejoin the restarted RM. Running containers should not be affected and should continue to run.
[jira] [Resolved] (MAPREDUCE-4263) Use taskkill /T to terminate tasks on Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha resolved MAPREDUCE-4263.
Resolution: Fixed

Fixed in the change for MAPREDUCE-4260

Use taskkill /T to terminate tasks on Windows
Key: MAPREDUCE-4263
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4263
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Bikas Saha

On Linux, setsid is used to link the processes spawned by the tasks into the same session, so terminating the task terminates the entire tree. We need to do the same for Windows. This is not foolproof but should be sufficient until we have a potentially better solution in MAPREDUCE-4260.
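The two termination strategies described in MAPREDUCE-4263 can be contrasted as command lines. This sketch assumes the task's PID is known. On Linux, it also assumes the task was started with setsid, so its PID is also its process-group ID. The class name is hypothetical. The `/F` (force) flag is an extra choice made here and is not specified by the issue.

```java
import java.util.Arrays;
import java.util.List;

class TaskTreeKiller {
    // Builds a command that terminates a task together with the processes it spawned.
    static List<String> killTreeCommand(String osName, String pid) {
        if (osName.startsWith("Windows")) {
            // /T also ends child processes started by this PID; /F forces termination.
            return Arrays.asList("taskkill", "/F", "/T", "/PID", pid);
        }
        // Assumes the task was started via setsid, so its PID is also its
        // process-group ID; a negative PID signals every process in that group.
        return Arrays.asList("kill", "-TERM", "--", "-" + pid);
    }
}
```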
[jira] [Created] (MAPREDUCE-4330) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful
Bikas Saha created MAPREDUCE-4330:
Summary: TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful
Key: MAPREDUCE-4330
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4330
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha

The previously completed attempt is removed from successAttemptCompletionEventNoMap and marked OBSOLETE. After that, if the newly completed attempt is successful, it is added to successAttemptCompletionEventNoMap. This seems wrong, because the newly completed attempt could have failed, in which case there is no need to invalidate the successful attempt. One error case would be a speculative attempt that completes as killed/failed after the successful attempt has completed.
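MAPREDUCE-4330 argues for a different ordering: check the new attempt's status first, and only then supersede the previous successful event. The sketch below shows that ordering. It simplifies the model so that events are just statuses in a list. Only the map name `successAttemptCompletionEventNoMap` comes from the issue. The rest is illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of completion-event bookkeeping.
class CompletionEvents {
    enum Status { SUCCEEDED, FAILED, KILLED, OBSOLETE }

    final List<Status> events = new ArrayList<>();
    // taskId -> index of that task's current successful completion event
    final Map<String, Integer> successAttemptCompletionEventNoMap = new HashMap<>();

    void attemptCompleted(String taskId, Status status) {
        events.add(status);
        int eventNo = events.size() - 1;
        // Only a newly *successful* attempt supersedes an earlier success;
        // a killed or failed speculative attempt leaves it untouched.
        if (status == Status.SUCCEEDED) {
            Integer previous = successAttemptCompletionEventNoMap.put(taskId, eventNo);
            if (previous != null) {
                events.set(previous, Status.OBSOLETE);
            }
        }
    }
}
```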
[jira] [Created] (MAPREDUCE-4259) TestDatanodeBlockScanner and TestReplication fail intermittently on Windows
Bikas Saha created MAPREDUCE-4259:
Summary: TestDatanodeBlockScanner and TestReplication fail intermittently on Windows
Key: MAPREDUCE-4259
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4259
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha

The tests change the block length to corrupt the data block. If the datanode has the block file open, the test can still modify it concurrently on Linux, but such concurrent modification is not allowed by the default permissions on Windows. Since this is more of a test issue, the fix would be to have the tests make sure that the block is not open concurrently.
[jira] [Created] (MAPREDUCE-4201) Getting PID not working on Windows. Termination of Task/TaskJVM's not working
Bikas Saha created MAPREDUCE-4201:
Summary: Getting PID not working on Windows. Termination of Task/TaskJVM's not working
Key: MAPREDUCE-4201
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4201
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha

The child task does not report its PID because of a Linux-specific shell script implementation. Signaling task termination is currently disabled by the initial Windows patch.
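For comparison only: on Java 9 and later, `ProcessHandle` exposes PIDs and termination without any platform-specific shell script. This is a much newer JDK facility swapped in for illustration. It is not what the Windows patch for MAPREDUCE-4201 used. The class name is hypothetical.

```java
import java.util.Optional;

class JvmPids {
    // PID of the running JVM, which a child task could report to its parent.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    // Request termination of a process and its descendants by PID. Returns
    // false if no such process is visible to this JVM. (ProcessHandle refuses
    // to destroy the current process, so pid must belong to another process.)
    static boolean terminate(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        if (!handle.isPresent()) {
            return false;
        }
        handle.get().descendants().forEach(ProcessHandle::destroy);
        return handle.get().destroy();
    }
}
```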
[jira] [Created] (MAPREDUCE-4203) Create equivalent of ProcfsBasedProcessTree for Windows
Bikas Saha created MAPREDUCE-4203:
Summary: Create equivalent of ProcfsBasedProcessTree for Windows
Key: MAPREDUCE-4203
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4203
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Bikas Saha

ProcfsBasedProcessTree is used by the TaskTracker to get process information such as memory and CPU usage. This information is used to manage resources, among other things. The current implementation is based on Linux procfs functionality and hence does not work on other platforms, specifically Windows.
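As a later-JDK point of comparison for MAPREDUCE-4203, Java 9's `ProcessHandle` can walk a process tree and sum CPU time portably. It does not report memory usage, so it covers only part of what ProcfsBasedProcessTree provides. The class name is hypothetical, and this is not the Windows implementation the issue called for.

```java
import java.time.Duration;
import java.util.stream.Stream;

class ProcessTreeUsage {
    // Sum the CPU time of a process and all of its descendants, as far as the
    // platform reports it; processes with no reported CPU time count as zero.
    static Duration totalCpu(ProcessHandle root) {
        return Stream.concat(Stream.of(root), root.descendants())
                .map(p -> p.info().totalCpuDuration().orElse(Duration.ZERO))
                .reduce(Duration.ZERO, Duration::plus);
    }

    // Number of processes in the tree, including the root.
    static long treeSize(ProcessHandle root) {
        return 1 + root.descendants().count();
    }
}
```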