[jira] [Created] (MAPREDUCE-5076) CombineFileInputFormat with maxSplitSize can omit data
Sandy Ryza created MAPREDUCE-5076: - Summary: CombineFileInputFormat with maxSplitSize can omit data Key: MAPREDUCE-5076 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5076 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza I ran a local job with CombineFileInputFormat using an 80 MB file and a max split size of 32 MB (the default local FS block size). The job ran with two splits of 32 MB, and the last 16 MB were just omitted. This appears to be caused by a subtle bug in getMoreSplits, in which the code that generates the splits from the blocks expects the 16 MB block to be at the end of the block list. But the code that generates the blocks does not respect this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
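The omission comes down to a coverage invariant: every block of the file, including the short tail block, must land in some split, regardless of where that block sits in the block list. The following Hadoop-free sketch (hypothetical class, not the actual getMoreSplits code) illustrates the invariant a correct split generator must satisfy for the 80 MB / 32 MB case:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, Hadoop-free sketch of the invariant the bug violates:
// every byte of the file must end up in some split, including the
// final short block. Hypothetical class, not the getMoreSplits source.
public class SplitCoverage {
    // Chunk a file of the given length into splits of at most maxSplitSize.
    static List<Long> splits(long fileLength, long maxSplitSize) {
        List<Long> result = new ArrayList<>();
        long remaining = fileLength;
        while (remaining > 0) {
            long size = Math.min(remaining, maxSplitSize);
            result.add(size);
            remaining -= size;
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> s = splits(80 * mb, 32 * mb);
        long total = s.stream().mapToLong(Long::longValue).sum();
        // The buggy code produced two 32 MB splits and dropped the last
        // 16 MB; a correct generator accounts for all 80 MB in 3 splits.
        System.out.println(s.size() + " splits covering " + total + " bytes");
    }
}
```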
[jira] [Commented] (MAPREDUCE-5075) DistCp leaks input file handles
[ https://issues.apache.org/jira/browse/MAPREDUCE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603992#comment-13603992 ] Hadoop QA commented on MAPREDUCE-5075: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573975/MAPREDUCE-5075.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-distcp. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3421//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3421//console This message is automatically generated.
[jira] [Updated] (MAPREDUCE-5075) DistCp leaks input file handles
[ https://issues.apache.org/jira/browse/MAPREDUCE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated MAPREDUCE-5075: - Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-5075) DistCp leaks input file handles
[ https://issues.apache.org/jira/browse/MAPREDUCE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated MAPREDUCE-5075: - Attachment: MAPREDUCE-5075.1.patch Here is a patch with the following changes: # {{RetriableFileCopyCommand}} - This is just code clean-up. The {{copyBytes}} private method accepted a flag as an argument to control whether or not to close the streams after copying. This method was only ever called from {{copyToTmpFile}} with a hard-coded true. I removed the flag from the method signature and changed the code so that it closes the streams unconditionally. # {{ThrottledInputStream}} - Override {{close}} so that it closes the wrapped stream. # {{TestIntegration}} - This code was not creating the target file correctly. {{target}} contains a fully qualified path. Inside {{createFiles}}, it prepends the test root again. This would be 2 fully qualified paths appended to each other. On Windows, the result would look like C:\project\target\C:\project\target. The second ':' makes the filename invalid. With this patch, all DistCp tests pass consistently on Mac and Windows.
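The {{ThrottledInputStream}} change in item 2 can be sketched as follows. This is a simplified, self-contained illustration with hypothetical class names, not the actual DistCp source:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the fix: a throttling wrapper must propagate close() to the
// stream it wraps, or the underlying file handle leaks. Hypothetical
// class names; not the actual DistCp ThrottledInputStream.
public class ClosingWrapperDemo {
    static class ThrottledStream extends InputStream {
        private final InputStream raw;
        ThrottledStream(InputStream raw) { this.raw = raw; }
        @Override public int read() throws IOException { return raw.read(); }
        @Override public void close() throws IOException {
            raw.close(); // the fix: without this, the file handle stays open
        }
    }

    // A stream that records whether close() reached it.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;
        TrackingStream() { super(new byte[] {1, 2, 3}); }
        @Override public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream raw = new TrackingStream();
        new ThrottledStream(raw).close();
        System.out.println("wrapped stream closed: " + raw.closed);
    }
}
```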
[jira] [Commented] (MAPREDUCE-5038) old API CombineFileInputFormat missing fixes that are in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603940#comment-13603940 ] Sandy Ryza commented on MAPREDUCE-5038: --- It looks like I messed this up and left out part of MAPREDUCE-1597. Working on a replacement patch. > old API CombineFileInputFormat missing fixes that are in new API > - > > Key: MAPREDUCE-5038 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.1.1 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.2.0 > > Attachments: MAPREDUCE-5038-1.patch, MAPREDUCE-5038.patch > > > The following changes patched the CombineFileInputFormat in mapreduce, but > neglected the one in mapred > MAPREDUCE-1597 enabled the CombineFileInputFormat to work on splittable files > MAPREDUCE-2021 solved returning duplicate hostnames in split locations > MAPREDUCE-1806 CombineFileInputFormat does not work with paths not on default > FS > In trunk this is not an issue as the one in mapred extends the one in > mapreduce. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5075) DistCp leaks input file handles
[ https://issues.apache.org/jira/browse/MAPREDUCE-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603903#comment-13603903 ] Chris Nauroth commented on MAPREDUCE-5075: -- I discovered this while testing on Windows, where file locking is enforced more strictly. The DistCp tests would fail sporadically due to not being able to delete the temp files. I have a patch in progress.
[jira] [Created] (MAPREDUCE-5075) DistCp leaks input file handles
Chris Nauroth created MAPREDUCE-5075: Summary: DistCp leaks input file handles Key: MAPREDUCE-5075 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5075 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 3.0.0 Reporter: Chris Nauroth Assignee: Chris Nauroth DistCp wraps the {{InputStream}} for each input file it reads in an instance of {{ThrottledInputStream}}. This class does not close the wrapped {{InputStream}}. {{RetriableFileCopyCommand}} guarantees that the {{ThrottledInputStream}} gets closed, but without closing the underlying wrapped stream, it still leaks a file handle.
[jira] [Commented] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603885#comment-13603885 ] Hitesh Shah commented on MAPREDUCE-5066: Job notification also exists in 2.x which may face the same set of issues. > JobTracker should set a timeout when calling into job.end.notification.url > -- > > Key: MAPREDUCE-5066 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5066 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1-win, 2.0.3-alpha, 1.3.0 >Reporter: Ivan Mitic >Assignee: Ivan Mitic > > In current code, timeout is not specified when JobTracker (JobEndNotifier) > calls into the notification URL. When the given URL points to a server that > will not respond for a long time, job notifications are completely stuck > (given that we have only a single thread processing all notifications). We've > seen this cause noticeable delays in job execution in components that rely on > job end notifications (like Oozie workflows). > I propose we introduce a configurable timeout option and set a default to a > reasonably small value. > If we want, we can also introduce a configurable number of workers processing > the notification queue (not sure if this is needed though at this point). > I will prepare a patch soon. Please comment back. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
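The proposed fix boils down to putting explicit timeouts on the notification HTTP call. A minimal sketch, assuming plain {{java.net.HttpURLConnection}}; the URL and the 5-second value are illustrative, not the eventual configuration defaults:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal sketch of the proposal: give the job-end notification call
// explicit connect/read timeouts so a hung endpoint cannot stall the
// single notification thread indefinitely. URL and timeout values are
// illustrative only.
public class NotificationTimeoutDemo {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.com/jobEndNotification");
        // openConnection() does not touch the network yet.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000); // fail fast if the server is unreachable
        conn.setReadTimeout(5000);    // fail fast if it accepts but never responds
        System.out.println(conn.getConnectTimeout() + "/" + conn.getReadTimeout());
    }
}
```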
[jira] [Updated] (MAPREDUCE-5066) JobTracker should set a timeout when calling into job.end.notification.url
[ https://issues.apache.org/jira/browse/MAPREDUCE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-5066: --- Affects Version/s: 2.0.3-alpha
[jira] [Commented] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603872#comment-13603872 ] Hudson commented on MAPREDUCE-5042: --- Integrated in Hadoop-trunk-Commit #3483 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3483/]) MAPREDUCE-5042. Reducer unable to fetch for a map task that was recovered (Jason Lowe via bobby) (Revision 1457119) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1457119 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java * 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java > Reducer unable to fetch for a map task that was recovered > - > > Key: MAPREDUCE-5042 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5042 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am, security >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Fix For: 3.0.0, 0.23.7, 2.0.5-beta > > Attachments: MAPREDUCE-5042.patch, MAPREDUCE-5042.patch, > MAPREDUCE-5042.patch, MAPREDUCE-5042.patch > > > If an application attempt fails and is relaunched the AM will try to recover > previously completed tasks. 
If a reducer needs to fetch the output of a map > task attempt that was recovered then it will fail with a 401 error like this: > {noformat} > java.io.IOException: Server returned HTTP response code: 401 for URL: > http://xx:xx/mapOutput?job=job_1361569180491_21845&reduce=0&map=attempt_1361569180491_21845_m_16_0 > at > sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1615) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:231) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:156) > {noformat} > Looking at the corresponding NM's logs, we see the shuffle failed due to > "Verification of the hashReply failed". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
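The "Verification of the hashReply failed" message points at the HMAC handshake between the reducer and the shuffle service: both sides must sign with the same job shuffle secret, so a secret lost across AM recovery produces a mismatch and a 401. A generic illustration of that kind of check, using {{javax.crypto}} directly rather than Hadoop's actual SecureShuffleUtils:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Generic illustration of HMAC-based request verification, the idea
// behind the "Verification of the hashReply failed" error. This is not
// Hadoop's SecureShuffleUtils; names and keys are hypothetical.
public class HashReplyDemo {
    static byte[] sign(byte[] secret, String msg) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        return mac.doFinal(msg.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String request = "mapOutput?job=job_1&reduce=0&map=attempt_1";
        byte[] amKey = "recovered-secret".getBytes(StandardCharsets.UTF_8);
        byte[] nmKey = "original-secret".getBytes(StandardCharsets.UTF_8);
        // Same key on both sides: verification succeeds.
        // Different keys: the MACs differ and the request is rejected.
        boolean sameKeyOk = MessageDigest.isEqual(sign(amKey, request), sign(amKey, request));
        boolean mismatch = MessageDigest.isEqual(sign(amKey, request), sign(nmKey, request));
        System.out.println(sameKeyOk + " " + mismatch);
    }
}
```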
[jira] [Updated] (MAPREDUCE-5042) Reducer unable to fetch for a map task that was recovered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-5042: --- Resolution: Fixed Fix Version/s: 2.0.5-beta 0.23.7 3.0.0 Status: Resolved (was: Patch Available) Thanks Jason, I put this into trunk, branch-2, and branch-0.23
[jira] [Commented] (MAPREDUCE-4875) coverage fixing for org.apache.hadoop.mapred
[ https://issues.apache.org/jira/browse/MAPREDUCE-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603834#comment-13603834 ] Robert Joseph Evans commented on MAPREDUCE-4875: For the most part everything looks good here. My only concern is that in some places it looks like we are just testing dead code. Things like TaskLog are used by pipes, but only parts of it are used by pipes, so I think it would be preferable to rip out the unused code. > coverage fixing for org.apache.hadoop.mapred > > > Key: MAPREDUCE-4875 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4875 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 >Reporter: Aleksey Gorshkov > Fix For: 3.0.0, 0.23.6, 2.0.5-beta > > Attachments: MAPREDUCE-4875-branch-0.23.patch, > MAPREDUCE-4875-trunk.patch > > > added some tests for org.apache.hadoop.mapred > MAPREDUCE-4875-trunk.patch for trunk and branch-2 > MAPREDUCE-4875-branch-0.23.patch for branch-0.23
[jira] [Updated] (MAPREDUCE-5073) TestJobStatusPersistency.testPersistency fails on JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5073: -- Resolution: Fixed Fix Version/s: (was: 1.2.0) 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Sandy. Committed to branch-1. > TestJobStatusPersistency.testPersistency fails on JDK7 > -- > > Key: MAPREDUCE-5073 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5073 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 1.1.2 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5073.patch > > > TestJobStatusPersistency is sensitive to the order that the tests are run in. > If testLocalPersistency runs before testPersistency, testPersistency will > fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5073) TestJobStatusPersistency.testPersistency fails on JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603789#comment-13603789 ] Alejandro Abdelnur commented on MAPREDUCE-5073: --- +1
[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated MAPREDUCE-4980: --- Issue Type: Test (was: Improvement) > Parallel test execution of hadoop-mapreduce-client-core > --- > > Key: MAPREDUCE-4980 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Affects Versions: 3.0.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980.patch > > > The maven surefire plugin supports parallel test execution. By using it, the > tests can run faster.
[jira] [Commented] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603718#comment-13603718 ] Arpit Agarwal commented on MAPREDUCE-4987: -- +1 Chris explained to me offline about the change in FileUtil#createJarWithClassPath. Quoting here since I found it helpful to understand the change. {quote} In the method sanitizeEnv, you'll see that nodemanager does various things to set up a new environment for the container to be launched. The final state of this environment will be different from the environment of the currently running process (the nodemanager itself). The most glaring problem with this bug was the setting of PWD to the new container work directory. There are various classpath entries for the distributed cache files that are of the form $PWD/file on Mac or %PWD%/file on Windows, and FileUtil#createJarWithClassPath needs to expand this to /file. Without this change, the variable expansion would be incorrect: /file on Mac or just /file on Windows (since Windows doesn't intrinsically have %PWD% defined until nodemanager sets it in sanitizeEnv). {quote} > TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior > of symlinks > --- > > Key: MAPREDUCE-4987 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-4987.1.patch > > > On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while > checking the length of a symlink. It expects to see the length of the target > of the symlink, but Java 6 on Windows always reports that a symlink has > length 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
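The expansion Chris describes can be illustrated with a small stand-alone helper (hypothetical, not {{FileUtil#createJarWithClassPath}} itself): classpath entries referencing $PWD or %PWD% must be expanded against the environment being built for the container, not the nodemanager's own environment.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified illustration of the variable expansion described above.
// Hypothetical helper, not FileUtil#createJarWithClassPath: classpath
// entries like $PWD/file (Unix) or %PWD%/file (Windows) are expanded
// against the container's environment map.
public class ClasspathExpandDemo {
    static String expand(String entry, Map<String, String> env) {
        // Handles both Unix-style $VAR and Windows-style %VAR% references.
        Matcher m = Pattern.compile("\\$(\\w+)|%(\\w+)%").matcher(entry);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            String val = env.getOrDefault(name, "");
            m.appendReplacement(sb, Matcher.quoteReplacement(val));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> containerEnv = Map.of("PWD", "/container/work");
        System.out.println(expand("$PWD/job.jar", containerEnv));
        System.out.println(expand("%PWD%/job.jar", containerEnv));
    }
}
```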
[jira] [Commented] (MAPREDUCE-4571) TestHsWebServicesJobs fails on jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603705#comment-13603705 ] Hudson commented on MAPREDUCE-4571: --- Integrated in Hadoop-trunk-Commit #3481 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3481/]) MAPREDUCE-4571. TestHsWebServicesJobs fails on jdk7. (tgraves via tucu) (Revision 1457061) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1457061 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/MockHistoryJobs.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobs.java > TestHsWebServicesJobs fails on jdk7 > --- > > Key: MAPREDUCE-4571 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4571 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: webapps >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-4571.patch > > > TestHsWebServicesJobs fails on jdk7. > Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.561 sec > <<< > FAILURE!testJobIdSlash(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs) > Time elapsed: 0.334 sec <<< FAILURE! > java.lang.AssertionError: mapsTotal incorrect expected:<0> but was:<1> -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4716) TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603704#comment-13603704 ] Hudson commented on MAPREDUCE-4716: --- Integrated in Hadoop-trunk-Commit #3481 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3481/]) MAPREDUCE-4716. TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7. (tgraves via tucu) (Revision 1457065) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1457065 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobsQuery.java > TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7 > > > Key: MAPREDUCE-4716 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4716 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-4716.patch > > > Using jdk7 TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails. > It looks like the string changed from "const class" to "constant" in jdk7. > Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.713 sec > <<< FAILURE! > testJobsQueryStateInvalid(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery) > Time elapsed: 0.371 sec <<< FAILURE! 
> java.lang.AssertionError: exception message doesn't match, got: No enum > constant org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > expected: No enum const class > org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > at org.junit.Assert.fail(Assert.java:91)at > org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.hadoop.yarn.webapp.WebServicesTestUtils.checkStringMatch(WebServicesTestUtils.java:77) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery.testJobsQueryStateInvalid(TestHsWebServicesJobsQuery.java:286) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
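The message change the patch accommodates can be reproduced directly from {{Enum.valueOf}}: on JDK 7 and later the prefix is "No enum constant", while JDK 6 produced "No enum const class". A small self-contained demo (hypothetical enum, not the actual JobState records class):

```java
// Demonstrates the JDK behavior difference behind this failure: JDK 7+
// reports "No enum constant ..." from Enum.valueOf, where JDK 6 said
// "No enum const class ...". Tests asserting on the exact message must
// accept both forms. Hypothetical enum, not the real JobState.
public class EnumMessageDemo {
    enum JobState { NEW, RUNNING, SUCCEEDED }

    static String invalidStateMessage(String name) {
        try {
            JobState.valueOf(name);
            return null; // name was a valid constant
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(invalidStateMessage("InvalidState"));
    }
}
```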
[jira] [Commented] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603700#comment-13603700 ] Hadoop QA commented on MAPREDUCE-4987: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573897/MAPREDUCE-4987.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3420//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3420//console This message is automatically generated. 
> TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior > of symlinks > --- > > Key: MAPREDUCE-4987 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-4987.1.patch > > > On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while > checking the length of a symlink. It expects to see the length of the target > of the symlink, but Java 6 on Windows always reports that a symlink has > length 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4716) TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4716: -- Resolution: Fixed Fix Version/s: 2.0.5-beta Target Version/s: 2.0.3-alpha, 3.0.0, 0.23.7 (was: 3.0.0, 2.0.3-alpha, 0.23.7) Status: Resolved (was: Patch Available) Thanks Thomas (and Sandy for verifying still applies/works). Committed to trunk and branch-2. > TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7 > > > Key: MAPREDUCE-4716 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4716 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-4716.patch > > > Using jdk7 TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails. > It looks like the string changed from "const class" to "constant" in jdk7. > Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.713 sec > <<< FAILURE! > testJobsQueryStateInvalid(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery) > Time elapsed: 0.371 sec <<< FAILURE! > java.lang.AssertionError: exception message doesn't match, got: No enum > constant org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > expected: No enum const class > org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > at org.junit.Assert.fail(Assert.java:91)at > org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.hadoop.yarn.webapp.WebServicesTestUtils.checkStringMatch(WebServicesTestUtils.java:77) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery.testJobsQueryStateInvalid(TestHsWebServicesJobsQuery.java:286) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4716) TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603692#comment-13603692 ] Alejandro Abdelnur commented on MAPREDUCE-4716: --- +1 > TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7 > > > Key: MAPREDUCE-4716 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4716 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Attachments: MAPREDUCE-4716.patch > > > Using jdk7 TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails. > It looks like the string changed from "const class" to "constant" in jdk7. > Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.713 sec > <<< FAILURE! > testJobsQueryStateInvalid(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery) > Time elapsed: 0.371 sec <<< FAILURE! > java.lang.AssertionError: exception message doesn't match, got: No enum > constant org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > expected: No enum const class > org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > at org.junit.Assert.fail(Assert.java:91)at > org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.hadoop.yarn.webapp.WebServicesTestUtils.checkStringMatch(WebServicesTestUtils.java:77) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery.testJobsQueryStateInvalid(TestHsWebServicesJobsQuery.java:286) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
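For readers hitting the same failure elsewhere: the JDK 6 to JDK 7 wording change described above ("No enum const class ..." became "No enum constant ...") can be absorbed by matching either form instead of an exact string. A minimal self-contained sketch — the helper name is illustrative, not the actual {{WebServicesTestUtils}} code:

```java
import java.util.regex.Pattern;

public class EnumMessageCheck {

    // Hypothetical helper: accepts either the JDK 6 wording
    // ("No enum const class <type>.<name>") or the JDK 7 wording
    // ("No enum constant <type>.<name>") for the same constant.
    static boolean matchesEnumError(String msg, String constant) {
        return msg.matches("No enum const(ant| class) " + Pattern.quote(constant));
    }

    public static void main(String[] args) {
        String c = "org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState";
        String jdk6 = "No enum const class " + c;   // message produced by Java 6
        String jdk7 = "No enum constant " + c;      // message produced by Java 7
        System.out.println(matchesEnumError(jdk6, c) && matchesEnumError(jdk7, c));
    }
}
```

Matching a small pattern rather than the literal message keeps the assertion valid across JDKs without weakening what it actually checks.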
[jira] [Updated] (MAPREDUCE-4571) TestHsWebServicesJobs fails on jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4571: -- Resolution: Fixed Fix Version/s: 2.0.5-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Thomas (and Sandy for verifying it still applies/works). Committed to trunk and branch-2. > TestHsWebServicesJobs fails on jdk7 > --- > > Key: MAPREDUCE-4571 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4571 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: webapps >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Fix For: 2.0.5-beta > > Attachments: MAPREDUCE-4571.patch > > > TestHsWebServicesJobs fails on jdk7. > Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.561 sec > <<< > FAILURE!testJobIdSlash(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs) > Time elapsed: 0.334 sec <<< FAILURE! > java.lang.AssertionError: mapsTotal incorrect expected:<0> but was:<1> -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4571) TestHsWebServicesJobs fails on jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603679#comment-13603679 ] Alejandro Abdelnur commented on MAPREDUCE-4571: --- +1 > TestHsWebServicesJobs fails on jdk7 > --- > > Key: MAPREDUCE-4571 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4571 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: webapps >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Attachments: MAPREDUCE-4571.patch > > > TestHsWebServicesJobs fails on jdk7. > Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.561 sec > <<< > FAILURE!testJobIdSlash(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs) > Time elapsed: 0.334 sec <<< FAILURE! > java.lang.AssertionError: mapsTotal incorrect expected:<0> but was:<1> -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5074) Remove limits on number of counters and counter groups in MapReduce
Ravi Prakash created MAPREDUCE-5074: --- Summary: Remove limits on number of counters and counter groups in MapReduce Key: MAPREDUCE-5074 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5074 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am, mrv2 Affects Versions: 2.0.3-alpha, 3.0.0, 0.23.6 Reporter: Ravi Prakash Can we please consider removing limits on the number of counters and counter groups now that it is all user code? Thanks to the much better architecture of YARN in which there is no single Job Tracker we have to worry about overloading, I feel we should do away with this (now arbitrary) constraint on users' capabilities. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4716) TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603608#comment-13603608 ] Sandy Ryza commented on MAPREDUCE-4716: --- I just tried the patch on top of trunk and it still applies and works. > TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7 > > > Key: MAPREDUCE-4716 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4716 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Attachments: MAPREDUCE-4716.patch > > > Using jdk7 TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails. > It looks like the string changed from "const class" to "constant" in jdk7. > Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.713 sec > <<< FAILURE! > testJobsQueryStateInvalid(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery) > Time elapsed: 0.371 sec <<< FAILURE! > java.lang.AssertionError: exception message doesn't match, got: No enum > constant org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > expected: No enum const class > org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > at org.junit.Assert.fail(Assert.java:91)at > org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.hadoop.yarn.webapp.WebServicesTestUtils.checkStringMatch(WebServicesTestUtils.java:77) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery.testJobsQueryStateInvalid(TestHsWebServicesJobsQuery.java:286) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4571) TestHsWebServicesJobs fails on jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603605#comment-13603605 ] Sandy Ryza commented on MAPREDUCE-4571: --- I just tried the patch on top of trunk and it still applies and works. > TestHsWebServicesJobs fails on jdk7 > --- > > Key: MAPREDUCE-4571 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4571 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: webapps >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Attachments: MAPREDUCE-4571.patch > > > TestHsWebServicesJobs fails on jdk7. > Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.561 sec > <<< > FAILURE!testJobIdSlash(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs) > Time elapsed: 0.334 sec <<< FAILURE! > java.lang.AssertionError: mapsTotal incorrect expected:<0> but was:<1> -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5072) TestDelegationTokenRenewal.testDTRenewal fails in MR1 on jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5072: -- Resolution: Fixed Fix Version/s: (was: 1.2.0) 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Sandy. Committed to branch-1. > TestDelegationTokenRenewal.testDTRenewal fails in MR1 on jdk7 > - > > Key: MAPREDUCE-5072 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5072 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 1.1.2 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5072.patch > > > TestDelegationTokenRenewal.testDTRenewal fails in MR1 for the reasons that > TestDelegationTokenRenewer.testDTRenewal fails described in YARN-31. The fix > is the same. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5070) TestClusterStatus.testClusterMetrics fails on JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603589#comment-13603589 ] Alejandro Abdelnur commented on MAPREDUCE-5070: --- typo in commit message, said '(tucu)' when it should have been '(sandyr via tucu)' > TestClusterStatus.testClusterMetrics fails on JDK7 > -- > > Key: MAPREDUCE-5070 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5070 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 1.1.2 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5070.patch > > > TestClusterStatus is sensitive to the order that the tests are run in. If > testReservedSlots is called before testClusterMetrics, testClusterMetrics > will fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Deleted] (MAPREDUCE-4716) TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4716: -- Comment: was deleted (was: typo in commit message, said '(tucu)' when it should have been '(sandyr via tucu)', sorry about that.) > TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7 > > > Key: MAPREDUCE-4716 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4716 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Attachments: MAPREDUCE-4716.patch > > > Using jdk7 TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails. > It looks like the string changed from "const class" to "constant" in jdk7. > Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.713 sec > <<< FAILURE! > testJobsQueryStateInvalid(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery) > Time elapsed: 0.371 sec <<< FAILURE! > java.lang.AssertionError: exception message doesn't match, got: No enum > constant org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > expected: No enum const class > org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > at org.junit.Assert.fail(Assert.java:91)at > org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.hadoop.yarn.webapp.WebServicesTestUtils.checkStringMatch(WebServicesTestUtils.java:77) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery.testJobsQueryStateInvalid(TestHsWebServicesJobsQuery.java:286) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4716) TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603585#comment-13603585 ] Alejandro Abdelnur commented on MAPREDUCE-4716: --- typo in commit message, said '(tucu)' when it should have been '(sandyr via tucu)', sorry about that. > TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails with jdk7 > > > Key: MAPREDUCE-4716 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4716 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 0.23.3, 3.0.0, 2.0.2-alpha >Reporter: Thomas Graves >Assignee: Thomas Graves > Labels: java7 > Attachments: MAPREDUCE-4716.patch > > > Using jdk7 TestHsWebServicesJobsQuery.testJobsQueryStateInvalid fails. > It looks like the string changed from "const class" to "constant" in jdk7. > Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 9.713 sec > <<< FAILURE! > testJobsQueryStateInvalid(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery) > Time elapsed: 0.371 sec <<< FAILURE! > java.lang.AssertionError: exception message doesn't match, got: No enum > constant org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > expected: No enum const class > org.apache.hadoop.mapreduce.v2.api.records.JobState.InvalidState > at org.junit.Assert.fail(Assert.java:91)at > org.junit.Assert.assertTrue(Assert.java:43) > at > org.apache.hadoop.yarn.webapp.WebServicesTestUtils.checkStringMatch(WebServicesTestUtils.java:77) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery.testJobsQueryStateInvalid(TestHsWebServicesJobsQuery.java:286) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5072) TestDelegationTokenRenewal.testDTRenewal fails in MR1 on jdk7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603580#comment-13603580 ] Alejandro Abdelnur commented on MAPREDUCE-5072: --- +1 > TestDelegationTokenRenewal.testDTRenewal fails in MR1 on jdk7 > - > > Key: MAPREDUCE-5072 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5072 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 1.1.2 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.2.0 > > Attachments: MAPREDUCE-5072.patch > > > TestDelegationTokenRenewal.testDTRenewal fails in MR1 for the reasons that > TestDelegationTokenRenewer.testDTRenewal fails described in YARN-31. The fix > is the same. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5070) TestClusterStatus.testClusterMetrics fails on JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-5070: -- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Sandy. Committed to branch-1. > TestClusterStatus.testClusterMetrics fails on JDK7 > -- > > Key: MAPREDUCE-5070 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5070 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 1.1.2 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.3.0 > > Attachments: MAPREDUCE-5070.patch > > > TestClusterStatus is sensitive to the order that the tests are run in. If > testReservedSlots is called before testClusterMetrics, testClusterMetrics > will fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5070) TestClusterStatus.testClusterMetrics fails on JDK7
[ https://issues.apache.org/jira/browse/MAPREDUCE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603573#comment-13603573 ] Alejandro Abdelnur commented on MAPREDUCE-5070: --- +1 > TestClusterStatus.testClusterMetrics fails on JDK7 > -- > > Key: MAPREDUCE-5070 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5070 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 1.1.2 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5070.patch > > > TestClusterStatus is sensitive to the order that the tests are run in. If > testReservedSlots is called before testClusterMetrics, testClusterMetrics > will fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
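The order sensitivity described above is the classic shared-mutable-state pattern: JDK 7 stopped returning a class's methods in declaration order, so JUnit may run tests in a different sequence and expose any hidden coupling. A stripped-down sketch of the failure mode — the method names mirror the test, everything else is illustrative:

```java
public class OrderSensitivity {

    static int reservedSlots = 0;  // shared mutable state, as in TestClusterStatus

    // Mutates the shared state.
    static void testReservedSlots() { reservedSlots = 2; }

    // Implicitly assumes it runs against a clean slate.
    static boolean testClusterMetrics() { return reservedSlots == 0; }

    public static void main(String[] args) {
        // JDK 6-era order: metrics runs first and passes.
        boolean passesOldOrder = testClusterMetrics();

        reservedSlots = 0;  // reset for the second scenario

        // JDK 7 may run the methods the other way round: metrics now fails.
        testReservedSlots();
        boolean passesNewOrder = testClusterMetrics();

        System.out.println(passesOldOrder + " " + passesNewOrder);
    }
}
```

The usual fix, as in the attached patch's spirit, is to reset shared state in a setup method so each test is order-independent.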
[jira] [Updated] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated MAPREDUCE-4987: - Attachment: MAPREDUCE-4987.1.patch I'm attaching a patch. This fixes the issue of symlink handling on Windows by copying the files instead of truly symlinking, similar to the approach taken in prior patches like HADOOP-9061. This also fixes the logic for bundling the classpath into a jar manifest by guaranteeing that localized resources get added to the classpath, even if those localized resources don't exist in the container path yet. (The classpath jar must get created before the container launch script runs to symlink or copy files from filecache, so this was a chicken-and-egg problem.) With these changes in place, {{TestMRJobs#testDistributedCache}} passes on Mac and Windows. Here is a summary of the changes in each file: {{FileUtil#createJarWithClassPath}} - Accept environment provided by caller, because YARN will construct an environment different from the current system environment. Provide a way to maintain a classpath entry with a trailing '/' even though the directory doesn't exist, because the container launch script hasn't run yet. {{TestFileUtil#testCreateJarWithClassPath}} - Change test to cover new logic. {{TestMRJobs}} - Initialize {{MiniDFSCluster}} in a @BeforeClass method instead of a static initialization block. This test uses an inner class, {{DistributedCacheChecker}}, as the job's mapper. Since this is an inner class, it has a back-reference to the {{TestMRJobs}} class. This means that the {{TestMRJobs}} static initialization runs for each mapper task in addition to running in the JUnit runner. Therefore, this would start multiple instances of {{MiniDFSCluster}} pointing at the same directories, which would sometimes cause deadlocks. Moving the initialization to a @BeforeClass method prevents it from running in the mappers. 
I also needed to add a special check that a path is a symlinked directory, because {{FileUtils#isSymlink}} does not work as expected on Windows. {{ContainerLaunch}} - Copy files instead of symlinking on Windows. Guarantee that localized resources get added to the classpath correctly, even if the paths do not exist yet. > TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior > of symlinks > --- > > Key: MAPREDUCE-4987 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-4987.1.patch > > > On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while > checking the length of a symlink. It expects to see the length of the target > of the symlink, but Java 6 on Windows always reports that a symlink has > length 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
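A small illustration of the symlink-length pitfall the test hit. This uses the Java 7 {{java.nio.file}} API, which the Java 6 code in question did not have; the paths are illustrative, and the point is that resolving to the real path before asking for the length sidesteps the platform-dependent answer for the link itself:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkLength {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-demo");
        Path target = dir.resolve("target.txt");
        Files.write(target, "hello".getBytes("UTF-8"));   // 5 bytes

        Path link = dir.resolve("link.txt");
        Files.createSymbolicLink(link, target);

        // On POSIX, File#length() on the link already follows it; Java 6 on
        // Windows reported 0 for the link. Resolving the link first gives the
        // target's length on either platform.
        long resolved = new File(link.toRealPath().toString()).length();
        System.out.println(resolved);
    }
}
```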
[jira] [Updated] (MAPREDUCE-4987) TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior of symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated MAPREDUCE-4987: - Status: Patch Available (was: Open) > TestMRJobs#testDistributedCache fails on Windows due to unexpected behavior > of symlinks > --- > > Key: MAPREDUCE-4987 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4987 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, nodemanager >Affects Versions: 3.0.0 >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: MAPREDUCE-4987.1.patch > > > On Windows, {{TestMRJobs#testDistributedCache}} fails on an assertion while > checking the length of a symlink. It expects to see the length of the target > of the symlink, but Java 6 on Windows always reports that a symlink has > length 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603502#comment-13603502 ] Kihwal Lee commented on MAPREDUCE-5065: --- Filed HDFS-4605 for block-size independent FileChecksum in HDFS. > DistCp should skip checksum comparisons if block-sizes are different on > source/target. > -- > > Key: MAPREDUCE-5065 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distcp >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > > When copying files between 2 clusters with different default block-sizes, one > sees that the copy fails with a checksum-mismatch, even though the files have > identical contents. > The reason is that on HDFS, a file's checksum is unfortunately a function of > the block-size of the file. So you could have 2 different files with > identical contents (but different block-sizes) have different checksums. > (Thus, it's also possible for DistCp to fail to copy files on the same > file-system, if the source-file's block-size differs from HDFS default, and > -pb isn't used.) > I propose that we skip checksum comparisons under the following conditions: > 1. -skipCrc is specified. > 2. File-size is 0 (in which case the call to the checksum-servlet is moot). > 3. source.getBlockSize() != target.getBlockSize(), since the checksums are > guaranteed to differ in this case. > I have a patch for #3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
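The three skip conditions proposed in the description reduce to one small predicate. A hedged sketch — the signature is illustrative, not DistCp's actual API:

```java
public class ChecksumSkip {

    // Hypothetical helper mirroring the proposal in the issue description.
    static boolean shouldSkipChecksum(boolean skipCrc, long fileLen,
                                      long srcBlockSize, long dstBlockSize) {
        if (skipCrc) return true;             // 1. -skipCrc was specified
        if (fileLen == 0) return true;        // 2. empty file: nothing to compare
        return srcBlockSize != dstBlockSize;  // 3. checksums guaranteed to differ
    }

    public static void main(String[] args) {
        // Differing block sizes (128 MB vs 64 MB): comparison is pointless.
        System.out.println(shouldSkipChecksum(false, 1024, 128L << 20, 64L << 20));
        // Same block size: the checksum must still be compared.
        System.out.println(shouldSkipChecksum(false, 1024, 128L << 20, 128L << 20));
    }
}
```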
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603494#comment-13603494 ] Kihwal Lee commented on MAPREDUCE-5065: --- bq. So I don't think there won't be any significant changes in performance or overhead. Sorry, unintended double negation. > DistCp should skip checksum comparisons if block-sizes are different on > source/target. > -- > > Key: MAPREDUCE-5065 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distcp >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > > When copying files between 2 clusters with different default block-sizes, one > sees that the copy fails with a checksum-mismatch, even though the files have > identical contents. > The reason is that on HDFS, a file's checksum is unfortunately a function of > the block-size of the file. So you could have 2 different files with > identical contents (but different block-sizes) have different checksums. > (Thus, it's also possible for DistCp to fail to copy files on the same > file-system, if the source-file's block-size differs from HDFS default, and > -pb isn't used.) > I propose that we skip checksum comparisons under the following conditions: > 1. -skipCrc is specified. > 2. File-size is 0 (in which case the call to the checksum-servlet is moot). > 3. source.getBlockSize() != target.getBlockSize(), since the checksums are > guaranteed to differ in this case. > I have a patch for #3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603493#comment-13603493 ] Kihwal Lee commented on MAPREDUCE-5065: --- bq. Another option might be to implement a checksum that's blocksize-independent... Reading whole metadata may be too much, especially for huge files. It will be better if we make computation happen where the data is. :) Most hashing is incremental, so if DFSClient feeds the last state of hash into the next datanode and let it continue updating it, the result will be independent of block size. The current way of doing file checksum allows calculating individual block checksums in parallel, but we are not taking advantage of it in DFSClient anyway. So I don't think there won't be any significant changes in performance or overhead. We should probably continue this discussion in a separate jira. > DistCp should skip checksum comparisons if block-sizes are different on > source/target. > -- > > Key: MAPREDUCE-5065 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distcp >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > > When copying files between 2 clusters with different default block-sizes, one > sees that the copy fails with a checksum-mismatch, even though the files have > identical contents. > The reason is that on HDFS, a file's checksum is unfortunately a function of > the block-size of the file. So you could have 2 different files with > identical contents (but different block-sizes) have different checksums. > (Thus, it's also possible for DistCp to fail to copy files on the same > file-system, if the source-file's block-size differs from HDFS default, and > -pb isn't used.) > I propose that we skip checksum comparisons under the following conditions: > 1. -skipCrc is specified. > 2. 
File-size is 0 (in which case the call to the checksum-servlet is moot). > 3. source.getBlockSize() != target.getBlockSize(), since the checksums are > guaranteed to differ in this case. > I have a patch for #3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
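The incremental-hashing idea in the comment above — carry the running digest state from one block to the next — is easy to demonstrate with {{MessageDigest}}: feeding the data in block-sized pieces yields the same digest as a single pass, so the result is independent of block size. (MD5 here is only a stand-in for whatever hash HDFS would actually use.)

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class IncrementalDigest {
    public static void main(String[] args) throws Exception {
        byte[] data = new byte[300];                     // stand-in for a file spanning blocks
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        // One shot over the whole "file".
        byte[] whole = MessageDigest.getInstance("MD5").digest(data);

        // Fed block by block, carrying the digest state forward -- as if each
        // datanode continued hashing where the previous one stopped.
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (int off = 0; off < data.length; off += 128) {  // 128-byte "blocks"
            md.update(data, off, Math.min(128, data.length - off));
        }
        byte[] chunked = md.digest();

        System.out.println(Arrays.equals(whole, chunked));  // block size cannot matter
    }
}
```

The trade-off Kihwal notes holds in the sketch too: the update loop is inherently sequential, so per-block parallelism is given up in exchange for a block-size-independent result.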
[jira] [Commented] (MAPREDUCE-5069) add concrete common implementations of CombineFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603476#comment-13603476 ] Robert Joseph Evans commented on MAPREDUCE-5069: +1 on the idea too. > add concrete common implementations of CombineFileInputFormat > - > > Key: MAPREDUCE-5069 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5069 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1, mrv2 >Affects Versions: 2.0.3-alpha >Reporter: Sangjin Lee >Priority: Minor > > CombineFileInputFormat is abstract, and its specific equivalents to > TextInputFormat, SequenceFileInputFormat, etc. are currently not in the > hadoop code base. > These sound like a very common need wherever CombineFileInputFormat is used, > and different folks would write the same code over and over to achieve the > same goal. It sounds very natural for hadoop to provide at least the text and > sequence file implementations of the CombineFileInputFormat class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated MAPREDUCE-5065: Attachment: (was: MAPREDUCE-5065.branch-0.23.patch)
[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated MAPREDUCE-5065: Attachment: (was: MAPREDUCE-5065.branch-2.patch)
[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603458#comment-13603458 ] Karthik Kambatla commented on MAPREDUCE-5028: - [~chris.douglas], will you be able to take a look at the latest patch? Your valuable insights from MAPREDUCE-64 experience will surely help. > Maps fail when io.sort.mb is set to high value > -- > > Key: MAPREDUCE-5028 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 1.2.0 > > Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, > mr-5028-branch1.patch, mr-5028-trunk.patch > > > Verified the problem exists on branch-1 with the following configuration: > Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, > io.sort.mb=1280, dfs.block.size=2147483648 > Run teragen to generate 4 GB data > Maps fail when you run wordcount on this configuration with the following > error: > {noformat} > java.io.IOException: Spill failed > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692) > at > org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) > at > org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45) > at > org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > Caused by: java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:375) > at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38) > at > org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67) > at > org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40) > at > org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116) > at > org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) > at > org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346) > {noformat}
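The repro configuration pushes several byte counts close to the 32-bit limit: io.sort.mb=1280 is 1,342,177,280 bytes, and dfs.block.size=2147483648 is exactly 2^31, which cannot even be represented as a positive int. As a hedged illustration of the class of arithmetic bug such settings tend to expose (this is not the actual MapTask buffer code, just a toy showing the failure mode):

```java
public class ByteMathDemo {
    // MB-to-bytes in 32-bit arithmetic: silently wraps for mb >= 2048.
    static int mbToBytesInt(int mb) {
        return mb * 1024 * 1024;
    }

    // Widening to long before multiplying avoids the overflow.
    static long mbToBytesLong(int mb) {
        return (long) mb * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(mbToBytesInt(1280));   // 1342177280: still fits in an int
        System.out.println(mbToBytesInt(2048));   // -2147483648: silent wrap-around
        System.out.println(mbToBytesLong(2048));  // 2147483648: correct
    }
}
```

Once a size like this wraps negative, downstream index and offset math in a byte buffer can fail in ways that surface only as secondary errors (such as an EOFException while deserializing spilled records), far from the original overflow.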
[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated MAPREDUCE-5065: Status: Open (was: Patch Available) Sorry it took so long, but I think I see your argument now, Doug. We'd rather have the false positive (and re-run) than silently skip CRC-checks and risk a bad data-copy. Making -pb the default is probably still a bad thing (because there'd be no option *not* to preserve block-size). And the cost of the re-run can be mitigated with -update. I'll change the patches.
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603437#comment-13603437 ] Hadoop QA commented on MAPREDUCE-5065: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573882/MAPREDUCE-5065.branch-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-distcp. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3419//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3419//console This message is automatically generated. > DistCp should skip checksum comparisons if block-sizes are different on > source/target. 
[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated MAPREDUCE-5065: Attachment: MAPREDUCE-5065.branch-2.patch MAPREDUCE-5065.branch-0.23.patch Updated the patches so that post-copy checksum comparisons are dropped with -skipCrc on Hadoop-0.23. This brings the 0.23 implementation to parity with 2.0.
[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated MAPREDUCE-5065: Attachment: (was: MAPREDUCE-5065.branch2.patch)
[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated MAPREDUCE-5065: Attachment: (was: MAPREDUCE-5065.branch23.patch)
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603418#comment-13603418 ] Mithun Radhakrishnan commented on MAPREDUCE-5065: - I'm with you on the need for a blocksize-independent checksum. I wasn't convinced that combining CRC32-checksums together to form a higher-level checksum could be correct. (Thanks for the explanation.) {quote} instruct her to run with -pb, not -skipCrc. {quote} Yep, that should take care of #2 (above), but not #1. The user will still need to fail first and rerun, because she's unlikely to know that some of her source-files might have non-default block-sizes. Unless the checksum calculation is fixed (or -pb is made the default), I don't think DistCp should enforce a check that's a guaranteed failure under unforeseeable circumstances.
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4978: --- Fix Version/s: 1.2.0 > Add a updateJobWithSplit() method for new-api job > - > > Key: MAPREDUCE-4978 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.1.2 >Reporter: Liyin Liang >Assignee: Liyin Liang > Fix For: 1.2.0 > > Attachments: 4978-1.diff > > > HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api > jobs. It would be better to add an equivalent method for new-api jobs.
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4978: --- Affects Version/s: (was: 1.1.1) 1.1.2
[jira] [Updated] (MAPREDUCE-5068) Fair Scheduler preemption fails if the other queue has a mapreduce job with some tasks in excess of cluster capacity
[ https://issues.apache.org/jira/browse/MAPREDUCE-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitaly Kruglikov updated MAPREDUCE-5068: Description: This is reliably reproduced while running CDH4.1.2 or CDH4.2.0 on a single Mac OS X machine. # Two queues are being configured: cjmQ and slotsQ. Both queues are configured with tiny minResources. The intention is for the task(s) of the job in cjmQ to be able to preempt tasks of the job in slotsQ. # yarn.nodemanager.resource.memory-mb = 24576 # First, a long-running 6-map-task (0 reducers) mapreduce job is started in slotsQ with mapreduce.map.memory.mb=4096. Because MRAppMaster's container consumes some memory, only 5 of its 6 map tasks are able to start, and the 6th is pending, but will never run. # Then, a short-running 1-map-task (0 reducers) mapreduce job is submitted via cjmQ with mapreduce.map.memory.mb=2048. Expected behavior: At this point, because the minimum share of cjmQ has not been met, I expected Fair Scheduler to preempt one of the executing map tasks from the single slotsQ mapreduce job to make room for the single map task of the cjmQ mapreduce job. However, Fair Scheduler didn't preempt any of the running map tasks of the slotsQ job. Instead, the cjmQ job was being starved perpetually. Since slotsQ had far more than its minimum share allocated to it and already running, while cjmQ was far below its minimum share (0 actually), Fair Scheduler should have started preempting, regardless of there being one task container from the slotsQ job (the 6th map container) that was not being allocated. Additional useful info: # If I submit a second 1-map-task mapreduce job via cjmQ, the first cjmQ mapreduce job in that Q gets scheduled and its state changes to RUNNING; once that first job completes, then the second job submitted via cjmQ gets starved until a third job is submitted into cjmQ, and so on. This happens regardless of the values of maxRunningApps in the queue configurations. # If, instead of requesting 6 map tasks for the slotsQ job, I only request 5 so that everything fits nicely into yarn.nodemanager.resource.memory-mb - without that 6th pending, but not running task - then preemption works as I would have expected. However, I cannot rely on this arrangement because in a production cluster that is running at full capacity, if a machine dies, the mapreduce job from slotsQ will request new containers for the failed tasks and because the cluster was already at capacity, those containers will end up as pending and will never run, recreating my original scenario of the starving cjmQ job. # I initially wrote this up on https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/0zv62pkN5lM, so it would be good to update that group with the resolution. Configuration: In yarn-site.xml: {code}
<property>
  <description>Scheduler plug-in class to use instead of the default scheduler.</description>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
{code} fair-scheduler.xml: {code}
<property>
  <description>Absolute path to allocation file. An allocation file is an XML manifest describing queues and their properties, in addition to certain policy defaults. This file must be in XML format as described in http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.</description>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>[obfuscated]/current/conf/site/default/hadoop/fair-scheduler-allocations.xml</value>
</property>
<property>
  <description>Whether to use preemption. Note that preemption is experimental in the current version. Defaults to false.</description>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
<property>
  <description>Whether to allow multiple container assignments in one heartbeat. Defaults to false.</description>
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
{code} My fair-scheduler-allocations.xml: {code} 2048 1 fifo 5 1.0 1 1 5 1.0 5 {code} was: This is reliably reproduced while running CDH4.1.2 on a single Mac OS X machine.
# Two queues are being configured: cjmQ and slotsQ. Both queues are configured with tiny minResources. The intention is for the task(s) of the job in cjmQ to be able to preempt tasks of the job in slotsQ. # yarn.nodemanager.resource.memory-mb = 24576 # First, a long-running 6-map-task (0 reducers) mapreduce job is started in slotsQ with mapreduce.map.memory.mb=4096. Because MRAppMaster's container consumes some memory, only 5 of its 6 map tasks are able to start, and the 6th is pending, but will never run. # Then, a short-running 1-map-task (0 reducers) mapreduce job is submitted via cjmQ with mapreduce.map.memory.mb=2048. Expected behavior: At this point, because the minimum share of cjmQ has not been met, I expected Fair Scheduler to preempt one of the execut
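The memory arithmetic in the description can be checked directly: 6 maps at 4096 MB exactly fill the 24576 MB node, so once the MRAppMaster container is placed, only 5 maps fit and the 6th stays pending. A minimal sketch of that arithmetic follows; the 1536 MB AM container size is an assumed typical value, not stated in the issue, and the class name is invented for the sketch.

```java
public class ContainerFitDemo {
    // How many task containers of taskMb fit on a node of nodeMb
    // after reserving amMb for the application master.
    static int tasksThatFit(int nodeMb, int amMb, int taskMb) {
        return (nodeMb - amMb) / taskMb;
    }

    public static void main(String[] args) {
        int nodeMb = 24576;  // yarn.nodemanager.resource.memory-mb from the report
        int taskMb = 4096;   // mapreduce.map.memory.mb from the report
        int amMb = 1536;     // assumed MRAppMaster container size (not in the issue)

        // 6 * 4096 = 24576 exactly fills the node, so once the AM is
        // placed only 5 of the 6 requested map tasks can run.
        System.out.println(tasksThatFit(nodeMb, 0, taskMb));     // 6 without an AM
        System.out.println(tasksThatFit(nodeMb, amMb, taskMb));  // 5 with the AM
    }
}
```

This is why the reporter's workaround of requesting only 5 maps avoids the problem: the pending 6th container is what interacts badly with the preemption logic.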
[jira] [Commented] (MAPREDUCE-5068) Fair Scheduler preemption fails if the other queue has a mapreduce job with some tasks in excess of cluster capacity
[ https://issues.apache.org/jira/browse/MAPREDUCE-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603245#comment-13603245 ] Vitaly Kruglikov commented on MAPREDUCE-5068: - Per Sandy's recommendation, I opened the CDH JIRA https://issues.cloudera.org/browse/DISTRO-466 > Fair Scheduler preemption fails if the other queue has a mapreduce job with > some tasks in excess of cluster capacity > > > Key: MAPREDUCE-5068 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5068 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, scheduler > Environment: Mac OS X; CDH4.1.2; CDH4.2.0 >Reporter: Vitaly Kruglikov > Labels: hadoop > > This is reliably reproduced while running CDH4.1.2 on a single Mac OS X > machine. > # Two queues are being configured: cjmQ and slotsQ. Both queues are > configured with tiny minResources. The intention is for the task(s) of the > job in cjmQ to be able to preempt tasks of the job in slotsQ. > # yarn.nodemanager.resource.memory-mb = 24576 > # First, a long-running 6-map-task (0 reducers) mapreduce job is started in > slotsQ with mapreduce.map.memory.mb=4096. Because MRAppMaster's container > consumes some memory, only 5 of its 6 map tasks are able to start, and the > 6th is pending, but will never run. > # Then, a short-running 1-map-task (0 reducers) mapreduce job is submitted > via cjmQ with mapreduce.map.memory.mb=2048. > Expected behavior: > At this point, because the minimum share of cjmQ has not been met, I expected > Fair Scheduler to preempt one of the executing map tasks from the single > slotsQ mapreduce job to make room for the single map tasks of the cjmQ > mapreduce job. However, Fair Scheduler didn't preempt any of the running map > tasks of the slotsQ job. Instead, the cjmQ job was being starved perpetually. 
[jira] [Updated] (MAPREDUCE-5068) Fair Scheduler preemption fails if the other queue has a mapreduce job with some tasks in excess of cluster capacity
[ https://issues.apache.org/jira/browse/MAPREDUCE-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitaly Kruglikov updated MAPREDUCE-5068: Environment: Mac OS X; CDH4.1.2; CDH4.2.0 (was: Mac OS X; CDH4.1.2) > Fair Scheduler preemption fails if the other queue has a mapreduce job with > some tasks in excess of cluster capacity > > Key: MAPREDUCE-5068 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5068 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, scheduler > Environment: Mac OS X; CDH4.1.2; CDH4.2.0 > Reporter: Vitaly Kruglikov > Labels: hadoop > > This is reliably reproduced while running CDH4.1.2 on a single Mac OS X > machine. > # Two queues are configured: cjmQ and slotsQ. Both queues are > configured with tiny minResources. The intention is for the task(s) of the > job in cjmQ to be able to preempt tasks of the job in slotsQ. > # yarn.nodemanager.resource.memory-mb = 24576 > # First, a long-running 6-map-task (0 reducers) mapreduce job is started in > slotsQ with mapreduce.map.memory.mb=4096. Because MRAppMaster's container > consumes some memory, only 5 of its 6 map tasks are able to start, and the > 6th is pending, but will never run. > # Then, a short-running 1-map-task (0 reducers) mapreduce job is submitted > via cjmQ with mapreduce.map.memory.mb=2048. > Expected behavior: > At this point, because the minimum share of cjmQ has not been met, I expected > Fair Scheduler to preempt one of the executing map tasks from the single > slotsQ mapreduce job to make room for the single map task of the cjmQ > mapreduce job. However, Fair Scheduler didn't preempt any of the running map > tasks of the slotsQ job. Instead, the cjmQ job was being starved perpetually.
> Since slotsQ had far more than its minimum share allocated to it and already
> running, while cjmQ was far below its minimum share (0, actually), Fair
> Scheduler should have started preempting, regardless of there being one task
> container from the slotsQ job (the 6th map container) that was not being
> allocated.
> Additional useful info:
> # If I submit a second 1-map-task mapreduce job via cjmQ, the first cjmQ
> mapreduce job in that queue gets scheduled and its state changes to RUNNING;
> once that first job completes, the second job submitted via cjmQ gets
> starved until a third job is submitted into cjmQ, and so on. This happens
> regardless of the values of maxRunningApps in the queue configurations.
> # If, instead of requesting 6 map tasks for the slotsQ job, I only request 5
> so that everything fits nicely into yarn.nodemanager.resource.memory-mb -
> without that 6th pending, but not running, task - then preemption works as I
> would have expected. However, I cannot rely on this arrangement because in a
> production cluster that is running at full capacity, if a machine dies, the
> mapreduce job from slotsQ will request new containers for the failed tasks,
> and because the cluster was already at capacity, those containers will end up
> as pending and will never run, recreating my original scenario of the
> starving cjmQ job.
> # I initially wrote this up on
> https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/0zv62pkN5lM,
> so it would be good to update that group with the resolution.
> Configuration:
> In yarn-site.xml:
> {code}
> <property>
>   <description>Scheduler plug-in class to use instead of the default
>   scheduler.</description>
>   <name>yarn.resourcemanager.scheduler.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
> </property>
> {code}
> fair-scheduler.xml:
> {code}
> <property>
>   <description>Absolute path to allocation file. An allocation file is an XML
>   manifest describing queues and their properties, in addition to certain
>   policy defaults. This file must be in XML format as described in
>   http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html.</description>
>   <name>yarn.scheduler.fair.allocation.file</name>
>   <value>[obfuscated]/current/conf/site/default/hadoop/fair-scheduler-allocations.xml</value>
> </property>
> <property>
>   <description>Whether to use preemption. Note that preemption is experimental
>   in the current version. Defaults to false.</description>
>   <name>yarn.scheduler.fair.preemption</name>
>   <value>true</value>
> </property>
> <property>
>   <description>Whether to allow multiple container assignments in one
>   heartbeat. Defaults to false.</description>
>   <name>yarn.scheduler.fair.assignmultiple</name>
>   <value>true</value>
> </property>
> {code}
> My fair-scheduler-allocations.xml:
> {code}
> 2048
> 1
> fifo
> 5
> 1.0
> 1
> 1
> 5
> 1.0
> 5
> {code}
-- This message is
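The capacity arithmetic behind the report above (6 map containers requested, only 5 able to start) can be sketched as follows. The MRAppMaster container size of 2048 MB is an assumption for illustration only; the real value depends on yarn.app.mapreduce.am.resource.mb and the scheduler's minimum allocation rounding.

```python
# Sketch of the container-capacity math from the report above.
# ASSUMPTION: the MRAppMaster container occupies 2048 MB; the actual
# size is set by yarn.app.mapreduce.am.resource.mb and rounded up to
# the scheduler's minimum allocation increment.
NODE_MEMORY_MB = 24576   # yarn.nodemanager.resource.memory-mb
MAP_CONTAINER_MB = 4096  # mapreduce.map.memory.mb for the slotsQ job
AM_CONTAINER_MB = 2048   # assumed MRAppMaster container size

def runnable_maps(node_mb: int, map_mb: int, am_mb: int) -> int:
    """Map containers that fit on the node alongside the AM container."""
    return (node_mb - am_mb) // map_mb

# 6 maps are requested, but once the AM is accounted for only 5 fit,
# leaving the 6th request pending forever on a single-node cluster:
print(runnable_maps(NODE_MEMORY_MB, MAP_CONTAINER_MB, AM_CONTAINER_MB))  # 5
```

This also shows why requesting only 5 maps sidesteps the problem: with no perpetually pending container, the scheduler never reaches the state the report describes.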
[jira] [Commented] (MAPREDUCE-5068) Fair Scheduler preemption fails if the other queue has a mapreduce job with some tasks in excess of cluster capacity
[ https://issues.apache.org/jira/browse/MAPREDUCE-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603239#comment-13603239 ] Vitaly Kruglikov commented on MAPREDUCE-5068: - [~sandyr]: Confirmed -- the same issue is also present in the latest hadoop CDH4.2.0 distribution. > Fair Scheduler preemption fails if the other queue has a mapreduce job with > some tasks in excess of cluster capacity > > Key: MAPREDUCE-5068 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5068 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, scheduler > Environment: Mac OS X; CDH4.1.2 > Reporter: Vitaly Kruglikov > Labels: hadoop
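For context on the configuration quoted earlier in this issue: a CDH4-era Fair Scheduler allocation file declaring two queues along the lines of cjmQ and slotsQ generally takes the shape below. Element names follow the FairScheduler documentation linked in the report; every value here is illustrative, not the reporter's obfuscated configuration.

{code}
<?xml version="1.0"?>
<allocations>
  <queue name="cjmQ">
    <minResources>2048</minResources>
    <maxRunningApps>1</maxRunningApps>
    <schedulingMode>fifo</schedulingMode>
    <weight>1.0</weight>
  </queue>
  <queue name="slotsQ">
    <minResources>2048</minResources>
    <weight>1.0</weight>
  </queue>
  <defaultMinSharePreemptionTimeout>5</defaultMinSharePreemptionTimeout>
</allocations>
{code}

With preemption enabled, a queue below its minResources for longer than the min-share preemption timeout is the condition under which the scheduler is expected to reclaim containers, which is exactly the behavior the report says never triggers.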
[jira] [Commented] (MAPREDUCE-5062) MR AM should read max-retries information from the RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603193#comment-13603193 ] Hadoop QA commented on MAPREDUCE-5062: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12573838/MAPREDUCE-5062.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3418//console This message is automatically generated. > MR AM should read max-retries information from the RM > - > > Key: MAPREDUCE-5062 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5062 > Project: Hadoop Map/Reduce > Issue Type: Bug > Reporter: Vinod Kumar Vavilapalli > Assignee: Zhijie Shen > Attachments: MAPREDUCE-5062.1.patch > > > Change MR AM to use app-retry maximum limit that is made available by RM > after YARN-378. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira