[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493814#comment-13493814 ]

Todd Lipcon commented on MAPREDUCE-4469:

The problem with the getrusage approach is that it only includes terminated children, which means it doesn't track usage as the process progresses. That said, maybe we really don't care, and we should just tally our own resource usage and then at the time of cleanup, add the children?

Resource calculation in child tasks is CPU-heavy

Key: MAPREDUCE-4469
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: performance, task
Affects Versions: 1.0.3
Reporter: Todd Lipcon
Assignee: Ahmed Radwan
Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch

In doing some benchmarking on a hadoop-1 derived codebase, I noticed that each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed that it's spending a lot of time looping through all the files in /proc to calculate resource usage.

As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot seconds by 8%)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
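As a rough illustration of the "tally our own usage, add the children at cleanup" idea in the comment above, the following minimal Java sketch parses a single /proc/<pid>/stat-style line instead of walking all of /proc. The class and method names (ProcStatCpu, ownTicks, reapedChildTicks, readSelfStat) are hypothetical and not part of Hadoop. The field positions follow the Linux proc(5) layout, where utime, stime, cutime and cstime are fields 14 to 17; the child counters cover only children that have already been waited for, which is the same limitation noted for getrusage.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper, not Hadoop code: CPU ticks from one /proc/<pid>/stat line. */
final class ProcStatCpu {
    private ProcStatCpu() {}

    /** Splits the fields that follow the "(comm)" entry; comm may itself contain spaces. */
    private static String[] fieldsAfterComm(String statLine) {
        int close = statLine.lastIndexOf(')');
        if (close < 0) {
            throw new IllegalArgumentException("no comm field in stat line");
        }
        return statLine.substring(close + 1).trim().split("\\s+");
    }

    /** utime + stime (proc(5) fields 14 and 15): ticks used by the process itself. */
    static long ownTicks(String statLine) {
        String[] f = fieldsAfterComm(statLine);
        return Long.parseLong(f[11]) + Long.parseLong(f[12]);
    }

    /** cutime + cstime (fields 16 and 17): ticks of children already waited for. */
    static long reapedChildTicks(String statLine) {
        String[] f = fieldsAfterComm(statLine);
        return Long.parseLong(f[13]) + Long.parseLong(f[14]);
    }

    /** Reads this process's own stat line; works on Linux only. */
    static String readSelfStat() throws IOException {
        return new String(Files.readAllBytes(Paths.get("/proc/self/stat")),
                          StandardCharsets.UTF_8);
    }
}
```

Under these assumptions a task would sample ownTicks(readSelfStat()) while it runs and add reapedChildTicks(...) once at cleanup, touching one file per sample rather than every entry in /proc.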
[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted
[ https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493899#comment-13493899 ]

Hudson commented on MAPREDUCE-4772:

Integrated in Hadoop-Yarn-trunk #31 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/31/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be restarted (bobby) (Revision 1407118)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java

Fetch failures can take way too long for a map to be restarted

Key: MAPREDUCE-4772
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical
Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
Attachments: MR-4772-0.23.txt, MR-4772-trunk.txt

In one particular case we saw an NM go down at just the right time, so that most of the reducers got the output of the map tasks, but not all of them. The ones that failed to get the output reported to the AM rather quickly that they could not fetch from the NM, but because the other reducers were still running, the AM would not relaunch the map task: no more than 50% of the running reducers had reported fetch failures.

Then, because of the exponential back-off for fetches on the reducers, it took until 1 hour 45 min for the reduce tasks to hit another 10 fetch failures and report in again. At that point the other reducers had finished and the job relaunched the map task. If the reducers had still been running at 1:45, I have no idea how long it would have taken for each of the tasks to get to 30 fetch failures.

We need to trigger the map relaunch based on the percentage of reducers shuffling, not the percentage of reducers running. We also need a maximum limit on the back-off, so that we never have a reducer waiting for days to try to fetch map output.
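The two changes asked for in the description above (use shuffling reducers as the denominator, and cap the back-off) can be sketched as follows. This is a minimal illustration and not the committed patch: the class FetchFailurePolicy, its method names, and the threshold, base-delay and cap parameters are all assumptions chosen for the example.

```java
/** Hypothetical sketch, not the MAPREDUCE-4772 patch. */
final class FetchFailurePolicy {
    private FetchFailurePolicy() {}

    /**
     * Exponential back-off with an upper bound:
     * baseDelayMs * 2^(failures - 1), but never more than maxDelayMs.
     */
    static long cappedBackoffMs(int failures, long baseDelayMs, long maxDelayMs) {
        if (failures <= 0) {
            return 0L;
        }
        long delay = baseDelayMs;
        for (int i = 1; i < failures; i++) {
            if (delay > maxDelayMs / 2) {
                return maxDelayMs;  // doubling again would pass the cap
            }
            delay *= 2;
        }
        return Math.min(delay, maxDelayMs);
    }

    /**
     * Decides whether to relaunch the map, using the reducers that are still
     * shuffling as the denominator rather than all running reducers.
     */
    static boolean shouldRelaunchMap(int reducersReportingFailure,
                                     int reducersShuffling,
                                     double threshold) {
        if (reducersShuffling <= 0) {
            return false;  // nobody is still waiting on map output
        }
        return (double) reducersReportingFailure / reducersShuffling > threshold;
    }
}
```

With a 0.5 threshold, 3 failing reducers out of 4 still shuffling would trigger a relaunch, while the same 3 measured against 10 running reducers would not, which is the situation the report describes.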
[jira] [Commented] (MAPREDUCE-4764) repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493959#comment-13493959 ]

Ivan A. Veselovsky commented on MAPREDUCE-4764:

Hi, Daryn,

I'd like to clarify our plan for improving this test.

Currently the test writes the token into a file, sets the file name as the MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY value in the config, and also passes the same file name as the value of a dedicated config property (KEY_SECURITY_TOKEN).

In the job, it gets the tokens from the job context (context.getCredentials().getAllTokens()) and picks the delegation token from there by the known key: let it be token X. After that it gets the binary file name from the job config (key KEY_SECURITY_TOKEN) and reads the file, de-serializing the token: let it be token Y. Then the job asserts X.equals(Y).

This way the binary token propagation and serialization/de-serialization are checked, which pretty much corresponds to the test name.

As I understand it, you also suggested checking that the same delegation token is present in UserGroupInformation.getCurrentUser().getTokens(), right? So, if I add this check, will you be okay with the test? Or do you have other suggestions on how to improve it?

repair test org.apache.hadoop.mapreduce.security.TestBinaryTokenFile

Key: MAPREDUCE-4764
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4764
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Ivan A. Veselovsky
Attachments: MAPREDUCE-4764-trunk.patch

The test is @Ignore-ed, and fails when enabled. Suggested to repair it to fill the coverage gap.

Problems fixed in the test:
(1) MRConfig.FRAMEWORK_NAME and YarnConfiguration.RM_PRINCIPAL properties must be correctly set in the configuration to enable security in the way this test implies.
(2) The property MRJobConfig.MAPREDUCE_JOB_CREDENTIALS_BINARY is no longer passed into the Job configuration; it is intentionally deleted from there. So, we pass the binary file name in another dedicated property.
(3) The test was using deprecated cluster classes. All of them are updated to their modern analogs.
(4) The delegation token found in the job context is now correctly compared to the one deserialized from the binary file.
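The check described in the comment above (token X from the job context must equal token Y de-serialized from the binary file) boils down to a serialize/de-serialize round trip followed by an equality assertion. The sketch below shows that pattern with plain java.io streams and a hypothetical DemoToken class. It deliberately does not use the Hadoop Credentials or Token classes, so it illustrates only the shape of the assertion, not the actual test.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical stand-in for a delegation token; not a Hadoop class. */
final class DemoToken {
    final String kind;
    final byte[] identifier;

    DemoToken(String kind, byte[] identifier) {
        this.kind = kind;
        this.identifier = identifier.clone();
    }

    /** Serializes the token to bytes (stands in for writing the binary file). */
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(kind);
            out.writeInt(identifier.length);
            out.write(identifier);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** De-serializes a token (stands in for reading the binary file in the job). */
    static DemoToken fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String kind = in.readUTF();
            byte[] id = new byte[in.readInt()];
            in.readFully(id);
            return new DemoToken(kind, id);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DemoToken)) {
            return false;
        }
        DemoToken t = (DemoToken) o;
        return kind.equals(t.kind) && Arrays.equals(identifier, t.identifier);
    }

    @Override
    public int hashCode() {
        return 31 * kind.hashCode() + Arrays.hashCode(identifier);
    }
}
```

Here X would be the in-memory token and Y the result of DemoToken.fromBytes(x.toBytes()); the test passes only if equality is defined on the token's contents rather than on object identity.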
[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted
[ https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493972#comment-13493972 ]

Hudson commented on MAPREDUCE-4772:

Integrated in Hadoop-Hdfs-0.23-Build #430 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/430/])
svn merge -c 1407118 FIXES: MAPREDUCE-4772. Fetch failures can take way too long for a map to be restarted (bobby) (Revision 1407128)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407128
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted
[ https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493981#comment-13493981 ]

Hudson commented on MAPREDUCE-4772:

Integrated in Hadoop-Hdfs-trunk #1221 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1221/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be restarted (bobby) (Revision 1407118)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
[jira] [Commented] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted
[ https://issues.apache.org/jira/browse/MAPREDUCE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494006#comment-13494006 ]

Hudson commented on MAPREDUCE-4772:

Integrated in Hadoop-Mapreduce-trunk #1251 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1251/])
MAPREDUCE-4772. Fetch failures can take way too long for a map to be restarted (bobby) (Revision 1407118)

Result = FAILURE
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407118
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestFetchFailure.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleScheduler.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
[jira] [Assigned] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves reassigned MAPREDUCE-4782:

Assignee: Mark Fuhs

NLineInputFormat skips first line of last InputSplit

Key: MAPREDUCE-4782
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk
Environment:
    job.setMapperClass(Mapper.class); // just pass text lines through to output
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 100);
    NLineInputFormat.setInputPaths(job, "/path/to/a_file_with_many_lines.txt");
Reporter: Mark Fuhs
Assignee: Mark Fuhs
Priority: Critical
Attachments: MAPREDUCE-4782.patch, MR-4782.txt

NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest.

After looping through all lines of a file, the final FileSplit is created, but its creation does not respect the difference in how the first FileSplit vs. the rest are created. This results in the first line of the final InputSplit being skipped.

I've created a patch to NLineInputFormat, and this fixes the problem.
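To make the description above concrete, here is a minimal, self-contained sketch of the split arithmetic. It is not the attached patch. The class NLineSplitSketch and its methods are hypothetical, and the specific adjustment (first split shortened by one byte, later splits started one byte early) is an assumption about what "constructed differently" means in practice. The point is only that the final, partial split should go through the same helper as the full ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of N-lines-per-split arithmetic; not Hadoop code. */
final class NLineSplitSketch {
    private NLineSplitSketch() {}

    /** Single place that applies the first-vs-rest adjustment (assumed rule). */
    static long[] makeSplit(long begin, long length) {
        return (begin == 0)
            ? new long[] {begin, length - 1}   // first split: end one byte early
            : new long[] {begin - 1, length};  // later splits: start one byte early
    }

    /**
     * @param lineLengths   byte length of each line, newline included
     * @param linesPerSplit number of lines per split (N)
     * @return {start, length} pairs, one per split
     */
    static List<long[]> splits(int[] lineLengths, int linesPerSplit) {
        List<long[]> result = new ArrayList<>();
        long begin = 0;
        long length = 0;
        int numLines = 0;
        for (int len : lineLengths) {
            numLines++;
            length += len;
            if (numLines == linesPerSplit) {
                result.add(makeSplit(begin, length));
                begin += length;
                length = 0;
                numLines = 0;
            }
        }
        if (numLines != 0) {
            // The trailing partial split uses the same helper as the full ones.
            result.add(makeSplit(begin, length));
        }
        return result;
    }
}
```

With five 10-byte lines and N = 2, this yields the splits (0, 19), (19, 20) and (39, 10). Building the last one directly as (40, 10), without the adjustment, is the kind of inconsistency the issue describes.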
[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:

Attachment: MR-4782-branch-1.txt

Patch for branch-1. The patch is identical to the one for trunk except for line numbers and the location of the files.
[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4782:

Priority: Blocker (was: Critical)
[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494050#comment-13494050 ]

Robert Joseph Evans commented on MAPREDUCE-4782:

Also now that I think about it more this really is a Blocker, not a critical.
[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494052#comment-13494052 ]

Hadoop QA commented on MAPREDUCE-4782:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12552844/MR-4782-branch-1.txt
against trunk revision .

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3003//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494059#comment-13494059 ]

Jason Lowe commented on MAPREDUCE-4782:

+1, thanks Mark and Bobby. Bobby or Matt, feel free to commit.
[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated MAPREDUCE-4266:

Status: Patch Available (was: Open)

remove Ant remnants from MR

Key: MAPREDUCE-4266
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266
Project: Hadoop Map/Reduce
Issue Type: Task
Components: build
Affects Versions: 2.0.0-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Fix For: 2.0.3-alpha
Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh

Remove:
hadoop-mapreduce-project/src/*
hadoop-mapreduce-project/ivy/*
hadoop-mapreduce-project/build.xml
hadoop-mapreduce-project/ivy.xml
[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated MAPREDUCE-4266:

Attachment: MAPREDUCE-4266.sh

shell script to remove the directories and xml files. You run it like ./MAPREDUCE-4266.sh svn.
[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494062#comment-13494062 ]

Hadoop QA commented on MAPREDUCE-4266:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12552845/MAPREDUCE-4266.sh
against trunk revision .

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3004//console

This message is automatically generated.
[jira] [Updated] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4782: --- Resolution: Fixed Fix Version/s: 0.23.5 2.0.3-alpha 3.0.0 1.2.0 1.1.1 Status: Resolved (was: Patch Available) Thanks Mark, This is a great catch, I just wish we had found it sooner. I put this into trunk, branch-2, branch-0.23, branch-1, and branch-1.1. If I missed any branches that people want it in please let me know and I will see what I can do. NLineInputFormat skips first line of last InputSplit Key: MAPREDUCE-4782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk Environment: job.setMapperClass(Mapper.class); // just pass text lines through to output job.setInputFormatClass(NLineInputFormat.class); NLineInputFormat.setNumLinesPerSplit(job, 100); NLineInputFormat.setInputPaths(job, /path/to/a_file_with_many_lines.txt); Reporter: Mark Fuhs Assignee: Mark Fuhs Priority: Blocker Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest. After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created. This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem. -- This message is automatically generated by JIRA. 
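The split arithmetic described in the issue above can be illustrated with a small, Hadoop-free sketch. Everything here is an assumption made for illustration: Split and read() are hypothetical stand-ins for FileSplit and LineRecordReader, and the reader is modeled only by the two behaviors that matter for this bug (a reader whose split does not start at offset 0 discards everything up to the next newline, and a reader keeps reading lines while its position is at or before the end of the split). It is not the code from the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified model of the bug; Split and read() are hypothetical stand-ins
// for FileSplit and LineRecordReader, not the real Hadoop classes.
class NLineSplitSketch {
    static final class Split {
        final int start;
        final int length;
        Split(int start, int length) { this.start = start; this.length = length; }
    }

    // First split: shorten by one byte so the reader does not run into the next line.
    // Later splits: start one byte early, so the reader's "skip the first line"
    // step only consumes the newline that ends the previous split.
    static Split adjusted(int begin, int length) {
        return begin == 0 ? new Split(begin, length - 1) : new Split(begin - 1, length);
    }

    // Cuts the data into splits of linesPerSplit lines each. When adjustLastSplit
    // is false, the trailing split is built without the adjustment (the reported bug).
    static List<Split> splits(byte[] data, int linesPerSplit, boolean adjustLastSplit) {
        List<Split> out = new ArrayList<>();
        int begin = 0, length = 0, lines = 0;
        for (byte b : data) {
            length++;
            if (b == '\n') {
                lines++;
                if (lines == linesPerSplit) {
                    out.add(adjusted(begin, length));
                    begin += length;
                    length = 0;
                    lines = 0;
                }
            }
        }
        if (length != 0) {
            out.add(adjustLastSplit ? adjusted(begin, length) : new Split(begin, length));
        }
        return out;
    }

    // Returns the index just past the newline that ends the line containing pos.
    static int endOfLine(byte[] data, int pos) {
        int i = pos;
        while (i < data.length && data[i] != '\n') i++;
        return Math.min(i + 1, data.length);
    }

    // Simplified reader: skip a partial first line unless the split starts at 0,
    // then read every line that begins at or before the end of the split.
    static List<String> read(byte[] data, Split s) {
        int pos = s.start;
        int end = s.start + s.length;
        if (pos != 0) pos = endOfLine(data, pos);
        List<String> out = new ArrayList<>();
        while (pos <= end && pos < data.length) {
            int next = endOfLine(data, pos);
            int stop = (data[next - 1] == '\n') ? next - 1 : next;
            out.add(new String(data, pos, stop - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "a\nb\nc\nd\ne\n".getBytes(StandardCharsets.UTF_8);
        for (boolean fixed : new boolean[] {false, true}) {
            List<String> all = new ArrayList<>();
            for (Split s : splits(data, 3, fixed)) all.addAll(read(data, s));
            System.out.println((fixed ? "fixed: " : "buggy: ") + all);
        }
    }
}
```

With five one-character lines and three lines per split, the unadjusted final split loses the line "d" (the first line of the last split), while the adjusted one returns all five lines.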
[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494080#comment-13494080 ] Hudson commented on MAPREDUCE-4782: --- Integrated in Hadoop-trunk-Commit #2988 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2988/]) MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1407505 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java NLineInputFormat skips first line of last InputSplit Key: MAPREDUCE-4782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk Environment: job.setMapperClass(Mapper.class); // just pass text lines through to output job.setInputFormatClass(NLineInputFormat.class); NLineInputFormat.setNumLinesPerSplit(job, 100); NLineInputFormat.setInputPaths(job, /path/to/a_file_with_many_lines.txt); Reporter: Mark Fuhs Assignee: Mark Fuhs Priority: Blocker Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest. 
After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created. This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494115#comment-13494115 ] Mark Fuhs commented on MAPREDUCE-4782: -- I'm glad I could contribute! NLineInputFormat skips first line of last InputSplit Key: MAPREDUCE-4782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk Environment: job.setMapperClass(Mapper.class); // just pass text lines through to output job.setInputFormatClass(NLineInputFormat.class); NLineInputFormat.setNumLinesPerSplit(job, 100); NLineInputFormat.setInputPaths(job, /path/to/a_file_with_many_lines.txt); Reporter: Mark Fuhs Assignee: Mark Fuhs Priority: Blocker Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest. After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created. This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494139#comment-13494139 ] Robert Joseph Evans commented on MAPREDUCE-4266: The shell script looks good and does what we want. +1. I'll check this in. I'll also take a look at Jenkins to see if there are any builds still calling into ant for trunk. remove Ant remnants from MR --- Key: MAPREDUCE-4266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266 Project: Hadoop Map/Reduce Issue Type: Task Components: build Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh Remove: hadoop-mapreduce-project/src/* hadoop-mapreduce-project/ivy/* hadoop-mapreduce-project/build.xml hadoop-mapreduce-project/ivy.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494141#comment-13494141 ] Robert Joseph Evans commented on MAPREDUCE-4266: Oh and I'll also update the build/release instructions on twiki to remove ant :) remove Ant remnants from MR --- Key: MAPREDUCE-4266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266 Project: Hadoop Map/Reduce Issue Type: Task Components: build Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh Remove: hadoop-mapreduce-project/src/* hadoop-mapreduce-project/ivy/* hadoop-mapreduce-project/build.xml hadoop-mapreduce-project/ivy.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494145#comment-13494145 ] Hudson commented on MAPREDUCE-4266: --- Integrated in Hadoop-trunk-Commit #2989 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2989/]) MAPREDUCE-4266. remove Ant remnants from MR (tgraves via bobby) (Revision 1407551) Result = SUCCESS bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1407551 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/build-utils.xml * /hadoop/common/trunk/hadoop-mapreduce-project/build.xml * /hadoop/common/trunk/hadoop-mapreduce-project/ivy * /hadoop/common/trunk/hadoop-mapreduce-project/ivy.xml * /hadoop/common/trunk/hadoop-mapreduce-project/src remove Ant remnants from MR --- Key: MAPREDUCE-4266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266 Project: Hadoop Map/Reduce Issue Type: Task Components: build Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh Remove: hadoop-mapreduce-project/src/* hadoop-mapreduce-project/ivy/* hadoop-mapreduce-project/build.xml hadoop-mapreduce-project/ivy.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494162#comment-13494162 ] Hudson commented on MAPREDUCE-4266: --- Integrated in Hadoop-Mapreduce-trunk #1252 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1252/]) MAPREDUCE-4266. remove Ant remnants from MR (tgraves via bobby) (Revision 1407551) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1407551 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/build-utils.xml * /hadoop/common/trunk/hadoop-mapreduce-project/build.xml * /hadoop/common/trunk/hadoop-mapreduce-project/ivy * /hadoop/common/trunk/hadoop-mapreduce-project/ivy.xml * /hadoop/common/trunk/hadoop-mapreduce-project/src remove Ant remnants from MR --- Key: MAPREDUCE-4266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266 Project: Hadoop Map/Reduce Issue Type: Task Components: build Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.3-alpha Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh Remove: hadoop-mapreduce-project/src/* hadoop-mapreduce-project/ivy/* hadoop-mapreduce-project/build.xml hadoop-mapreduce-project/ivy.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4782) NLineInputFormat skips first line of last InputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494163#comment-13494163 ] Hudson commented on MAPREDUCE-4782: --- Integrated in Hadoop-Mapreduce-trunk #1252 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1252/]) MAPREDUCE-4782. NLineInputFormat skips first line of last InputSplit (Mark Fuhs via bobby) (Revision 1407505) Result = FAILURE bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1407505 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestNLineInputFormat.java NLineInputFormat skips first line of last InputSplit Key: MAPREDUCE-4782 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4782 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.22.0, 0.23.0, 1.0.0, 2.0.0-alpha, trunk Environment: job.setMapperClass(Mapper.class); // just pass text lines through to output job.setInputFormatClass(NLineInputFormat.class); NLineInputFormat.setNumLinesPerSplit(job, 100); NLineInputFormat.setInputPaths(job, /path/to/a_file_with_many_lines.txt); Reporter: Mark Fuhs Assignee: Mark Fuhs Priority: Blocker Fix For: 1.1.1, 1.2.0, 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4782.patch, MR-4782-branch-1.txt, MR-4782.txt NLineInputFormat creates FileSplits that are then used by LineRecordReader to generate Text values. To deal with an idiosyncrasy of LineRecordReader, the begin and length fields of the FileSplit are constructed differently for the first FileSplit vs. the rest. 
After looping through all lines of a file, the final FileSplit is created, but the creation does not respect the difference of how the first vs. the rest of the FileSplits are created. This results in the first line of the final InputSplit being skipped. I've created a patch to NLineInputFormat, and this fixes the problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4266) remove Ant remnants from MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4266: --- Resolution: Fixed Fix Version/s: 0.23.5 3.0.0 Status: Resolved (was: Patch Available) Thanks Tom, I put this into trunk, branch-2, and branch-0.23. I also updated Jenkins and the wiki. remove Ant remnants from MR --- Key: MAPREDUCE-4266 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4266 Project: Hadoop Map/Reduce Issue Type: Task Components: build Affects Versions: 2.0.0-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4266.patch, MAPREDUCE-4266.sh Remove: hadoop-mapreduce-project/src/* hadoop-mapreduce-project/ivy/* hadoop-mapreduce-project/build.xml hadoop-mapreduce-project/ivy.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariappan Asokan updated MAPREDUCE-2454: Status: Open (was: Patch Available) Allow external sorter plugin for MR --- Key: MAPREDUCE-2454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0 Reporter: Mariappan Asokan Assignee: Mariappan Asokan Priority: Minor Labels: features, performance, plugin, sort Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariappan Asokan updated MAPREDUCE-2454: Attachment: mapreduce-2454.patch Hi Alejandro, Thanks for catching the unused imports. I updated Fetcher.java. I have also added a test in the latest patch. -- Asokan Allow external sorter plugin for MR --- Key: MAPREDUCE-2454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha Reporter: Mariappan Asokan Assignee: Mariappan Asokan Priority: Minor Labels: features, performance, plugin, sort Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariappan Asokan updated MAPREDUCE-2454: Status: Patch Available (was: Open) Allow external sorter plugin for MR --- Key: MAPREDUCE-2454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.2-alpha, 2.0.0-alpha, 3.0.0 Reporter: Mariappan Asokan Assignee: Mariappan Asokan Priority: Minor Labels: features, performance, plugin, sort Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494276#comment-13494276 ] Hadoop QA commented on MAPREDUCE-2454: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12552875/mapreduce-2454.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.TestJobMonitorAndPrint org.apache.hadoop.mapred.TestClusterMRNotification {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3005//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3005//console This message is automatically generated. 
Allow external sorter plugin for MR --- Key: MAPREDUCE-2454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha Reporter: Mariappan Asokan Assignee: Mariappan Asokan Priority: Minor Labels: features, performance, plugin, sort Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494279#comment-13494279 ] Robert Joseph Evans commented on MAPREDUCE-4751: I have been doing a quick once over on this, and I have a few comments. # I think it would be cleaner for KillWaitAttemptKilledTransition to have a constructor that takes a TaskAttemptCompletionEventStatus, instead of having the subclasses set it directly themselves. # Remove the commented out if statement. # I am not sure if HashSet is the correct data type for success, failed, etc. They are likely to be sparse arrays with small amounts of data in them. Probably not very important, but if there are thousands of tasks it starts to add up. Overall it looks OK. I would like to see more tests though. AM stuck in KILL_WAIT for days -- Key: MAPREDUCE-4751 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.3, 2.0.2-alpha Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
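The first review comment above (pass the status through a constructor rather than letting subclasses set it) might look roughly like the following. This is only a sketch of the suggested shape, not the code under review: CompletionStatus is a hypothetical stand-in for TaskAttemptCompletionEventStatus, the subclass names are invented for illustration, and the transition body is reduced to returning the status it would report.

```java
// Hypothetical sketch of the reviewer's suggestion; none of these types are
// the real Hadoop classes.
class KillWaitTransitionSketch {
    // Stand-in for TaskAttemptCompletionEventStatus.
    enum CompletionStatus { SUCCEEDED, FAILED, KILLED }

    static class KillWaitAttemptKilledTransition {
        private final CompletionStatus status;

        // The status is fixed at construction time instead of being
        // assigned by each subclass after the fact.
        KillWaitAttemptKilledTransition(CompletionStatus status) {
            this.status = status;
        }

        KillWaitAttemptKilledTransition() {
            this(CompletionStatus.KILLED);
        }

        CompletionStatus statusToReport() {
            return status;
        }
    }

    // Subclass names are assumptions; each one only chooses the status.
    static class KillWaitAttemptSucceededTransition extends KillWaitAttemptKilledTransition {
        KillWaitAttemptSucceededTransition() { super(CompletionStatus.SUCCEEDED); }
    }

    static class KillWaitAttemptFailedTransition extends KillWaitAttemptKilledTransition {
        KillWaitAttemptFailedTransition() { super(CompletionStatus.FAILED); }
    }

    public static void main(String[] args) {
        System.out.println(new KillWaitAttemptKilledTransition().statusToReport());
        System.out.println(new KillWaitAttemptSucceededTransition().statusToReport());
        System.out.println(new KillWaitAttemptFailedTransition().statusToReport());
    }
}
```

Making the field final means a subclass cannot forget to set it or change it later, which seems to be the point of the comment.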
[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494295#comment-13494295 ] Jason Lowe commented on MAPREDUCE-4751: --- Part of the issue is that the job is hanging around waiting for all tasks to be killed rather than just exiting and letting YARN shoot any straggling containers. I think it would be simpler/safer for the AM to just write out the final state stuff and exit, much like it does for the FAILED state. If job's KILL_WAIT really is necessary then we'd need a corresponding FAILED_WAIT state to handle waiting for task cleanup when a job fails. If we don't need the job's KILL_WAIT state then similarly we can probably ditch the task KILL_WAIT state -- it could just send off kills to all the corresponding task attempts and sit in the KILLED state. Does it really need to wait? Removing KILL_WAIT is quite a bit bigger change than the current one, as it would break a lot of tests that know and expect the KILL_WAIT state. However I think it would be more robust in the long-term, as KILL_WAIT seems like a state primed for hanging if we don't really need it. Since we're eager to get a fix for this in soon we could address that in a followup JIRA. AM stuck in KILL_WAIT for days -- Key: MAPREDUCE-4751 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.3, 2.0.2-alpha Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. 
[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494303#comment-13494303 ] Robert Joseph Evans commented on MAPREDUCE-4751: Yes I think that would be better. But that is a much larger change that would need more tests. Perhaps we do that in a follow on JIRA. AM stuck in KILL_WAIT for days -- Key: MAPREDUCE-4751 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.3, 2.0.2-alpha Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4783) data_join mavenization broke the mr1 build
[ https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494306#comment-13494306 ] Robert Joseph Evans commented on MAPREDUCE-4783: I think this can be duped to MAPREDUCE-4266. It removed all of the ant code. data_join mavenization broke the mr1 build -- Key: MAPREDUCE-4783 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: mapreduce-4783.txt MR-4238 didn't update build.xml and forgot to nuke the old data_join directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4783) data_join mavenization broke the mr1 build
[ https://issues.apache.org/jira/browse/MAPREDUCE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated MAPREDUCE-4783: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Great, thanks. data_join mavenization broke the mr1 build -- Key: MAPREDUCE-4783 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4783 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Attachments: mapreduce-4783.txt MR-4238 didn't update build.xml and forgot to nuke the old data_join directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4666) JVM metrics for history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13494353#comment-13494353 ] Jonathan Eagles commented on MAPREDUCE-4666: +1. simple change that works for me. JVM metrics for history server -- Key: MAPREDUCE-4666 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.0.2-alpha Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor Attachments: MAPREDUCE-4666.patch It would be nice if the job history server provided the same JVM metrics via metrics2 that other Hadoop daemons are already providing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4666) JVM metrics for history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated MAPREDUCE-4666: --- Resolution: Fixed Fix Version/s: 0.23.5 2.0.3-alpha 3.0.0 Status: Resolved (was: Patch Available) JVM metrics for history server -- Key: MAPREDUCE-4666 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.0.2-alpha Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor Fix For: 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4666.patch It would be nice if the job history server provided the same JVM metrics via metrics2 that other Hadoop daemons are already providing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4774) repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR
[ https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated MAPREDUCE-4774:
Attachment: MAPREDUCE-4774.patch
This test failure is pretty pervasive and annoying, so I'm taking this to get it fixed quickly. The patch ignores some asynchronous task events in the FAILED state, much like we do in the ERROR state, and adds corresponding unit tests to verify we're handling them properly.
repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR
Key: MAPREDUCE-4774
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ivan A. Veselovsky
Attachments: MAPREDUCE-4774.patch
The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently fails in the mapred build (e.g. see https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/ or https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/).
The test aims to check job status notifications received through an HTTP servlet. It runs 3 jobs: successful, killed, and failed. The test expects the servlet to receive specific notifications in a specific order. It also tries to test the retry-on-failure notification functionality, so on each 1st notification attempt the servlet answers 400 to force an error, and on each 2nd attempt it answers OK.
In general, the test fails because the actual number and/or type of the notifications differs from what is expected. Investigation shows that the actual root cause of the problem is an incorrect job state transition: the mapred task of the 3rd job fails (through an intentionally thrown RuntimeException, see UtilsForTests#runJobFail()), and the state of the task changes from RUNNING to FAILED. At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in method org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId, TaskAttemptCompletionEventStatus)), and this event gets processed in AsyncDispatcher, but the transition is not allowed by the event transition map (see org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). This causes the following exception to be thrown during event processing:
2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
    at java.lang.Thread.run(Thread.java:662)
So the job gets into state INTERNAL_ERROR, and a job end notification like this is sent:
http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
(here we can see the ERROR status instead of FAILED). After that the notification servlet receives either only the ERROR notification, or one more ERROR notification after FAILED, which finally causes the test to fail. (There is some variation in the test behavior caused by race conditions, because a lot of asynchronous processing is involved; the test is, in fact, flaky.)
In any case, it looks like the root cause of the problem is that the forbidden transition "Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED" is possible. Expert advice is needed on how this should be fixed.
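The failure mode and the fix described in this message can be sketched with a toy state machine. This is an illustrative sketch only, not the code in MAPREDUCE-4774.patch: the names ToyJob, JobState, JobEvent, TASK_FAILED, and the ignoreLateEventsInFailed flag are all hypothetical, and the real JobImpl builds its transition table through StateMachineFactory. The sketch assumes that an unregistered (state, event) pair sends the job to ERROR, and that the fix amounts to registering a FAILED-to-FAILED transition for the late event.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model only: these types are hypothetical stand-ins, not Hadoop classes.
enum JobState { RUNNING, FAILED, ERROR }

enum JobEvent { TASK_FAILED, JOB_TASK_ATTEMPT_COMPLETED }

class ToyJob {
    // Transition table: current state -> (event -> next state).
    private final Map<JobState, Map<JobEvent, JobState>> transitions =
        new EnumMap<JobState, Map<JobEvent, JobState>>(JobState.class);
    private JobState state = JobState.RUNNING;

    ToyJob(boolean ignoreLateEventsInFailed) {
        addTransition(JobState.RUNNING, JobEvent.TASK_FAILED, JobState.FAILED);
        addTransition(JobState.RUNNING, JobEvent.JOB_TASK_ATTEMPT_COMPLETED, JobState.RUNNING);
        // ERROR tolerates late events by looping back to itself.
        for (JobEvent e : JobEvent.values()) {
            addTransition(JobState.ERROR, e, JobState.ERROR);
        }
        // The assumed fix: FAILED ignores the late event in the same way.
        if (ignoreLateEventsInFailed) {
            addTransition(JobState.FAILED, JobEvent.JOB_TASK_ATTEMPT_COMPLETED, JobState.FAILED);
        }
    }

    private void addTransition(JobState from, JobEvent on, JobState to) {
        transitions
            .computeIfAbsent(from, k -> new EnumMap<JobEvent, JobState>(JobEvent.class))
            .put(on, to);
    }

    void handle(JobEvent event) {
        Map<JobEvent, JobState> row = transitions.get(state);
        JobState next = (row == null) ? null : row.get(event);
        if (next == null) {
            // Unregistered pair: report it and fall into ERROR, as in the log.
            System.err.println("Invalid event: " + event + " at " + state);
            state = JobState.ERROR;
            return;
        }
        state = next;
    }

    JobState getState() {
        return state;
    }
}

class ToyJobDemo {
    public static void main(String[] args) {
        for (boolean patched : new boolean[] {false, true}) {
            ToyJob job = new ToyJob(patched);
            job.handle(JobEvent.TASK_FAILED);                // job reaches FAILED
            job.handle(JobEvent.JOB_TASK_ATTEMPT_COMPLETED); // late asynchronous event
            System.out.println("patched=" + patched + " final state=" + job.getState());
        }
    }
}
```

With the flag off, the late event drives the toy job from FAILED to ERROR, which corresponds to the jobStatus=ERROR notification in the report; with the flag on, the job stays in FAILED.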
[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state
[ https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated MAPREDUCE-4774:
Component/s: mrv2, applicationmaster
Affects Version/s: 0.23.3, 2.0.1-alpha
Summary: JobImpl does not handle asynchronous task events in FAILED state (was: repair test org.apache.hadoop.mapred.TestClusterMRNotification.testMR)
Editing headline to more accurately reflect the root cause.
JobImpl does not handle asynchronous task events in FAILED state
Key: MAPREDUCE-4774
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Attachments: MAPREDUCE-4774.patch
[jira] [Commented] (MAPREDUCE-4666) JVM metrics for history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494368#comment-13494368 ]
Hudson commented on MAPREDUCE-4666:
Integrated in Hadoop-trunk-Commit #2996 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2996/])
MAPREDUCE-4666. JVM metrics for history server. (jlowe via jeagles) (Revision 1407669)
Result = SUCCESS
jeagles : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407669
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java
JVM metrics for history server
Key: MAPREDUCE-4666
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4666
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobhistoryserver
Affects Versions: 2.0.2-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Minor
Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
Attachments: MAPREDUCE-4666.patch
[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state
[ https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated MAPREDUCE-4774:
Assignee: Jason Lowe
Target Version/s: 2.0.3-alpha, 0.23.5
Status: Patch Available (was: Open)
JobImpl does not handle asynchronous task events in FAILED state
Key: MAPREDUCE-4774
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2
Affects Versions: 2.0.1-alpha, 0.23.3
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
Attachments: MAPREDUCE-4774.patch
[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state
[ https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494377#comment-13494377 ]
Robert Joseph Evans commented on MAPREDUCE-4774:
The change looks simple enough and does fix the failing test. I am +1 pending Jenkins approval.
JobImpl does not handle asynchronous task events in FAILED state
Key: MAPREDUCE-4774
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
Attachments: MAPREDUCE-4774.patch
[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state
[ https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494389#comment-13494389 ]
Hadoop QA commented on MAPREDUCE-4774:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12552903/MAPREDUCE-4774.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:
org.apache.hadoop.mapreduce.v2.app.TestRecovery
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3006//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3006//console
This message is automatically generated.
JobImpl does not handle asynchronous task events in FAILED state
Key: MAPREDUCE-4774
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
Attachments: MAPREDUCE-4774.patch
[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state
[ https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494392#comment-13494392 ]
Robert Joseph Evans commented on MAPREDUCE-4774:
I ran TestRecovery manually and it looks like it is a spurious failure. We should file a JIRA to fix it. Checking in the patch now.
JobImpl does not handle asynchronous task events in FAILED state
Key: MAPREDUCE-4774
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
Attachments: MAPREDUCE-4774.patch
[jira] [Updated] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state
[ https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4774: --- Resolution: Fixed Fix Version/s: 0.23.5 2.0.3-alpha 3.0.0 Status: Resolved (was: Patch Available) Thanks Jason, I put this into trunk, branch-2, and branch-0.23 JobImpl does not handle asynchronous task events in FAILED state Key: MAPREDUCE-4774 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.3, 2.0.1-alpha Reporter: Ivan A. Veselovsky Assignee: Jason Lowe Fix For: 3.0.0, 2.0.3-alpha, 0.23.5 Attachments: MAPREDUCE-4774.patch The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently fails in mapred build (e.g. see https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/ , or https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/). The test aims to check Job status notifications received through HTTP Servlet. It runs 3 jobs: successfull, killed, and failed. The test expects the servlet to receive some expected notifications in some expected order. It also tries to test the retry-on-failure notification functionality, so on each 1st notification the servlet answers 400 forcing error, and on each 2nd notification attempt it answers ok. In general, the test fails because the actual number and/or type of the notifications differs from the expected. Investigation shows that actual root cause of the problem is an incorrect job state transition: the 3rd job mapred task fails (by intentionally thrown RuntimeException, see UtilsForTests#runJobFail()), and the state of the task changes from RUNNING to FAILED. 
At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in method org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId, TaskAttemptCompletionEventStatus)), and this event gets processed in AsyncDispatcher, but this transition is impossible according to the event transition map (see org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). This causes the following exception to be thrown when the event is processed:

2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
    at java.lang.Thread.run(Thread.java:662)

So the job gets into state INTERNAL_ERROR, and a job end notification like this is sent:
http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
(here we can see the ERROR status instead of FAILED). After that the notification servlet receives either only an ERROR notification, or one more ERROR notification after FAILED, which finally causes the test to fail. (There is some variation in the test behavior caused by race conditions, because a lot of the processing there is asynchronous; the test is, in fact, flaky.) In any case, it looks like the root cause of the problem is the possibility of the forbidden transition "Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED". Expert advice is needed on how this should be fixed.
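
The failure described above can be reproduced in miniature. The following is only a toy sketch in Python, not Hadoop code: the class ToyJob, the event name TOY_TASK_FAILED and the two transition tables are hypothetical and exist purely for illustration. It assumes that an event with no registered transition pushes the job into ERROR, as the report describes, and shows one possible remedy (a self-loop that ignores the late event in FAILED). The report itself does not say how the issue should be fixed, so the remedy is an assumption.

```python
class InvalidTransition(Exception):
    """Raised when no transition is registered for (state, event)."""


class ToyJob:
    """Hypothetical stand-in for a job driven by a transition table."""

    def __init__(self, transitions):
        # transitions maps (current_state, event) -> next_state
        self.transitions = transitions
        self.state = "RUNNING"

    def _do_transition(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise InvalidTransition(f"Invalid event: {event} at {self.state}")
        self.state = self.transitions[key]

    def handle(self, event):
        try:
            self._do_transition(event)
        except InvalidTransition:
            # Assumed behavior, mirroring the symptom in the report:
            # an unhandled event sends the job to ERROR.
            self.state = "ERROR"
        return self.state


# TOY_TASK_FAILED is a made-up event name used only in this sketch.
BASE = {
    ("RUNNING", "JOB_TASK_ATTEMPT_COMPLETED"): "RUNNING",
    ("RUNNING", "TOY_TASK_FAILED"): "FAILED",
}

# One possible remedy (an assumption): ignore the late event in FAILED.
PATCHED = dict(BASE)
PATCHED[("FAILED", "JOB_TASK_ATTEMPT_COMPLETED")] = "FAILED"


def run(transitions):
    """Fail the job, then deliver a late attempt-completed event."""
    job = ToyJob(transitions)
    job.handle("TOY_TASK_FAILED")
    return job.handle("JOB_TASK_ATTEMPT_COMPLETED")


print(run(BASE))     # ERROR
print(run(PATCHED))  # FAILED
```

With the base table the late event turns FAILED into ERROR, which matches the unexpected jobStatus=ERROR notification; with the extra self-loop the job stays in FAILED.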
[jira] [Commented] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494423#comment-13494423 ]

Mariappan Asokan commented on MAPREDUCE-2454:

Hi Alejandro,
I ran the tests on my box. The failing tests are failing without my patch. The failure does not seem to be related to my patch.
-- Asokan

Allow external sorter plugin for MR

Key: MAPREDUCE-2454
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
Project: Hadoop Map/Reduce
Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
Labels: features, performance, plugin, sort
Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, ReduceInputSorter.java

Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides.
[jira] [Commented] (MAPREDUCE-4774) JobImpl does not handle asynchronous task events in FAILED state
[ https://issues.apache.org/jira/browse/MAPREDUCE-4774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494459#comment-13494459 ]

Hudson commented on MAPREDUCE-4774:

Integrated in Hadoop-trunk-Commit #2997 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2997/])
MAPREDUCE-4774. JobImpl does not handle asynchronous task events in FAILED state (jlowe via bobby) (Revision 1407679)

Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1407679
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java

JobImpl does not handle asynchronous task events in FAILED state

Key: MAPREDUCE-4774
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4774
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Ivan A. Veselovsky
Assignee: Jason Lowe
Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
Attachments: MAPREDUCE-4774.patch

The test org.apache.hadoop.mapred.TestClusterMRNotification.testMR frequently fails in the mapred build (e.g. see https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2988/testReport/junit/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/ , or https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2982//testReport/org.apache.hadoop.mapred/TestClusterMRNotification/testMR/). The test aims to check job status notifications received through an HTTP servlet. It runs 3 jobs: successful, killed, and failed. The test expects the servlet to receive certain notifications in a certain order. It also tries to test the retry-on-failure notification functionality, so on each first notification attempt the servlet answers 400 to force an error, and on each second attempt it answers OK. In general, the test fails because the actual number and/or type of the notifications differs from what is expected. Investigation shows that the actual root cause of the problem is an incorrect job state transition: the mapred task of the 3rd job fails (through an intentionally thrown RuntimeException, see UtilsForTests#runJobFail()), and the state of the task changes from RUNNING to FAILED. At this point a JobEventType.JOB_TASK_ATTEMPT_COMPLETED event is submitted (in method org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handleTaskAttemptCompletion(TaskAttemptId, TaskAttemptCompletionEventStatus)), and this event gets processed in AsyncDispatcher, but this transition is impossible according to the event transition map (see org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl#stateMachineFactory). This causes the following exception to be thrown when the event is processed:

2012-11-06 12:22:02,335 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_ATTEMPT_COMPLETED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:309)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$3(StateMachineFactory.java:290)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:454)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:716)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:917)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:79)
    at java.lang.Thread.run(Thread.java:662)

So the job gets into state INTERNAL_ERROR, and a job end notification like this is sent:
http://localhost:48656/notification/mapred?jobId=job_1352199715842_0002&jobStatus=ERROR
(here we can see the ERROR status instead of FAILED). After that the notification servlet receives either only an ERROR notification, or one more ERROR notification after FAILED, which finally causes the test to fail. (Some variation in the test behavior caused by racing
[jira] [Updated] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4751:

Attachment: MAPREDUCE-4751-20121109.txt

Address Bobby's comments on my earlier patch.
- Agree about HashSet. Started doing bitmaps, but it made code unreadable. Keeping HashSet but with an explicit initial capacity of 2 instead of the default 16. Could've been 1, but HashSet/HashMap immediately resizes it to two.
- Addressed other changes.
- Wrote up a test which passes with the changes and fails without. Had to spend a lot of time to get it right.

AM stuck in KILL_WAIT for days

Key: MAPREDUCE-4751
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Attachments: MAPREDUCE-4751-20121108.txt, MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg

We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state
[jira] [Updated] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4751:

Status: Patch Available (was: Open)

AM stuck in KILL_WAIT for days

Key: MAPREDUCE-4751
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.0.2-alpha, 0.23.3
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Attachments: MAPREDUCE-4751-20121108.txt, MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg

We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state
[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494518#comment-13494518 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4751:

bq. Part of the issue is that the job is hanging around waiting for all tasks to be killed rather than just exiting and letting YARN shoot any straggling containers. I think it would be simpler/safer for the AM to just write out the final state stuff and exit, much like it does for the FAILED state. If job's KILL_WAIT really is necessary then we'd need a corresponding FAILED_WAIT state to handle waiting for task cleanup when a job fails.

I agree. Sharad and I debated this for a while when we wrote this initially. We let it be like it is now, just to be sure that AMs exit sanely, but we can change it. The only catch I can think of is that, while the AM tries to do the remaining cleanup work (jobhistory etc.), tasks will keep on bombarding the AM with more updates.

Didn't realize that we don't have a fail_wait state. The change isn't much bigger but it can break tests. Let's pursue that separately. The current bug is caused by Tasks waiting on TAs, which should be fixed by my patch. Of course, it then opens up the job bug; let's fix that separately.

Regarding doing away with Task's kill_wait, I disagree. Tasks can get a kill signal while the AM is running, so we should handle it explicitly by killing and waiting for all attempts; otherwise we run the risk of dangling JVMs doing nothing but occupying slots till the AM exits.

AM stuck in KILL_WAIT for days

Key: MAPREDUCE-4751
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Attachments: MAPREDUCE-4751-20121108.txt, MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg

We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state
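
The "Tasks waiting on TAs" symptom can be pictured with a toy model. This is a minimal Python sketch under stated assumptions, not the Hadoop implementation and not the attached patch: the class ToyTask and its methods are hypothetical, and the model simply assumes that a task in KILL_WAIT moves to KILLED only after every outstanding attempt has reported back.

```python
class ToyTask:
    """Hypothetical task that waits for all attempts to confirm a kill."""

    def __init__(self, attempt_ids):
        self.state = "RUNNING"
        # Attempts that have not yet reported a terminal state.
        self.pending = set(attempt_ids)

    def kill(self):
        # With no outstanding attempts there is nothing to wait for.
        self.state = "KILL_WAIT" if self.pending else "KILLED"

    def attempt_finished(self, attempt_id):
        self.pending.discard(attempt_id)
        if self.state == "KILL_WAIT" and not self.pending:
            self.state = "KILLED"


task = ToyTask(["attempt_0", "attempt_1"])
task.kill()
task.attempt_finished("attempt_0")
print(task.state)  # KILL_WAIT: attempt_1 has not reported yet
task.attempt_finished("attempt_1")
print(task.state)  # KILLED
```

In this model, an attempt that never reports (for example because its node was lost) leaves the task in KILL_WAIT indefinitely, which is the shape of the hang described in the issue.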
[jira] [Commented] (MAPREDUCE-4749) Killing multiple attempts of a task taker longer as more attempts are killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494523#comment-13494523 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-4749:

Looks good to me too. The algo is clean. And nice tests too! Checking it in.

Killing multiple attempts of a task taker longer as more attempts are killed

Key: MAPREDUCE-4749
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4749
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Attachments: MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch

The following was noticed on a mr job running on hadoop 1.1.0
1. Start an mr job with 1 mapper
2. Wait for a min
3. Kill the first attempt of the mapper and then subsequently kill the other 3 attempts in order to fail the job

The time taken to kill the task grew exponentially.
1st attempt was killed immediately.
2nd attempt took a little over a min
3rd attempt took approx. 20 mins
4th attempt took around 3 hrs.

The command used to kill the attempt was hadoop job -fail-task
Note that the command returned immediately as soon as the fail attempt was accepted but the time the attempt was actually killed was as stated above.
[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494522#comment-13494522 ]

Hadoop QA commented on MAPREDUCE-4751:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12552951/MAPREDUCE-4751-20121109.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3007//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3007//console

This message is automatically generated.

AM stuck in KILL_WAIT for days

Key: MAPREDUCE-4751
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.23.3, 2.0.2-alpha
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Attachments: MAPREDUCE-4751-20121108.txt, MAPREDUCE-4751-20121109.txt, TaskAttemptStateGraph.jpg

We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state
[jira] [Resolved] (MAPREDUCE-4749) Killing multiple attempts of a task taker longer as more attempts are killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-4749.

Resolution: Fixed
Fix Version/s: 1.1.1
Hadoop Flags: Reviewed

I just committed this to branch-1 and branch-1.1. Thanks Arpit!

Killing multiple attempts of a task taker longer as more attempts are killed

Key: MAPREDUCE-4749
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4749
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arpit Gupta
Assignee: Arpit Gupta
Fix For: 1.1.1
Attachments: MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch, MAPREDUCE-4749.branch-1.patch

The following was noticed on a mr job running on hadoop 1.1.0
1. Start an mr job with 1 mapper
2. Wait for a min
3. Kill the first attempt of the mapper and then subsequently kill the other 3 attempts in order to fail the job

The time taken to kill the task grew exponentially.
1st attempt was killed immediately.
2nd attempt took a little over a min
3rd attempt took approx. 20 mins
4th attempt took around 3 hrs.

The command used to kill the attempt was hadoop job -fail-task
Note that the command returned immediately as soon as the fail attempt was accepted but the time the attempt was actually killed was as stated above.