[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592160#comment-13592160 ]

Arun A K commented on MAPREDUCE-4974:
-------------------------------------

@All, can we mark this issue as resolved?

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> I found that there is scope for optimizing the code in initialize() if
> compressionCodecs and codec are instantiated only when the input is
> compressed. Meanwhile, Gelesh George Omathil suggested that we could avoid
> the null check of key and value. This would save time, since the null check
> is done for every next key/value generation. The intention is to instantiate
> only once and avoid an NPE as well. Hopefully both can be achieved by
> initializing key and value in the initialize() method. We have both worked
> on it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589434#comment-13589434 ]

Arun A K commented on MAPREDUCE-4974:
-------------------------------------

Updated the review request URL with the latest patch. Please find it at https://reviews.apache.org/r/9440/

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585797#comment-13585797 ]

Arun A K commented on MAPREDUCE-4974:
-------------------------------------

As [~gelesh] mentioned, we had the elimination of repeated null checks in mind while trying to optimize the code. If that is not of much significance, please go ahead with the latest available patch, which contains the rest of the changes.

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13579453#comment-13579453 ]

Arun A K commented on MAPREDUCE-4974:
-------------------------------------

Please find the review request at https://reviews.apache.org/r/9440/

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun A K updated MAPREDUCE-4974:
-------------------------------

Attachment: MAPREDUCE-4974.2.patch

The null assignment of key and value, which was in nextKeyValue(), has been moved to close() to avoid an NPE, as per the review comments. Also, the if (newSize == 0) check is no longer performed inside the loop, since if (newSize < maxLineLength) covers the same case. However, the if (newSize == 0) condition is still checked outside the while loop. Hopefully this will also improve performance. Combined effort with Gelesh.

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
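
The loop change described in this update can be illustrated with a small standalone sketch. It is not the patch itself and does not use Hadoop classes: the class name LoopSketch, the method nextRecordSize and the iterator of sizes (standing in for successive readLine results, with 0 meaning end of input) are all hypothetical. It only shows why, assuming maxLineLength is greater than 0, the test newSize < maxLineLength already covers newSize == 0.

```java
import java.util.Iterator;

// Hypothetical illustration only; not Hadoop's LineRecordReader.
class LoopSketch {
    // readSizes stands in for successive readLine(...) results:
    // each element is the size of the line just read, and an exhausted
    // iterator is treated as a read of size 0 (end of input).
    static int nextRecordSize(Iterator<Integer> readSizes, int maxLineLength) {
        if (maxLineLength <= 0) {
            // The argument below relies on 0 < maxLineLength.
            throw new IllegalArgumentException("maxLineLength must be positive");
        }
        int newSize;
        while (true) {
            newSize = readSizes.hasNext() ? readSizes.next() : 0;
            // Exits for an acceptable line and also for newSize == 0,
            // so no separate zero check is needed inside the loop.
            if (newSize < maxLineLength) {
                break;
            }
            // Otherwise the line was too long: skip it and read again.
        }
        // The caller still checks for 0 once, outside the loop.
        return newSize;
    }
}
```

A caller would treat a return value of 0 as "no more records", which corresponds to the single if (newSize == 0) check kept outside the while loop.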
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574334#comment-13574334 ]

Arun A K commented on MAPREDUCE-4974:
-------------------------------------

If someone could add review comments, we could look into the suggested changes. https://reviews.apache.org/r/9287/

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Fix For: 0.20.204.0, 0.24.0
>
> Attachments: MAPREDUCE-4974.1.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572436#comment-13572436 ]

Arun A K commented on MAPREDUCE-4974:
-------------------------------------

Kindly advise whether the optimization is worthwhile.

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Fix For: 0.20.204.0, 0.24.0
>
> Attachments: MAPREDUCE-4974.1.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570161#comment-13570161 ]

Arun A K commented on MAPREDUCE-4974:
-------------------------------------

Quoting the review request URL for this issue: https://reviews.apache.org/r/9287/

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Fix For: 0.20.204.0, 0.24.0
>
> Attachments: MAPREDUCE-4974.1.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun A K updated MAPREDUCE-4974:
-------------------------------

Summary: Optimising the LineRecordReader initialize() method (was: optimising the LineRecordReader initialize method)

> Optimising the LineRecordReader initialize() method
> ---------------------------------------------------
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Fix For: 0.20.204.0, 0.24.0
>
> Attachments: MAPREDUCE-4974.1.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
[jira] [Created] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
Arun A K created MAPREDUCE-4974:
--------------------------------

Summary: optimising the LineRecordReader initialize method
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 0.23.5, 2.0.2-alpha
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Arun A K
Fix For: 0.20.204.0, 0.24.0

I found that there is scope for optimizing the code in initialize() if compressionCodecs and codec are instantiated only when the input is compressed. Meanwhile, Gelesh George Omathil suggested that we could avoid the null check of key and value. This would save time, since the null check is done for every next key/value generation. The intention is to instantiate only once and avoid an NPE as well. Hopefully both can be achieved by initializing key and value in the initialize() method. We have both worked on it.
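
The two ideas in this description can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the attached patch and not Hadoop's LineRecordReader: SketchLineReader, MutableLong, the boolean compressed flag and the plain Object used as a codec placeholder are all hypothetical names, and a single '\n' line terminator is assumed when advancing the position.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a line-oriented record reader.
class SketchLineReader {
    // Minimal reusable holder, playing the role of a writable key object.
    static final class MutableLong { long v; }

    private BufferedReader in;
    private long pos;
    private MutableLong key;
    private StringBuilder value;
    private Object codec; // placeholder for codec-related state

    void initialize(String data, boolean compressed) {
        in = new BufferedReader(new StringReader(data));
        pos = 0;
        // Create codec-related state only when the input is compressed.
        if (compressed) {
            codec = new Object();
        }
        // Allocate key and value once here, so nextKeyValue()
        // does not need a null check on every call.
        key = new MutableLong();
        value = new StringBuilder();
    }

    boolean nextKeyValue() {
        String line;
        try {
            line = in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (line == null) {
            return false;
        }
        key.v = pos;            // reuse the key object
        value.setLength(0);     // reuse the value buffer
        value.append(line);
        pos += line.length() + 1; // assumes a single '\n' terminator
        return true;
    }

    long getCurrentKey() { return key.v; }
    String getCurrentValue() { return value.toString(); }
    boolean hasCodec() { return codec != null; }
}
```

The trade-off in this sketch is that key and value are allocated even for an input that turns out to be empty, in exchange for removing a branch from the per-record path.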
[jira] [Commented] (MAPREDUCE-4709) Counters that track max values
[ https://issues.apache.org/jira/browse/MAPREDUCE-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562567#comment-13562567 ]

Arun A K commented on MAPREDUCE-4709:
-------------------------------------

@Jeremy Lewi, could you please elaborate on the problem with an example?

> Counters that track max values
> ------------------------------
>
> Key: MAPREDUCE-4709
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4709
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Jeremy Lewi
> Priority: Minor
>
> A nice feature to help monitor MR jobs would be mapreduce counters that track
> the maximum of some metric across all workers. These trackers would work just
> like regular counters, except that they would track the max value of all
> arguments passed to the "increment" function as opposed to summing them.
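
As a rough illustration of the requested behaviour, the sketch below shows a counter whose "increment" keeps the largest argument seen instead of summing. MaxCounter and its methods are hypothetical and are not part of the Hadoop Counters API; merge() stands in for combining per-worker values, which is an assumption about how such a counter might be aggregated.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical max-tracking counter; not a Hadoop class.
class MaxCounter {
    // Starts at the smallest long so any first value becomes the max.
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Same call shape as a summing counter, but keeps the maximum.
    void increment(long value) {
        max.accumulateAndGet(value, Math::max);
    }

    long getValue() {
        return max.get();
    }

    // Combining two workers' counters takes the larger of the two.
    void merge(MaxCounter other) {
        increment(other.getValue());
    }
}
```

For example, calls to increment with 3, 7 and 5 on one worker leave the counter at 7, and merging in another worker's counter holding 9 raises it to 9.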
[jira] [Commented] (MAPREDUCE-4770) Hadoop jobs failing with FileNotFound Exception while the job is still running
[ https://issues.apache.org/jira/browse/MAPREDUCE-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562387#comment-13562387 ]

Arun A K commented on MAPREDUCE-4770:
-------------------------------------

Not sure if this could be the solution:

IsolationRunner is a utility to help debug MapReduce programs. To use the IsolationRunner, first set keep.failed.task.files to true (also see keep.task.files.pattern). Next, go to the node on which the failed task ran, go to the TaskTracker's local directory, and run the IsolationRunner:

$ cd /taskTracker/${taskid}/work
$ bin/hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml

IsolationRunner will run the failed task in a single JVM, which can be in the debugger, over precisely the same input. Note that currently IsolationRunner will only re-run map tasks.

Reference: http://hadoop.apache.org/docs/r1.1.1/mapred_tutorial.html

> Hadoop jobs failing with FileNotFound Exception while the job is still running
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-4770
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4770
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Affects Versions: 0.20.203.0
> Reporter: Jaikannan Ramamoorthy
>
> We are having a strange issue in our Hadoop cluster. We have noticed that
> some of our jobs fail with a file not found exception [see below].
> Basically, the files in the "attempt_*" directory and the directory itself
> are getting deleted while the task is still being run on the host. Looking
> through some of the Hadoop documentation, I see that the job directory gets
> wiped out when it gets a KillJobAction; however, I am not sure why it gets
> wiped out while the job is still running.
> My question is: what could be deleting it while the job is running? Any
> thoughts or pointers on how to debug this would be helpful.
> Thanks!
> java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out (Permission denied)
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.<init>(FileInputStream.java:120)
>   at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:71)
>   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:107)
>   at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
>   at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
>   at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>   at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322)
>   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>   at org.apache.hadoop.mapred.Child.main(Child.java:253)
[jira] [Created] (MAPREDUCE-4519) In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character
Arun A K created MAPREDUCE-4519:
--------------------------------

Summary: In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character sequence in delimiter were found missing in certain cases in the Map Output
Key: MAPREDUCE-4519
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4519
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.2
Environment: Linux- Ubuntu 10.04
Reporter: Arun A K
Fix For: 0.20.2

Set textinputformat.record.delimiter as ""

Suppose the input is a text file with the following content:

1User12User23User34User45User5

Mapper was expected to get value as:

Value 1 - 1User1
Value 2 - 2User2
Value 3 - 3User3
Value 4 - 4User4
Value 5 - 5User5

According to this bug, Mapper gets value:

Value 1 - entity>1User1
Value 2 - id>2User2
Value 3 - 3id>User3
Value 4 - 4User4name>
Value 5 - 5User5

The pattern shown above need not necessarily occur for values 1, 2 and 3. The bug occurs at random positions in the map input.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
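
The expected behaviour in this report is that characters resembling the start of the delimiter stay in the record unless the full delimiter follows. The sketch below shows one simple way to get that property. It is not Hadoop's reader and makes no claim about the cause of the bug: DelimiterSplitSketch and split are hypothetical names, and the "##" delimiter in the usage note is an arbitrary illustration, not the delimiter used by the reporter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of delimiter-based record splitting.
class DelimiterSplitSketch {
    static List<String> split(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            // Every character goes into the record first, so a partial
            // delimiter match that later fails is never lost.
            current.append(input.charAt(i));
            if (endsWith(current, delimiter)) {
                // Full delimiter seen: drop it and emit the record.
                current.setLength(current.length() - delimiter.length());
                records.add(current.toString());
                current.setLength(0);
            }
        }
        // Trailing data without a closing delimiter is the last record.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    private static boolean endsWith(StringBuilder sb, String suffix) {
        int start = sb.length() - suffix.length();
        if (start < 0) {
            return false;
        }
        for (int i = 0; i < suffix.length(); i++) {
            if (sb.charAt(start + i) != suffix.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

With the illustrative delimiter "##", the input "a#b##c#" yields the two records "a#b" and "c#": the single '#' characters, which match only the start of the delimiter, remain part of the data.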