[jira] [Commented] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506465#comment-14506465 ]

Gelesh commented on MAPREDUCE-5733:
-----------------------------------

If we use a single public constant, which class and package should it belong to? I think that if the mapred package is also to be fixed, we need a separate, similar static final declaration there. If we instead keep a single reference in one package, either mapred or mapreduce, users of the other package would be forced to add an import statement just to access this string.

Define and use a constant for property textinputformat.record.delimiter
-----------------------------------------------------------------------

Key: MAPREDUCE-5733
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch
Original Estimate: 10m
Remaining Estimate: 10m

Setting the property by hand on a Configuration, as in conf.set("textinputformat.record.delimiter", myDelimiter), is prone to typos. Let's define the key as a static String in some class to minimise such errors. This would also let IDEs such as Eclipse suggest the constant.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
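One way to picture the trade-off raised in this comment is the sketch below. It is not taken from either attached patch: the nested classes are hypothetical stand-ins for a class in the mapreduce package and a class in the mapred package, and a plain Map stands in for Hadoop's Configuration. Only the property key string comes from the issue itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only; none of these class or field names are Hadoop's. */
final class DelimiterConstantSketch {

    /** Hypothetical stand-in for a class in the newer "mapreduce" package that owns the key. */
    static final class NewApiInputFormat {
        static final String RECORD_DELIMITER = "textinputformat.record.delimiter";
    }

    /** Hypothetical stand-in for a class in the older "mapred" package. */
    static final class OldApiInputFormat {
        // Re-declares the constant by delegation, so old-API users need no
        // extra import and the literal still exists in exactly one place.
        static final String RECORD_DELIMITER = NewApiInputFormat.RECORD_DELIMITER;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // stand-in for Configuration
        conf.put(OldApiInputFormat.RECORD_DELIMITER, "||");
        // Both constants resolve to the same key, so either API reads the value back.
        System.out.println(conf.get(NewApiInputFormat.RECORD_DELIMITER));
    }
}
```

Delegating from one declaration to the other keeps a single literal while still giving each package its own name to reference; whether that is preferable to two independent declarations is the open question in the comment.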
[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-5733:
------------------------------

Attachment: MAPREDUCE-5733_2.patch

Patch

Define and use a constant for property textinputformat.record.delimiter
-----------------------------------------------------------------------

Key: MAPREDUCE-5733
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch
Original Estimate: 10m
Remaining Estimate: 10m
[jira] [Commented] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502627#comment-14502627 ]

Gelesh commented on MAPREDUCE-5733:
-----------------------------------

This feature could be tested using MRUnit, or with a set and get on a Configuration object: call conf.set(TextInputFormat.DELIMITER, "/record") and then assert that conf.get("textinputformat.record.delimiter") returns "/record". Since it's just a static variable declaration, I don't think we need to add a test case for it.

Define and use a constant for property textinputformat.record.delimiter
-----------------------------------------------------------------------

Key: MAPREDUCE-5733
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch
Original Estimate: 10m
Remaining Estimate: 10m
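The set-then-get check described in this comment could look roughly like the following. This is a minimal sketch under stated assumptions: java.util.Properties stands in for Hadoop's Configuration so the example runs without Hadoop on the classpath, and RECORD_DELIMITER is a hypothetical constant name (the comment refers to it as TextInputFormat.DELIMITER).

```java
import java.util.Properties;

/** Illustration only; Properties is a stand-in for Hadoop's Configuration. */
final class DelimiterRoundTripSketch {

    // Hypothetical constant; the literal is the property key named in the issue title.
    static final String RECORD_DELIMITER = "textinputformat.record.delimiter";

    /** Sets the delimiter through the constant, then reads it back through the raw key. */
    static String roundTrip(String delimiter) {
        Properties conf = new Properties();
        conf.setProperty(RECORD_DELIMITER, delimiter);
        return conf.getProperty("textinputformat.record.delimiter");
    }

    public static void main(String[] args) {
        // A mistyped literal inside the constant would make this print "false".
        System.out.println("/record".equals(roundTrip("/record")));
    }
}
```

The check only guards against a typo in the constant's literal, which is consistent with the comment's view that a dedicated test case adds little.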
[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-5733:
------------------------------

Assignee: Gelesh (was: Abhilash S R)
Status: Patch Available (was: Open)

Define and use a constant for property textinputformat.record.delimiter
-----------------------------------------------------------------------

Key: MAPREDUCE-5733
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
Attachments: MAPREDUCE-5733.patch
Original Estimate: 10m
Remaining Estimate: 10m
[jira] [Updated] (MAPREDUCE-5143) TestLineRecordReader has no test case for compressed files
[ https://issues.apache.org/jira/browse/MAPREDUCE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-5143:
------------------------------

Summary: TestLineRecordReader has no test case for compressed files (was: TestLineRecordReader was no test case for compressed files)

TestLineRecordReader has no test case for compressed files
----------------------------------------------------------

Key: MAPREDUCE-5143
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5143
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.0.0, trunk, 2.1.0-beta
Reporter: Sonu Prathap
Assignee: Tsuyoshi OZAWA
Priority: Minor
Attachments: MAPREDUCE-5143.1.patch, MAPREDUCE-5143.2.patch

TestLineRecordReader has no test case for compressed files.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5733) (Configuration) conf.set(textinputformat.record.delimiter,myDelimiter) , is bound to typo error. Let's have it as a Static String in some class, to minimise such
Gelesh created MAPREDUCE-5733:
------------------------------

Summary: (Configuration) conf.set(textinputformat.record.delimiter,myDelimiter) is prone to typos. Let's have it as a static String in some class, to minimise such errors. This would also help IDEs such as Eclipse suggest the String.
Key: MAPREDUCE-5733
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Gelesh
Priority: Trivial

Setting the property by hand on a Configuration, as in conf.set("textinputformat.record.delimiter", myDelimiter), is prone to typos. Let's define the key as a static String in some class to minimise such errors. This would also let IDEs such as Eclipse suggest the constant.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5733) (Configuration) conf.set(textinputformat.record.delimiter,myDelimiter) , is bound to typo error. Let's have it as a Static String in some class, to minimise suc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879874#comment-13879874 ]

Gelesh commented on MAPREDUCE-5733:
-----------------------------------

[~abhilashsr2008] Thanks a lot :-)

(Configuration) conf.set(textinputformat.record.delimiter,myDelimiter) is prone to typos. Let's have it as a static String in some class, to minimise such errors. This would also help IDEs such as Eclipse suggest the String.
-----------------------------------------------------------------------

Key: MAPREDUCE-5733
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Gelesh
Assignee: Abhilash S R
Priority: Trivial
Labels: patch
Attachments: MAPREDUCE-5733.patch
Original Estimate: 10m
Remaining Estimate: 10m
[jira] [Commented] (MAPREDUCE-5143) TestLineRecordReader was no test case for compressed files
[ https://issues.apache.org/jira/browse/MAPREDUCE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668192#comment-13668192 ]

Gelesh commented on MAPREDUCE-5143:
-----------------------------------

[~ozawa], could you please add this as a diff to https://reviews.apache.org/r/11456/ ? I tried, but it failed with:

The file 'hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java' (r69253f4) could not be found in the repository

TestLineRecordReader was no test case for compressed files
----------------------------------------------------------

Key: MAPREDUCE-5143
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5143
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.0.0, trunk, 2.0.5-beta
Reporter: Sonu Prathap
Assignee: Tsuyoshi OZAWA
Priority: Minor
Attachments: MAPREDUCE-5143.1.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5216) While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.
Gelesh created MAPREDUCE-5216:
------------------------------

Summary: While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.
Key: MAPREDUCE-5216
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5216
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Gelesh
[jira] [Updated] (MAPREDUCE-5216) While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-5216:
------------------------------

Due Date: 7/May/13
Description: While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same for all splits, i.e.

Split 1: Start = A, End = M
Split 2: Start = A, End = P
Split 3: Start = A, End = S

instead of

Split 1: Start = A, End = M
Split 2: Start = M, End = P
Split 3: Start = P, End = S

Remaining Estimate: 1h
Original Estimate: 1h

Nithin is working on the same.

While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.
-----------------------------------------------------------------------

Key: MAPREDUCE-5216
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5216
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Gelesh
Original Estimate: 1h
Remaining Estimate: 1h
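The expected behaviour in this report, where each split starts at the previous split's end, can be sketched as below. This is not the TextSplitter code and does not claim to show the cause of the bug, which the report does not state; the class and method names are hypothetical and the split points are the A, M, P, S values from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustration only; not the Hadoop TextSplitter implementation. */
final class SplitBoundsSketch {

    /** Turns ordered split points into {start, end} pairs with contiguous bounds. */
    static List<String[]> contiguousSplits(List<String> points) {
        List<String[]> splits = new ArrayList<>();
        for (int i = 1; i < points.size(); i++) {
            // The lower bound advances with every split; pinning it to
            // points.get(0) would reproduce the reported A..M, A..P, A..S ranges.
            splits.add(new String[] {points.get(i - 1), points.get(i)});
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String[]> splits = contiguousSplits(Arrays.asList("A", "M", "P", "S"));
        for (int i = 0; i < splits.size(); i++) {
            String[] s = splits.get(i);
            System.out.println("Split " + (i + 1) + ": Start = " + s[0] + ", End = " + s[1]);
        }
    }
}
```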
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628855#comment-13628855 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~jira.shegalov], thanks for sharing your thoughts. I have tested with a JUnit run of TestLineRecordReader, but as of now TestLineRecordReader has no test case for compressed input. That is a place we need to cross-check, but I hope the code holds good, because the modification in this area is minimal.

The aim was to improve performance by removing the null check, but that would be incompatible with anything built upon the existing behaviour, which may then hit an NPE, as discussed above ([~snihalani]'s comments). The patch was therefore limited to:

1) removing the null assignments for the key and value;
2) limiting CompressionCodecFactory and the codec to method-local scope;
3) removing lines 170-173, if (newSize == 0) { break; }, an unnecessary == 0 check inside a loop. The code that handles this case is already there outside the loop, so the check inside adds no value;
4) in order to achieve point 2, introducing a private boolean isCompressedInput variable instead of the private boolean isCompressedInput() method.

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.23.7, 2.0.5-beta
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch
Original Estimate: 1h
Remaining Estimate: 1h

I found there is scope for optimizing the code in initialize(): compressionCodecs and codec could be instantiated only if the input is compressed. Meanwhile, Gelesh George Omathil suggested that we could avoid the null check of the key and value.
This would save time, since the null check is done for every next key/value generation. The intention is to instantiate only once and avoid an NPE as well. Hopefully both can be met if the key and value are initialized in the initialize() method. We have both worked on it.
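The idea in the issue description, allocating the key and value once in initialize() so that no null check is needed per record, can be sketched as follows. This is a simplified illustration, not LineRecordReader and not any of the attached patches: the class is hypothetical, an array of strings stands in for the input split, and the record index stands in for the byte offset key.

```java
/** Illustration only; a toy reader, not Hadoop's LineRecordReader. */
final class EagerInitReaderSketch {
    private String[] lines;
    private int next;
    private long[] key;           // one-slot holder, reused for every record
    private StringBuilder value;  // reused for every record

    /** Allocates the key and value once, before any record is read. */
    void initialize(String[] input) {
        lines = input;
        next = 0;
        key = new long[1];
        value = new StringBuilder();
    }

    /** Advances to the next record without any "is the key null?" check. */
    boolean nextKeyValue() {
        if (next >= lines.length) {
            return false;
        }
        key[0] = next;            // record index stands in for the byte offset
        value.setLength(0);
        value.append(lines[next]);
        next++;
        return true;
    }

    long getCurrentKey() { return key[0]; }

    String getCurrentValue() { return value.toString(); }

    public static void main(String[] args) {
        EagerInitReaderSketch reader = new EagerInitReaderSketch();
        reader.initialize(new String[] {"first", "second"});
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
        }
    }
}
```

In this sketch, calling nextKeyValue() before initialize() throws a NullPointerException, which is the kind of compatibility risk the comment above gives as the reason for limiting the patch.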
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626095#comment-13626095 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~jlowe], [~revans2], [~jira.shegalov], [~snihalani], [~kkambatl]: could anybody please share views or suggestions?

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.23.7, 2.0.5-beta
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625123#comment-13625123 ]

Gelesh commented on MAPREDUCE-4879:
-----------------------------------

[~jira.shegalov] Since TeraOutputFormat extends FileOutputFormat, it must comply with FileOutputFormat's checkOutputSpecs(), and checkOutputSpecs() is supposed to be called from the job client. If there is any issue, I feel it would be better to fix it within the existing control flow.

TeraOutputFormat may overwrite an existing output directory
-----------------------------------------------------------

Key: MAPREDUCE-4879
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: examples
Affects Versions: trunk
Reporter: Gera Shegalov
Attachments: MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879-trunk-rev1.patch

Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs from writing into an existing directory, and potentially overwriting previous runs.
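The kind of guard that checkOutputSpecs() is expected to provide, according to the issue description, can be sketched as below. This is a simplified illustration under stated assumptions: it uses java.nio.file on the local file system rather than Hadoop's FileSystem API, the class name is hypothetical, and it throws an unchecked IllegalStateException purely to keep the example short.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustration only; not FileOutputFormat or TeraOutputFormat code. */
final class OutputSpecCheckSketch {

    /** Rejects a missing or already existing output directory before a job starts. */
    static void checkOutputSpecs(Path outputDir) {
        if (outputDir == null) {
            throw new IllegalStateException("Output directory not set");
        }
        if (Files.exists(outputDir)) {
            // Failing here is what keeps a new run from overwriting a previous one.
            throw new IllegalStateException("Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) {
        Path existing = Paths.get(System.getProperty("java.io.tmpdir"));
        try {
            checkOutputSpecs(existing);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

An output format that overrides this check without repeating the existence test would lose the protection, which is one reading of why the comment prefers a fix inside the existing control flow.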
[jira] [Commented] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625124#comment-13625124 ]

Gelesh commented on MAPREDUCE-4879:
-----------------------------------

Could you please post the same on the review board, so that we can get better insight into the code change?

TeraOutputFormat may overwrite an existing output directory
-----------------------------------------------------------

Key: MAPREDUCE-4879
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: examples
Affects Versions: trunk
Reporter: Gera Shegalov
Attachments: MAPREDUCE-4879-trunk.patch, MAPREDUCE-4879-trunk-rev1.patch
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622452#comment-13622452 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

Since this is just an optimization and the existing test case would suffice, I hope this is a +1. Could somebody kindly review?

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.23.7, 2.0.5-beta
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4974:
------------------------------

Target Version/s: 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 1.0.0 (was: 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5)
Status: Patch Available (was: Reopened)

Reduced the scope of compressionCodecs and codec. Introduced boolean isCompressedInput instead of boolean isCompressedInput().

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 0.23.5, 2.0.2-alpha
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.23.7, 2.0.5-beta
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4974:
------------------------------

Attachment: MAPREDUCE-4974.5.patch

The CompressionCodecFactory compressionCodecs and CompressionCodec codec objects are made local to initialize(), and a private boolean isCompressedInput is introduced instead of private boolean isCompressedInput().

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.23.7, 2.0.5-beta
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620877#comment-13620877 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~hadoopqa] The automated check is yet to act on patch .5. Kindly advise. Please refer to the review board: https://reviews.apache.org/r/9440/diff/3/

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.23.7, 2.0.5-beta
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619812#comment-13619812 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~jira.shegalov], I too apologise for not noticing your comments on the review board. I did not know much about the review board and was expecting the review comments here (JIRA). Thanks for sharing your thoughts.

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.23.7, 2.0.5-beta
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619813#comment-13619813 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~jira.shegalov], [~revans2], I would suggest making isCompressedInput a private boolean variable, false by default, instead of the isCompressedInput() method. This would help us reduce the scope of the codec object, along with the CompressionCodecFactory object, to local; as of now they are class variables. I will upload a patch with this modification shortly.

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.23.7, 2.0.5-beta
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
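The suggestion in this comment, a boolean field set once in initialize() with the codec kept as a local variable, can be sketched as follows. This is an illustration under stated assumptions and not the attached patch: Codec and lookupCodec are hypothetical stand-ins for Hadoop's CompressionCodec and the CompressionCodecFactory lookup, and the file-extension test is a simplification.

```java
/** Illustration only; stand-in types, not Hadoop's compression classes. */
final class CompressionFlagSketch {

    /** Hypothetical stand-in for a compression codec. */
    interface Codec {
        String name();
    }

    // Field replaces an isCompressedInput() method that had to inspect a codec field.
    private boolean isCompressedInput = false;

    void initialize(String fileName) {
        // The codec is a local variable: nothing outside initialize() needs it
        // once the flag has been recorded.
        Codec codec = lookupCodec(fileName);
        isCompressedInput = (codec != null);
    }

    /** Hypothetical lookup by file extension; returns null for uncompressed input. */
    private static Codec lookupCodec(String fileName) {
        if (fileName.endsWith(".gz")) {
            return () -> "gzip";
        }
        return null;
    }

    boolean isCompressed() {
        return isCompressedInput;
    }

    public static void main(String[] args) {
        CompressionFlagSketch reader = new CompressionFlagSketch();
        reader.initialize("part-00000.gz");
        System.out.println(reader.isCompressed());
    }
}
```

A real reader would still need the decompressing stream after initialize(); the sketch leaves that out and shows only the scoping change the comment proposes.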
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618339#comment-13618339 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~revans2], [~jlowe], could either of you please act on this?

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589433#comment-13589433 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~ak.a...@aol.com] has put this patch on the review board (thanks, AK). [~snihalani], please refer to this link to see the patch difference: https://reviews.apache.org/r/9440/diff/#index_header

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586946#comment-13586946 ]

Gelesh commented on MAPREDUCE-4974:
---
[~snihalani], I think you referred to an old patch. Please look at MAPREDUCE-4974.4.patch.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4974:
---
Attachment: (was: MAPREDUCE-4974.1.patch)

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4974:
---
Attachment: MAPREDUCE-4974.4.patch

Two changes:

1)
    if (newSize == 0) { break; }
    if (newSize < maxLineLength) { break; }

The (newSize == 0) check inside the loop is eliminated, since the (newSize < maxLineLength) check covers that condition as well. The (newSize == 0) check outside the loop is retained as is.

2)
    compressionCodecs = new CompressionCodecFactory(job);
    codec = compressionCodecs.getCodec(file);

These lines are placed inside an if (isCompressedInput()) { } block, so that these objects are instantiated only if the input file is in a compressed format.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
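The reasoning behind change 1 can be checked with a small stand-alone sketch. It is not the Hadoop source: the class and method names are hypothetical, the iterator of integers stands in for successive in.readLine(...) results, and the `pos += newSize` step between the two checks is an assumption about the surrounding loop. The point it illustrates is that, as long as maxLineLength is positive, a newSize of 0 already satisfies (newSize < maxLineLength), and adding 0 to pos changes nothing, so both loop shapes stop at the same position.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Illustrative stand-in only; this is not the Hadoop LineRecordReader. */
final class NewSizeCheckSketch {

    /** Loop shape with both checks, as quoted in the comment above. */
    static long advanceWithZeroCheck(Iterator<Integer> readSizes, long pos, int maxLineLength) {
        while (readSizes.hasNext()) {
            int newSize = readSizes.next(); // stands in for in.readLine(...)
            if (newSize == 0) {
                break;
            }
            pos += newSize; // assumed bookkeeping between the two checks
            if (newSize < maxLineLength) {
                break;
            }
            // otherwise the line was too long: skip it and read again
        }
        return pos;
    }

    /** Same loop with the (newSize == 0) check removed. */
    static long advanceWithoutZeroCheck(Iterator<Integer> readSizes, long pos, int maxLineLength) {
        while (readSizes.hasNext()) {
            int newSize = readSizes.next();
            pos += newSize; // adding 0 leaves pos unchanged
            if (newSize < maxLineLength) { // also true for newSize == 0 when maxLineLength > 0
                break;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        Integer[] sizes = {12, 10, 3, 7};
        System.out.println(advanceWithZeroCheck(Arrays.asList(sizes).iterator(), 0L, 10));
        System.out.println(advanceWithoutZeroCheck(Arrays.asList(sizes).iterator(), 0L, 10));
    }
}
```

The equivalence depends on maxLineLength being greater than zero; with a non-positive limit the second loop would not stop on an empty read.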
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584291#comment-13584291 ]

Gelesh commented on MAPREDUCE-4974:
---
[~snihalani], please review.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585001#comment-13585001 ]

Gelesh commented on MAPREDUCE-4974:
---
[~ak.a...@aol.com] and I are of the opinion that we should also do something about the repeated null check. As per the discussion here, that part of the optimization seems to be unattractive. Hence, in the latest patch we have dropped the null check change. The remaining changes are described in the comment. Please review.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583403#comment-13583403 ]

Gelesh commented on MAPREDUCE-4974:
---
[~snihalani], thanks for bringing up that very valid point. In that case, what if we eliminate the null check for value alone, and keep the null check for key as is?

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583406#comment-13583406 ]

Gelesh commented on MAPREDUCE-4974:
---
Also, as [~ak.a...@aol.com] has mentioned:
1) avoid the 'if (newSize == 0)' check inside the loop;
2) have 'compressionCodecs codec instantiated only if it's a compressed input'.
We hope these two points are valid. Please share your thoughts.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581170#comment-13581170 ]

Gelesh commented on MAPREDUCE-4974:
---
[~tlipcon], [~snihalani], [~kkambatl], please share your thoughts.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577552#comment-13577552 ]

Gelesh commented on MAPREDUCE-4974:
---
The existing test case is enough, because this is just a code optimization. Could anybody have a look and comment, please?

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4974:
---
Attachment: MAPREDUCE-4974.3.patch

[~ak.a...@aol.com]'s patch 4974.2 showed all the lines as new lines because of code reformatting. The same changes were captured, and a patch was built against the previous commit. This time the size of the patch is 3+ KB. Please review.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575754#comment-13575754 ]

Gelesh commented on MAPREDUCE-4974:
---
[~hadoopqa], as mentioned before, this is just an improvement. No features were added or removed. The existing test case holds for this as well.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576007#comment-13576007 ]

Gelesh commented on MAPREDUCE-4974:
---
[~ak.a...@aol.com] It seems that reformatting has shown the entire LOC as new LOC, instead of the changes alone. [~kkambatl], do we really need to re-upload the patch so that the patch size is reduced? If not, could you please act or advise on the course of action? If some changes to the code are required, please mention that too; our next patch will incorporate them.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572627#comment-13572627 ]

Gelesh commented on MAPREDUCE-4974:
---
[~snihalani],
".. while condition of getFilePosition() <= end evaluates to true, then, we'll hit NPE .."

The Text object value, which is passed to readLine, would not be null, since that is taken care of in the initialize method, which is called prior to nextKeyValue(). The while (nextKeyValue()) loop ends once newSize (the size of the newly fetched value) equals zero. At that point key and value are set to null. But they are not referenced any more after the while (nextKeyValue()) loop, so an NPE is not likely to occur. Please verify, and kindly correct us if we have gone wrong somewhere.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Attachments: MAPREDUCE-4974.1.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573199#comment-13573199 ]

Gelesh commented on MAPREDUCE-4974:
---
As [~snihalani] has mentioned, the "buggy programs that may call nextKeyValue() .." condition, though a little hypothetical, is still possible.
1) In order to avoid that, shall we do the null assignment of key and value in the close() method?
2) Also, shall we assign null to compressionCodecs as well?
Either [~ak.a...@aol.com] or I will upload a reworked patch.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Attachments: MAPREDUCE-4974.1.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573202#comment-13573202 ]

Gelesh commented on MAPREDUCE-4974:
---
Also, this change instantiates the compression-related objects only if the input is a compressed file. In order to skip the first line, readLine is called, and this change does not create a new Text but uses the available 'value' for the method call. Hopefully somebody could share their thoughts on these two changes as well.

Optimising the LineRecordReader initialize() method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Attachments: MAPREDUCE-4974.1.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4974:
---
Assignee: Gelesh (was: Arun A K)
Target Version/s: 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 1.0.0 (was: 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5)
Status: Patch Available (was: Open)

optimising the LineRecordReader initialize method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 0.23.5, 2.0.2-alpha
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4974:
---
Attachment: MAPREDUCE-4974.1.patch

Combined thoughts of mine and Arun AK's.

optimising the LineRecordReader initialize method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Attachments: MAPREDUCE-4974.1.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570140#comment-13570140 ]

Gelesh commented on MAPREDUCE-4974:
---
Somebody please review the patch. I couldn't even see the Hadoop QA running on this. Kindly advise.

optimising the LineRecordReader initialize method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Attachments: MAPREDUCE-4974.1.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570154#comment-13570154 ]

Gelesh commented on MAPREDUCE-4974:
---
It's an improvement to the existing code; no features were added or deleted. Hence, the existing test case would suffice.

optimising the LineRecordReader initialize method
---
Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Attachments: MAPREDUCE-4974.1.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570567#comment-13570567 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~tlipcon]
nextKeyValue() is called once for every delimiter (or newline) that occurs within a given split. Each time, it executes the code below:

-if (key == null) {
-  key = new LongWritable();
-}
-key.set(pos);
-if (value == null) {
-  value = new Text();
-}

Only on the first iteration does the condition hold true, and the key and value objects get created. The same result can be achieved by creating the key and value objects in the initialize phase, which lets us skip this null check.

Also,

-compressionCodecs = new CompressionCodecFactory(job);
-codec = compressionCodecs.getCodec(file);

needs to be done only when the input file is compressed. This change is included as well.

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Attachments: MAPREDUCE-4974.1.patch
Original Estimate: 1h
Remaining Estimate: 1h

I found that there is scope for optimizing the code in initialize(): compressionCodecs and codec need to be instantiated only if the input is compressed. Meanwhile, Gelesh George Omathil suggested that we could also avoid the null check of key and value. This would save time, since the null check is done for every key/value pair generated. The intention is to instantiate only once and avoid an NPE as well. Both goals could be met by initializing key and value in the initialize() method. We have both worked on it.
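The idea in the comment above can be sketched in plain Java, without any Hadoop dependency. Everything here is a hypothetical stand-in: EagerInitReaderSketch is not LineRecordReader, MutableLong and StringBuilder stand in for LongWritable and Text, and the ".gz" suffix test is only an assumed placeholder for a real codec lookup. The point is simply that allocating the reusable holders once in initialize() leaves nextKeyValue() free of null checks.

```java
/** Hypothetical stand-in for a record reader; NOT Hadoop's LineRecordReader. */
final class EagerInitReaderSketch {
    /** Stand-in for LongWritable: a reusable, mutable long holder. */
    static final class MutableLong { long value; }

    private final String[] records;   // pretend contents of one split
    private int next;                 // index of the next record to hand out
    private long pos;                 // offset of the next record
    private MutableLong key;          // allocated once in initialize()
    private StringBuilder value;      // stand-in for Text, allocated once
    private Object codec;             // stays null for uncompressed input

    EagerInitReaderSketch(String[] records) { this.records = records; }

    /** Allocate the reusable holders up front; touch the codec only if needed. */
    void initialize(String fileName) {
        key = new MutableLong();
        value = new StringBuilder();
        // Assumption for illustration: compression is detected by file suffix.
        if (fileName.endsWith(".gz")) {
            codec = new Object();     // placeholder for a real codec instance
        }
    }

    /** No null checks here: initialize() has already created key and value. */
    boolean nextKeyValue() {
        if (next >= records.length) {
            return false;
        }
        key.value = pos;
        value.setLength(0);
        value.append(records[next]);
        pos += records[next].length() + 1;  // +1 for a one-character delimiter
        next++;
        return true;
    }

    long currentKey() { return key.value; }
    String currentValue() { return value.toString(); }
    boolean hasCodec() { return codec != null; }
}
```

The trade-off is that callers must invoke initialize() before the first nextKeyValue(); skipping it would surface as a NullPointerException instead of being absorbed by the lazy check.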
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571123#comment-13571123 ]

Gelesh commented on MAPREDUCE-4974:
-----------------------------------

[~tlipcon]
I tried an estimate on a local setup with a small data set, subtracting the long values obtained from System.nanoTime() at the beginning and at the end of the method. The average time difference was 200 nanoseconds per single call to nextKeyValue(), excluding the very first call, since it involves the object creation. The total time difference would be 200 ns * the number of key/value pairs generated per map task.

Optimising the LineRecordReader initialize() method
---------------------------------------------------

Key: MAPREDUCE-4974
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
Labels: patch, performance
Fix For: 0.20.204.0, 0.24.0
Attachments: MAPREDUCE-4974.1.patch
Original Estimate: 1h
Remaining Estimate: 1h

I found that there is scope for optimizing the code in initialize(): compressionCodecs and codec need to be instantiated only if the input is compressed. Meanwhile, Gelesh George Omathil suggested that we could also avoid the null check of key and value. This would save time, since the null check is done for every key/value pair generated. The intention is to instantiate only once and avoid an NPE as well. Both goals could be met by initializing key and value in the initialize() method. We have both worked on it.
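As a worked instance of the estimate in the comment above: assuming a hypothetical map task that produces 1,000,000 key/value pairs, 200 ns per call adds up to 200,000,000 ns, or 200 ms. The helper below only does that multiplication; the class and method names are illustrative, and the 200 ns figure is the commenter's local measurement, not something this sketch reproduces.

```java
/** Illustrative arithmetic only; names are hypothetical. */
final class NullCheckSavingEstimate {
    /** Total saving = per-call saving (ns) * number of key/value pairs in the task. */
    static long totalSavingNanos(long perCallNanos, long recordsPerTask) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(perCallNanos, recordsPerTask);
    }

    /** Convert nanoseconds to milliseconds for easier reading. */
    static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }
}
```

Differences of System.nanoTime() at this scale are easily dominated by timer resolution and JIT warm-up, so a per-call figure measured this way is best treated as a rough indication.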
[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562621#comment-13562621 ]

Gelesh commented on MAPREDUCE-4882:
-----------------------------------

Could you please share what impact this has?

Error in estimating the length of the output file in Spill Phase
----------------------------------------------------------------

Key: MAPREDUCE-4882
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.2, 1.0.3
Environment: Any Environment
Reporter: Lijie Xu
Labels: patch
Original Estimate: 1h
Remaining Estimate: 1h

The sortAndSpill() method in MapTask.java has an error in estimating the length of the output file.
The long size should be (bufvoid - bufstart) + bufend, not (bufvoid - bufend) + bufstart, when bufend < bufstart.
Here is the original code in MapTask.java:

    private void sortAndSpill() throws IOException, ClassNotFoundException,
                                       InterruptedException {
      //approximate the length of the output file to be the length of the
      //buffer + header lengths for the partitions
      long size = (bufend >= bufstart
          ? bufend - bufstart
          : (bufvoid - bufend) + bufstart) +
                  partitions * APPROX_HEADER_LENGTH;
      FSDataOutputStream out = null;

I ran a test on TeraSort. A snippet from a mapper's log is as follows:

    MapTask: Spilling map output: record full = true
    MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
    MapTask: kvstart = 262142; kvend = 131069; length = 655360
    MapTask: Finished spill 3

In this case, the spilled bytes should be (199229440 - 157286200) + 10485460 = 52428700 (52 MB), because the number of spilled records is 524287 and each record costs 100 B.
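The arithmetic in the report can be checked with a small standalone sketch. The class and method names are hypothetical, and the header term (partitions * APPROX_HEADER_LENGTH) is left out because the report gives no values for it; only the circular-buffer part of the estimate is compared.

```java
/** Standalone check of the two expressions discussed in the report; names are hypothetical. */
final class SpillSizeSketch {
    /**
     * Bytes occupied between bufstart and bufend in a circular buffer whose
     * valid length is bufvoid. When the data wraps around (bufend < bufstart),
     * it covers [bufstart, bufvoid) plus [0, bufend).
     */
    static long usedBytes(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufstart) + bufend;
    }

    /** The wrap-around expression as quoted from MapTask.java, kept for comparison. */
    static long quotedExpression(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufend) + bufstart;
    }
}
```

With the logged values (bufstart = 157286200, bufend = 10485460, bufvoid = 199229440), usedBytes gives 52428700, which matches 524287 records of 100 bytes each. The quoted expression gives 346030180, which is larger than bufvoid itself and so cannot be the number of bytes held in the buffer.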
[jira] [Commented] (MAPREDUCE-4519) In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting characte
[ https://issues.apache.org/jira/browse/MAPREDUCE-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429082#comment-13429082 ]

Gelesh commented on MAPREDUCE-4519:
-----------------------------------

I have found a similar bug, with a fix, in MAPREDUCE-4512. Please refer to the patch and kindly incorporate it.
While fixing it I also encountered such a scenario. I think this occurs at the end of the buffer, which holds 4096 characters. My understanding is that the end of one buffer and the beginning of the next, together with the delimiter indexes, are not handled properly. This results in one bug or another.
I tried solving it, but the fix resulted in some new bugs. Once all the scenarios are identified, we can arrive at a possible fix.

In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character sequence in delimiter were found missing in certain cases in the Map Output
--------------------------------------------------------------------------------

Key: MAPREDUCE-4519
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4519
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.2
Environment: Linux- Ubuntu 10.04
Reporter: Arun A K
Labels: hadoop, mapreduce, textinputformat, textinputformat.record.delimiter
Fix For: 0.20.2
Original Estimate: 168h
Remaining Estimate: 168h

Set textinputformat.record.delimiter as </entity>

Suppose the input is a text file with the following content:
<entity><id>1</id><name>User1</name></entity><entity><id>2</id><name>User2</name></entity><entity><id>3</id><name>User3</name></entity><entity><id>4</id><name>User4</name></entity><entity><id>5</id><name>User5</name></entity>

The mapper was expected to get the values:
Value 1 - <entity><id>1</id><name>User1</name>
Value 2 - <entity><id>2</id><name>User2</name>
Value 3 - <entity><id>3</id><name>User3</name>
Value 4 - <entity><id>4</id><name>User4</name>
Value 5 - <entity><id>5</id><name>User5</name>

According to this bug, the mapper gets the values:
Value 1 - <entity><id>1</id><name>User1</name>
Value 2 - <entity><id>2</id><name>User2</name>
Value 3 - <entity><id>3id><name>User3</name>
Value 4 - <entity><id>4</id><name>User4name>
Value 5 - <entity><id>5</id><name>User5</name>

The pattern shown above need not necessarily occur for values 1, 2, 3. The bug occurs at some random positions in the map input.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
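The buffer-boundary concern raised in the comment can be illustrated with a small sketch that reads input in fixed-size chunks and still finds a multi-character delimiter that straddles two chunks. This is a minimal illustration under stated assumptions, not Hadoop's LineReader: it works on characters rather than bytes, the class and method names are hypothetical, and the delimiter and input in any usage are illustrative. The key step is carrying unconsumed text over to the next chunk and resuming the search just early enough to catch a delimiter that began in the previous chunk.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chunked splitter; a sketch, not Hadoop's LineReader. */
final class ChunkedDelimiterSplitter {
    /** Split the reader's content on a non-empty delimiter, reading chunkSize (> 0) chars at a time. */
    static List<String> split(Reader in, String delimiter, int chunkSize) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();  // text not yet emitted
        char[] chunk = new char[chunkSize];
        int n;
        while ((n = in.read(chunk)) != -1) {
            // A new match must include at least one new char, so it cannot
            // start earlier than (old length - delimiter length + 1).
            int searchFrom = Math.max(0, pending.length() - delimiter.length() + 1);
            pending.append(chunk, 0, n);
            int hit;
            while ((hit = pending.indexOf(delimiter, searchFrom)) >= 0) {
                records.add(pending.substring(0, hit));
                pending.delete(0, hit + delimiter.length());
                searchFrom = 0;  // remainder has not been searched yet
            }
        }
        if (pending.length() > 0) {
            records.add(pending.toString());  // trailing record without delimiter
        }
        return records;
    }

    /** Convenience wrapper for in-memory text. */
    static List<String> splitString(String text, String delimiter, int chunkSize) {
        try {
            return split(new StringReader(text), delimiter, chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the carried-over text is searched again together with each new chunk, the result does not depend on where the chunk boundaries fall, which is the property the comment says was missing.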
[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
[ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4512:
-----------------------------

Description:
TextInputFormat delimiter bug scenario: a character sequence of the input text in which the first character matches the first character of the delimiter, and the remaining input text character sequence matches the entire delimiter character sequence from the starting position of the delimiter.
e.g. delimiter = record;
and Text = record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name
Here the string =Bangalorrecord 3: satisfies two conditions:
1) It contains the delimiter record.
2) The character / character sequence immediately before the delimiter (i.e. 'r') matches the first character (or character sequence) of the delimiter (i.e. =Bangalor ends with, and the delimiter starts with, the same character/char sequence 'r').
Here the delimiter is not encountered by the program, resulting in an improper value text in the map that contains the delimiter.

was:
TextInputFormat delimiter bug scenario , a character sequence of the input text, in which the first character matches with the first character of delimiter, and reaming input text character sequence matches with the entire delimiter character sequence from the starting position of the delimiter.
eg delimiter =record;
and Text = record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name
Here string =Bangalorrecord 3: satisfy two condition
1) contains the delimiter record
2) The character / character sequence immediately b4 the delimiter (ie 'r') matches with first character (or character sequence ) of delimiter. (ie =Bangalor ends with and Delimiter starts with same character/char sequence 'r' ),
Hear the delimiter is skipped

Environment: Linux (was: Lynux)
Affects Version/s: 0.20.204.0
                   0.21.0
                   1.0.3

Test case
Input file text:
record 1 name: Java Location:UAErecord 2 name:Gelesh Location:Bangalorrecord 3 name Hadoop Location:Kerala
Delimiter = record
Expected values in map:
1 name: Java Location:UAE
2 name:Gelesh Location:Bangalor
3 name Hadoop Location:Kerala
Actual values received in map:
1 name: Java Location:UAE
2 name:Gelesh Location:Bangalorrecord 3 name Hadoop Location:Kerala

TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
--------------------------------------------------------------------------------

Key: MAPREDUCE-4512
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/mumak, mr-am, mrv1, mrv2, task
Affects Versions: 0.20.204.0, 0.21.0, 1.0.3, 2.0.0-alpha
Environment: Linux
Reporter: Gelesh
Labels: patch
Fix For: 0.20.204.0
Attachments: MAPREDUCE-4512.txt
Original Estimate: 1m
Remaining Estimate: 1m

TextInputFormat delimiter bug scenario: a character sequence of the input text in which the first character matches the first character of the delimiter, and the remaining input text character sequence matches the entire delimiter character sequence from the starting position of the delimiter.
e.g. delimiter = record;
and Text = record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name
Here the string =Bangalorrecord 3: satisfies two conditions:
1) It contains the delimiter record.
2) The character / character sequence immediately before the delimiter (i.e. 'r') matches the first character (or character sequence) of the delimiter (i.e. =Bangalor ends with, and the delimiter starts with, the same character/char sequence 'r').
Here the delimiter is not encountered by the program, resulting in an improper value text in the map that contains the delimiter.
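One way a miss like the one in the test case can arise is a single-pass matcher that, on a mismatch, throws away its partial match and moves on without re-testing the current character. The sketch below contrasts such a scan with a Knuth-Morris-Pratt (KMP) scan that falls back through a failure table instead. This is an assumption made for illustration, not a reading of the LineReader source or of the attached patch; the class and method names are hypothetical, and the delimiter is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative matchers; NOT taken from Hadoop's LineReader. */
final class DelimiterMatchSketch {
    /** Naive scan: a mismatch discards the partial match and skips the current char. */
    static List<String> naiveSplit(String text, String delim) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int matched = 0;  // number of delimiter chars matched so far
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == delim.charAt(matched)) {
                matched++;
                if (matched == delim.length()) {
                    out.add(text.substring(start, i + 1 - delim.length()));
                    start = i + 1;
                    matched = 0;
                }
            } else {
                matched = 0;  // flaw: text.charAt(i) may itself begin the delimiter
            }
        }
        if (start < text.length()) {
            out.add(text.substring(start));
        }
        return out;
    }

    /** KMP scan: a mismatch falls back to the longest prefix that is still viable. */
    static List<String> kmpSplit(String text, String delim) {
        // fail[i] = length of the longest proper prefix of delim[0..i] that is also its suffix
        int[] fail = new int[delim.length()];
        for (int i = 1, k = 0; i < delim.length(); i++) {
            while (k > 0 && delim.charAt(i) != delim.charAt(k)) {
                k = fail[k - 1];
            }
            if (delim.charAt(i) == delim.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        List<String> out = new ArrayList<>();
        int start = 0;
        int matched = 0;
        for (int i = 0; i < text.length(); i++) {
            while (matched > 0 && text.charAt(i) != delim.charAt(matched)) {
                matched = fail[matched - 1];
            }
            if (text.charAt(i) == delim.charAt(matched)) {
                matched++;
            }
            if (matched == delim.length()) {
                out.add(text.substring(start, i + 1 - delim.length()));
                start = i + 1;
                matched = 0;
            }
        }
        if (start < text.length()) {
            out.add(text.substring(start));
        }
        return out;
    }
}
```

On the test-case text with the delimiter record, the naive scan treats the 'r' that ends Bangalor as the start of a match, fails on the next 'r', and never re-tests it, so Bangalorrecord stays inside one value. The KMP scan re-tests that second 'r' and splits there.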
[jira] [Created] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
Gelesh created MAPREDUCE-4512:
-----------------------------

Summary: TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
Key: MAPREDUCE-4512
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/mumak, mr-am, mrv1, mrv2, task
Affects Versions: 2.0.0-alpha
Environment: Lynux
Reporter: Gelesh
Fix For: 0.20.204.0

TextInputFormat delimiter bug scenario , a character sequence of the input text, in which the first character matches with the first character of delimiter, and reaming input text character sequence matches with the entire delimiter character sequence from the starting position of the delimiter.
eg delimiter =record;
and Text = record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name
Here string =Bangalorrecord 3: satisfy two condition
1) contains the delimiter record
2) The character / character sequence immediately b4 the delimiter (ie 'r') matches with first character (or character sequence ) of delimiter. (ie =Bangalor ends with and Delimiter starts with same character/char sequence 'r' ),
Hear the delimiter is skipped
[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
[ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4512:
-----------------------------

Status: Patch Available (was: Open)

Just a one-line code change in LineReader would do. Tested. If there are any issues, please let me know so that I can help further: gelesh.had...@gmail.com

TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
--------------------------------------------------------------------------------

Key: MAPREDUCE-4512
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/mumak, mr-am, mrv1, mrv2, task
Affects Versions: 2.0.0-alpha
Environment: Lynux
Reporter: Gelesh
Labels: patch
Fix For: 0.20.204.0
Original Estimate: 1m
Remaining Estimate: 1m

TextInputFormat delimiter bug scenario , a character sequence of the input text, in which the first character matches with the first character of delimiter, and reaming input text character sequence matches with the entire delimiter character sequence from the starting position of the delimiter.
eg delimiter =record;
and Text = record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name
Here string =Bangalorrecord 3: satisfy two condition
1) contains the delimiter record
2) The character / character sequence immediately b4 the delimiter (ie 'r') matches with first character (or character sequence ) of delimiter. (ie =Bangalor ends with and Delimiter starts with same character/char sequence 'r' ),
Hear the delimiter is skipped
[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
[ https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gelesh updated MAPREDUCE-4512:
-----------------------------

Attachment: MAPREDUCE-4512.txt

Just a one-line code change at LineRecord. Tested. In case there is any issue, please mail me: gelesh.had...@gmail.com

TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence
--------------------------------------------------------------------------------

Key: MAPREDUCE-4512
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/mumak, mr-am, mrv1, mrv2, task
Affects Versions: 2.0.0-alpha
Environment: Lynux
Reporter: Gelesh
Labels: patch
Fix For: 0.20.204.0
Attachments: MAPREDUCE-4512.txt
Original Estimate: 1m
Remaining Estimate: 1m

TextInputFormat delimiter bug scenario , a character sequence of the input text, in which the first character matches with the first character of delimiter, and reaming input text character sequence matches with the entire delimiter character sequence from the starting position of the delimiter.
eg delimiter =record;
and Text = record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name
Here string =Bangalorrecord 3: satisfy two condition
1) contains the delimiter record
2) The character / character sequence immediately b4 the delimiter (ie 'r') matches with first character (or character sequence ) of delimiter. (ie =Bangalor ends with and Delimiter starts with same character/char sequence 'r' ),
Hear the delimiter is skipped