[jira] [Commented] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter

2015-04-21 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506465#comment-14506465
 ] 

Gelesh commented on MAPREDUCE-5733:
---

If we are to use a single public constant, which class and package should it 
belong to?
I think that if mapred is also to be fixed, we need to place a separate, similar 
static final declaration there.
If we instead used a single reference in one package, either mapred or 
mapreduce, users of the other package would be forced to add an import statement 
just to access this string constant.
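
The trade-off described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the attached patch: the two nested classes stand in for classes in the mapred and mapreduce packages, the names MapredTextInput, MapreduceTextInput and RECORD_DELIMITER are hypothetical, and java.util.Properties is used only as a stand-in for a Hadoop Configuration.

```java
import java.util.Properties;

public class DelimiterConstantSketch {

    // Hypothetical stand-in for a class in the old "mapred" package.
    static final class MapredTextInput {
        static final String RECORD_DELIMITER = "textinputformat.record.delimiter";
    }

    // Hypothetical stand-in for a class in the new "mapreduce" package.
    // It repeats the declaration so its users need no cross-package import.
    static final class MapreduceTextInput {
        static final String RECORD_DELIMITER = "textinputformat.record.delimiter";
    }

    // Properties is used here only as a stand-in for a Hadoop Configuration.
    static Properties configure(String key, String delimiter) {
        Properties conf = new Properties();
        conf.setProperty(key, delimiter);
        return conf;
    }

    public static void main(String[] args) {
        Properties conf = configure(MapredTextInput.RECORD_DELIMITER, "|");
        // Either constant resolves to the same property key.
        System.out.println(conf.getProperty(MapreduceTextInput.RECORD_DELIMITER));
    }
}
```

With two declarations, each package stays self-contained, at the cost of keeping the two strings in sync by hand.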

 Define and use a constant for property textinputformat.record.delimiter
 -

 Key: MAPREDUCE-5733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
 Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch

   Original Estimate: 10m
  Remaining Estimate: 10m

 (Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) 
 is prone to typos. Let's define the key as a static String in some class to 
 minimise such errors. This would also help IDEs like Eclipse suggest the 
 String.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter

2015-04-20 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-5733:
--
Attachment: MAPREDUCE-5733_2.patch

Patch

 Define and use a constant for property textinputformat.record.delimiter
 -

 Key: MAPREDUCE-5733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
 Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch

   Original Estimate: 10m
  Remaining Estimate: 10m

 (Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) 
 is prone to typos. Let's define the key as a static String in some class to 
 minimise such errors. This would also help IDEs like Eclipse suggest the 
 String.





[jira] [Commented] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter

2015-04-20 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502627#comment-14502627
 ] 

Gelesh commented on MAPREDUCE-5733:
---

This feature could be tested using MRUnit, or with a set and get on a 
Configuration object, like
conf.set(TextInputFormat.DELIMITER, "/record")
assertEquals("/record", conf.get("textinputformat.record.delimiter"))

Since it's just a static variable declaration, I don't think we need to add a 
test case for it.
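
The set-and-get check suggested above might look something like this. The sketch uses java.util.Properties as a stand-in for Hadoop's Configuration so that it runs without Hadoop on the classpath, and DELIMITER is a hypothetical local constant, not the one from the patch.

```java
import java.util.Properties;

public class DelimiterRoundTripCheck {

    // Hypothetical constant mirroring the property key named in this issue.
    static final String DELIMITER = "textinputformat.record.delimiter";

    // Sets the delimiter through the constant and reads it back through the
    // raw string key; returns true when both refer to the same property.
    static boolean roundTrips(String value) {
        Properties conf = new Properties(); // stand-in for Configuration
        conf.setProperty(DELIMITER, value);
        return value.equals(conf.getProperty("textinputformat.record.delimiter"));
    }

    public static void main(String[] args) {
        if (!roundTrips("/record")) {
            throw new AssertionError("constant and literal key disagree");
        }
        System.out.println("round trip ok");
    }
}
```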

 Define and use a constant for property textinputformat.record.delimiter
 -

 Key: MAPREDUCE-5733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
 Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch

   Original Estimate: 10m
  Remaining Estimate: 10m

 (Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) 
 is prone to typos. Let's define the key as a static String in some class to 
 minimise such errors. This would also help IDEs like Eclipse suggest the 
 String.





[jira] [Commented] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter

2015-04-20 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502626#comment-14502626
 ] 

Gelesh commented on MAPREDUCE-5733:
---

This feature could be tested using MRUnit, or with a set and get on a 
Configuration object, like
conf.set(TextInputFormat.DELIMITER, "/record")
assertEquals("/record", conf.get("textinputformat.record.delimiter"))

Since it's just a static variable declaration, I don't think we need to add a 
test case for it.

 Define and use a constant for property textinputformat.record.delimiter
 -

 Key: MAPREDUCE-5733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
 Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch

   Original Estimate: 10m
  Remaining Estimate: 10m

 (Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) 
 is prone to typos. Let's define the key as a static String in some class to 
 minimise such errors. This would also help IDEs like Eclipse suggest the 
 String.





[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property textinputformat.record.delimiter

2015-04-20 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-5733:
--
Assignee: Gelesh  (was: Abhilash S R)
  Status: Patch Available  (was: Open)

 Define and use a constant for property textinputformat.record.delimiter
 -

 Key: MAPREDUCE-5733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Gelesh
Assignee: Gelesh
Priority: Trivial
 Attachments: MAPREDUCE-5733.patch

   Original Estimate: 10m
  Remaining Estimate: 10m

 (Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) 
 is prone to typos. Let's define the key as a static String in some class to 
 minimise such errors. This would also help IDEs like Eclipse suggest the 
 String.





[jira] [Updated] (MAPREDUCE-5143) TestLineRecordReader has no test case for compressed files

2014-07-06 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-5143:
--

Summary: TestLineRecordReader has no test case for compressed files  (was: 
TestLineRecordReader was no test case for compressed files)

 TestLineRecordReader has no test case for compressed files
 --

 Key: MAPREDUCE-5143
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5143
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, trunk, 2.1.0-beta
Reporter: Sonu Prathap
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Attachments: MAPREDUCE-5143.1.patch, MAPREDUCE-5143.2.patch


 TestLineRecordReader has no test case for compressed files



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5733) (Configugration) conf.set(textinputformat.record.delimiter,myDelimiter) , is bound to typo error. Lets have it as a Static String in some class, to minimise such

2014-01-23 Thread Gelesh (JIRA)
Gelesh created MAPREDUCE-5733:
-

 Summary: (Configuration) 
conf.set("textinputformat.record.delimiter", myDelimiter) is prone to typos. 
Let's define the key as a static String in some class to minimise such errors. 
This would also help IDEs like Eclipse suggest the String.
 Key: MAPREDUCE-5733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Gelesh
Priority: Trivial


(Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) 
is prone to typos. Let's define the key as a static String in some class to 
minimise such errors. This would also help IDEs like Eclipse suggest the 
String.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5733) (Configugration) conf.set(textinputformat.record.delimiter,myDelimiter) , is bound to typo error. Lets have it as a Static String in some class, to minimise suc

2014-01-23 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879874#comment-13879874
 ] 

Gelesh commented on MAPREDUCE-5733:
---

[~abhilashsr2008] Thanks a lot :-)


 (Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) 
 is prone to typos. Let's define the key as a static String in some class to 
 minimise such errors. This would also help IDEs like Eclipse suggest the 
 String.
 

 Key: MAPREDUCE-5733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Gelesh
Assignee: Abhilash S R
Priority: Trivial
  Labels: patch
 Attachments: MAPREDUCE-5733.patch

   Original Estimate: 10m
  Remaining Estimate: 10m

 (Configuration) conf.set("textinputformat.record.delimiter", myDelimiter) 
 is prone to typos. Let's define the key as a static String in some class to 
 minimise such errors. This would also help IDEs like Eclipse suggest the 
 String.





[jira] [Commented] (MAPREDUCE-5143) TestLineRecordReader was no test case for compressed files

2013-05-28 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668192#comment-13668192
 ] 

Gelesh commented on MAPREDUCE-5143:
---

[~ozawa],
Could you please add this as a diff to https://reviews.apache.org/r/11456/ ?
I tried, but it failed with the following error:

The file 
'hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java'
 (r69253f4) could not be found in the repository

 TestLineRecordReader was no test case for compressed files
 --

 Key: MAPREDUCE-5143
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5143
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, trunk, 2.0.5-beta
Reporter: Sonu Prathap
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Attachments: MAPREDUCE-5143.1.patch


 TestLineRecordReader has no test case for compressed files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5216) While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.

2013-05-07 Thread Gelesh (JIRA)
Gelesh created MAPREDUCE-5216:
-

 Summary: While using TextSplitter in DataDrivenDBInputformat, the 
lower limit (split start) always remains the same, for all splits.
 Key: MAPREDUCE-5216
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5216
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Gelesh






[jira] [Updated] (MAPREDUCE-5216) While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.

2013-05-07 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-5216:
--

  Due Date: 7/May/13
   Description: 
While using TextSplitter in DataDrivenDBInputFormat, the lower limit (split 
start) always remains the same for all splits,
i.e.,
Split 1: Start = A, End = M; Split 2: Start = A, End = P; Split 3: Start = A, End = S

instead of
Split 1: Start = A, End = M; Split 2: Start = M, End = P; Split 3: Start = P, End = S
Remaining Estimate: 1h
 Original Estimate: 1h

Nithin is working on the same.
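
The expected behaviour can be illustrated with a small sketch. This is not the TextSplitter code; toRanges is a hypothetical helper that turns an ordered list of split points into start-end ranges, where each split starts at the previous split's end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitBoundsSketch {

    // Builds "start-end" ranges from ordered split points. Each range starts
    // where the previous one ended, rather than reusing the first point.
    static List<String> toRanges(List<String> points) {
        List<String> ranges = new ArrayList<>();
        for (int i = 0; i + 1 < points.size(); i++) {
            ranges.add(points.get(i) + "-" + points.get(i + 1));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Split points taken from the example in the description.
        System.out.println(toRanges(Arrays.asList("A", "M", "P", "S")));
        // prints [A-M, M-P, P-S]
    }
}
```

The reported bug corresponds to using points.get(0) as the start of every range, which would give A-M, A-P, A-S.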

 While using TextSplitter in DataDrivenDBInputformat, the lower limit (split 
 start) always remains the same, for all splits.
 ---

 Key: MAPREDUCE-5216
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5216
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Gelesh
   Original Estimate: 1h
  Remaining Estimate: 1h

 While using TextSplitter in DataDrivenDBInputFormat, the lower limit (split 
 start) always remains the same for all splits,
 i.e.,
 Split 1: Start = A, End = M; Split 2: Start = A, End = P; Split 3: Start = A, End = S
 instead of
 Split 1: Start = A, End = M; Split 2: Start = M, End = P; Split 3: Start = P, End = S



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-11 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628855#comment-13628855
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~jira.shegalov], thanks for sharing your thoughts.

I have tested this with a JUnit run of TestLineRecordReader, but as of now 
TestLineRecordReader has no test case for compressed input. That is a place we 
need to cross-check, but I hope the code holds up, because the modification in 
this area is minimal.

The aim was to improve performance by removing the null check, but, as 
discussed above in [~snihalani]'s comments, builds that rely on the existing 
behaviour may be incompatible and hit an NPE.

The patch was limited to
1) removing the null assignments for key & value
2) limiting CompressionCodecFactory and Codec to method-local scope
3) removing lines 170-173

 if (newSize == 0) {
   break;
 }
This == 0 check inside the loop is unnecessary, because the code that handles 
this case is already there outside the loop, so the check inside adds no value.

4) in order to achieve point 2, a private boolean isCompressedInput variable was 
introduced instead of the 
private boolean isCompressedInput()
method.
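
Points 2 and 4 can be sketched as follows. This is not the LineRecordReader source: SketchReader, detectCodec and compressed() are hypothetical, and a file-extension check stands in for the CompressionCodecFactory lookup. The point is only that the codec lookup stays local to initialize() while the reader keeps a single boolean field.

```java
public class SketchReader {

    // Field replacing the former isCompressedInput() method; false by default.
    private boolean isCompressedInput;

    // Hypothetical stand-in for a codec lookup: returns a codec name for
    // known compressed extensions, or null for plain input.
    private static String detectCodec(String fileName) {
        if (fileName.endsWith(".gz")) {
            return "gzip";
        }
        if (fileName.endsWith(".bz2")) {
            return "bzip2";
        }
        return null;
    }

    public void initialize(String fileName) {
        // The codec is a local variable; only the boolean outlives this call.
        String codec = detectCodec(fileName);
        isCompressedInput = (codec != null);
    }

    public boolean compressed() {
        return isCompressedInput;
    }

    public static void main(String[] args) {
        SketchReader reader = new SketchReader();
        reader.initialize("part-00000.gz");
        System.out.println(reader.compressed()); // prints true
    }
}
```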

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-08 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626095#comment-13626095
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~jlowe], [~revans2], [~jira.shegalov], [~snihalani], [~kkambatl]
Could anybody please share views or suggestions?

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.



[jira] [Commented] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory

2013-04-07 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625123#comment-13625123
 ] 

Gelesh commented on MAPREDUCE-4879:
---

[~jira.shegalov]
Since TeraOutputFormat extends FileOutputFormat, it must comply with 
FileOutputFormat's checkOutputSpecs(), and checkOutputSpecs() is supposed to 
be called from the job client. If there is any issue, I feel it would be better 
to fix it within the existing control flow.
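
As a rough illustration of the kind of check meant here, the sketch below refuses to proceed when the output directory already exists. It uses java.nio.file on the local file system rather than Hadoop's FileSystem API, and the class name is hypothetical; it is not the FileOutputFormat implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputSpecCheckSketch {

    // Throws if the output directory already exists, so that an earlier
    // run's results are not overwritten.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        // A freshly created temp directory plays the role of a previous run's output.
        Path existing = Files.createTempDirectory("teragen-out");
        try {
            checkOutputSpecs(existing);
            System.out.println("check passed");
        } catch (IOException e) {
            System.out.println("rejected: existing output directory");
        } finally {
            Files.delete(existing);
        }
    }
}
```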

 TeraOutputFormat may overwrite an existing output directory
 ---

 Key: MAPREDUCE-4879
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: trunk
Reporter: Gera Shegalov
 Attachments: MAPREDUCE-4879-trunk.patch, 
 MAPREDUCE-4879-trunk-rev1.patch


 Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs 
 from writing into an existing directory, and potentially overwriting previous 
 runs.



[jira] [Commented] (MAPREDUCE-4879) TeraOutputFormat may overwrite an existing output directory

2013-04-07 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625124#comment-13625124
 ] 

Gelesh commented on MAPREDUCE-4879:
---

Could you please post the same on the review board, so that we can get a 
better insight into the code change?

 TeraOutputFormat may overwrite an existing output directory
 ---

 Key: MAPREDUCE-4879
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4879
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: trunk
Reporter: Gera Shegalov
 Attachments: MAPREDUCE-4879-trunk.patch, 
 MAPREDUCE-4879-trunk-rev1.patch


 Unlike FileOutputFormat, TeraOutputFormat does not prevent TeraGen/Sort jobs 
 from writing into an existing directory, and potentially overwriting previous 
 runs.



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-04 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622452#comment-13622452
 ] 

Gelesh commented on MAPREDUCE-4974:
---

Since this is just an optimization and the existing test cases would suffice, I 
hope this is a +1.
Could somebody kindly review?

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.



[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-03 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4974:
--

Target Version/s: 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 
1.0.0  (was: 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5)
  Status: Patch Available  (was: Reopened)

Reduced the scope of compressionCodecs & codec

Introduced boolean isCompressedInput instead of boolean isCompressedInput()

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 0.23.5, 2.0.2-alpha
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.



[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-03 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4974:
--

Attachment: MAPREDUCE-4974.5.patch

The CompressionCodecFactory compressionCodecs and CompressionCodec codec objects 
are made local to initialize(); a private boolean isCompressedInput field is 
introduced instead of the private boolean isCompressedInput() method.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-03 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620877#comment-13620877
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~hadoopqa] The automated check is yet to act on patch .5. Kindly advise.
Please refer to the review board: https://reviews.apache.org/r/9440/diff/3/

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch, MAPREDUCE-4974.5.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-02 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619812#comment-13619812
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~jira.shegalov], I too apologise for not noticing your comments on the review 
board. I did not know much about the review board, and was expecting the review 
comments here (JIRA).

Thanks for sharing your thoughts.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-02 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619813#comment-13619813
 ] 

Gelesh commented on MAPREDUCE-4974:
---


[~jira.shegalov], [~revans2],
I would suggest making isCompressedInput a private boolean variable, false by 
default, instead of the isCompressedInput() method.
This would help us reduce the scope of the Codec object, along with the 
CompressionCodecFactory object, to local. As far as I can see, both are class 
variables as of now.

I will be posting a patch with this modification shortly.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-03-31 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618339#comment-13618339
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~revans2], [~jlowe],
Could either of you please act on this?

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing initialize() if compressionCodecs & 
 codec are instantiated only when the input is compressed.
 Meanwhile, Gelesh George Omathil suggested that we could avoid the null check 
 of key & value. This would save time, since the null check is done each time 
 the next key/value pair is generated. The intention is to instantiate only 
 once and avoid an NPE as well. Both goals could be met if key & value are 
 initialized in the initialize() method. We both have worked on it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-28 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589433#comment-13589433
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~ak.a...@aol.com] has put this patch on Review Board. (Thanks, AK.)
[~snihalani], please refer to this link to view the patch diff:
https://reviews.apache.org/r/9440/diff/#index_header

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-26 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586946#comment-13586946
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~snihalani], I think you referred to an old patch.
Please look at MAPREDUCE-4974.4.patch.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-26 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4974:
--

Attachment: (was: MAPREDUCE-4974.1.patch)

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-22 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4974:
--

Attachment: MAPREDUCE-4974.4.patch

Two changes:

1)
if (newSize == 0) { break; }
if (newSize < maxLineLength) { break; }

The newSize == 0 check is eliminated, since the
(newSize < maxLineLength) check covers that condition as well.
The (newSize == 0) check outside the loop is retained as is.

2)
compressionCodecs = new CompressionCodecFactory(job);
codec = compressionCodecs.getCodec(file);

These lines of code are placed inside the
if (isCompressedInput()) { } block,
so that these objects are instantiated only if the input file is in a 
compressed format.
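
The reasoning behind the first change can be checked outside Hadoop with a minimal sketch. The class below is hypothetical (LoopBreakSketch and its method names are not part of LineRecordReader). It replays a fixed sequence of readLine return values through both forms of the loop body and assumes maxLineLength is positive, which is the case in which a newSize of zero also satisfies newSize < maxLineLength.

```java
// Hypothetical stand-in for the read loop in nextKeyValue(); not Hadoop code.
final class LoopBreakSketch {

  // Original form: explicit zero check, then the length check.
  static long originalLoop(int[] readSizes, int maxLineLength) {
    long pos = 0;
    for (int newSize : readSizes) {
      if (newSize == 0) {
        break;
      }
      pos += newSize;
      if (newSize < maxLineLength) {
        break;
      }
      // line too long: try again
    }
    return pos;
  }

  // Simplified form: only the length check remains.
  static long simplifiedLoop(int[] readSizes, int maxLineLength) {
    long pos = 0;
    for (int newSize : readSizes) {
      pos += newSize; // adding 0 leaves pos unchanged
      if (newSize < maxLineLength) {
        break; // also taken when newSize == 0, provided maxLineLength > 0
      }
    }
    return pos;
  }

  public static void main(String[] args) {
    int[] sizes = {10, 12, 0, 5};
    System.out.println(originalLoop(sizes, 10) == simplifiedLoop(sizes, 10));
  }
}
```

The two forms only agree when maxLineLength is greater than zero. With a limit of zero the simplified loop would not stop on an empty read, so that assumption would need to hold in the real reader.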

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-22 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13584291#comment-13584291
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~snihalani],
Please review.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-22 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585001#comment-13585001
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~ak.a...@aol.com] and I are of the opinion that we should also do 
something about the repeated null check. 
But as per the discussions here, that part of the optimization seems to be 
unattractive. Hence, in the latest patch, we have dropped the null check change. The 
remaining changes are described in the comment. Please review.


 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-21 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583403#comment-13583403
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~snihalani],
Thanks for bringing up that very valid point.
In that case, what if we eliminate the null check for value alone
and keep the null check for key as is?


 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-21 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13583406#comment-13583406
 ] 

Gelesh commented on MAPREDUCE-4974:
---

Also, as [~ak.a...@aol.com] has mentioned:
1) avoid the 'if (newSize == 0)' check inside the loop;
2) have 'compressionCodecs & codec instantiated only if it's a compressed 
input'.

I hope these two points are valid.
Please share your thoughts.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-19 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13581170#comment-13581170
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~tlipcon], [~snihalani], [~kkambatl],
Please share your thoughts.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-13 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577552#comment-13577552
 ] 

Gelesh commented on MAPREDUCE-4974:
---

The existing test case is enough, because it's just a code optimization.
Could anybody have a look and comment, please?


 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-12 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4974:
--

Attachment: MAPREDUCE-4974.3.patch

[~ak.a...@aol.com]'s patch 4974.2 showed all the lines as new lines because 
of code reformatting. The same changes were captured, and a patch was built 
against the previous commit. This time the size of the patch is 3+ KB. Please review.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
 MAPREDUCE-4974.3.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-11 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575754#comment-13575754
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~hadoopqa],
 As mentioned before, this is just an improvement.
 No features were added or removed.
 The existing test case holds for this as well.


 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-11 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13576007#comment-13576007
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~ak.a...@aol.com], it seems the reformatting has made every line of code show up as new, 
instead of the changes alone.
[~kkambatl], do we really need to re-submit the patch so that the patch size 
is reduced? If not, could you please act or advise on the course of action?
If some changes to the code are required, please mention that too; 
our next patch will incorporate them.
  

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-06 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572627#comment-13572627
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~snihalani],

 ".. while condition of getFilePosition <= end evaluates to true, then, we'll 
hit NPE .."
The Text object value, which is passed to readLine, would not be null, since 
that is taken care of in the initialize() method, which is called prior to 
nextKeyValue().

The while (nextKeyValue()) loop ends once newSize (the size of the newly 
fetched value) equals zero.
At that point, key and value are set to null.
But they aren't referenced any more after the while (nextKeyValue()) loop, so an NPE 
is not likely to occur.

Please verify, and kindly correct me if we have gone wrong somewhere.
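
The lifecycle argued above can be illustrated with a minimal sketch. EagerReaderSketch is a hypothetical, heavily simplified reader (a long[] holder and a StringBuilder stand in for LongWritable and Text); it is not the Hadoop class and not the attached patch. It allocates key and value once in initialize(), performs no per-record null check, and sets both to null at end of input.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified reader; it only mimics the key/value lifecycle
// discussed above and is not the Hadoop LineRecordReader.
final class EagerReaderSketch {
  private Iterator<String> lines;
  private long pos;
  private long[] key;          // single-element holder standing in for LongWritable
  private StringBuilder value; // stands in for Text

  // Allocate key and value once, before any call to nextKeyValue().
  void initialize(List<String> input) {
    lines = input.iterator();
    pos = 0;
    key = new long[1];
    value = new StringBuilder();
  }

  // No per-record null check: relies on initialize() having run.
  boolean nextKeyValue() {
    key[0] = pos; // throws NullPointerException if key was set to null
    if (!lines.hasNext()) {
      key = null;   // end of input: both references are dropped
      value = null;
      return false;
    }
    String line = lines.next();
    value.setLength(0);
    value.append(line);
    pos += line.length() + 1; // +1 for the line terminator
    return true;
  }

  Long getCurrentKey() { return key == null ? null : key[0]; }

  String getCurrentValue() { return value == null ? null : value.toString(); }

  public static void main(String[] args) {
    EagerReaderSketch reader = new EagerReaderSketch();
    reader.initialize(Arrays.asList("a", "bc"));
    while (reader.nextKeyValue()) {
      System.out.println(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
    }
    // Calling reader.nextKeyValue() again here would throw NullPointerException.
  }
}
```

In this sketch the normal while (nextKeyValue()) loop never touches the nulled fields, but one further call after the loop has ended does throw.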

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.20.204.0, 0.24.0

 Attachments: MAPREDUCE-4974.1.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-06 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573199#comment-13573199
 ] 

Gelesh commented on MAPREDUCE-4974:
---

As [~snihalani] has mentioned, a buggy program that may call nextKeyValue() .. is a 
condition that, though a little hypothetical, is still possible.

1) In order to avoid that, shall we move the null assignment of key & value to the 
close() method?
2) Also, shall we have compressionCodecs assigned null as well?

Either [~ak.a...@aol.com] or I will upload a reworked patch for the same.

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.20.204.0, 0.24.0

 Attachments: MAPREDUCE-4974.1.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-06 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13573202#comment-13573202
 ] 

Gelesh commented on MAPREDUCE-4974:
---


Also, this change instantiates the objects related to compression only if it's 
a compressed file.

In order to skip the first line, readLine is called, and this change does not 
create a new Text but uses the available 'value' for the method call. 

I hope somebody can share their thoughts on these two changes as well.
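
The buffer reuse described above can be sketched as follows. This is a hypothetical illustration, not the patch itself: readLineInto() loosely stands in for a readLine call that fills a caller-supplied buffer, a StringBuilder stands in for Text, and the input is assumed to consist of newline-terminated lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical sketch of skipping the first (partial) line of a split by
// reading it into the already allocated 'value' buffer.
final class SkipFirstLineSketch {

  // Fills 'buffer' with the next line and returns the characters consumed.
  static int readLineInto(BufferedReader in, StringBuilder buffer) throws IOException {
    buffer.setLength(0);
    String line = in.readLine();
    if (line == null) {
      return 0; // end of input
    }
    buffer.append(line);
    return line.length() + 1; // +1 for the newline that was consumed
  }

  // Returns the first full record after discarding the partial first line.
  static String firstRecordAfterSkip(String split) throws IOException {
    BufferedReader in = new BufferedReader(new StringReader(split));
    StringBuilder value = new StringBuilder(); // allocated once, as in initialize()
    readLineInto(in, value);                   // skipped text lands in 'value'
    readLineInto(in, value);                   // the next read overwrites it
    return value.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(firstRecordAfterSkip("tail of previous record\nfirst full line\n"));
  }
}
```

Because each read clears the buffer first, the skipped text does not leak into the first real record; the only saving is the one throwaway allocation.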

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.20.204.0, 0.24.0

 Attachments: MAPREDUCE-4974.1.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Updated] (MAPREDUCE-4974) optimising the LineRecordReader initialize method

2013-02-04 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4974:
--

Assignee: Gelesh  (was: Arun A K)
Target Version/s: 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 
1.0.0  (was: 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5)
  Status: Patch Available  (was: Open)

 optimising the LineRecordReader initialize method
 -

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 0.23.5, 2.0.2-alpha
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.20.204.0, 0.24.0

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Updated] (MAPREDUCE-4974) optimising the LineRecordReader initialize method

2013-02-04 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4974:
--

Attachment: MAPREDUCE-4974.1.patch

Combined thoughts of mine and Arun A K's.

 optimising the LineRecordReader initialize method
 -

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.20.204.0, 0.24.0

 Attachments: MAPREDUCE-4974.1.patch

   Original Estimate: 1h
  Remaining Estimate: 1h



[jira] [Commented] (MAPREDUCE-4974) optimising the LineRecordReader initialize method

2013-02-04 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570140#comment-13570140
 ] 

Gelesh commented on MAPREDUCE-4974:
---

Could somebody please review the patch?
I could not even see Hadoop QA running on this.
Kindly advise.



[jira] [Commented] (MAPREDUCE-4974) optimising the LineRecordReader initialize method

2013-02-04 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570154#comment-13570154
 ] 

Gelesh commented on MAPREDUCE-4974:
---

It's an improvement to the existing code; no features are added or removed,
and hence the existing test cases should suffice.



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-04 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570567#comment-13570567
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~tlipcon]
nextKeyValue() is called once for every delimiter (or newline) that occurs 
within a given split.
Each time, it executes the code below:

-if (key == null) {
-  key = new LongWritable();
-}
-key.set(pos);
-if (value == null) {
-  value = new Text();
-}

The conditions hold true only on the first iteration, when the key & value 
objects are created.
The same can be achieved by creating the key & value objects in the 
initialize phase, which lets us skip these null checks.

Also,
-compressionCodecs = new CompressionCodecFactory(job);
-codec = compressionCodecs.getCodec(file);
needs to be done only when the input file is compressed. This change is 
also included.
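
The idea of creating the key & value objects once, up front, can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: MutableLong and SketchReader are hypothetical stand-ins for LongWritable and LineRecordReader, and the reader works on an in-memory list of lines so that the example runs without Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.List;

public class EagerInitSketch {

    /** Hypothetical stand-in for LongWritable: a reusable, mutable long holder. */
    static final class MutableLong {
        private long v;
        void set(long v) { this.v = v; }
        long get() { return v; }
    }

    /** Hypothetical stand-in for a record reader over in-memory lines. */
    static final class SketchReader {
        private final List<String> lines;
        private int index;          // next line to hand out
        private long pos;           // byte offset of the next line (ASCII assumed)
        private MutableLong key;
        private StringBuilder value;

        SketchReader(List<String> lines) { this.lines = lines; }

        /** Create key and value once, so nextKeyValue() needs no null checks. */
        void initialize() {
            key = new MutableLong();
            value = new StringBuilder();
        }

        /** Reuses the objects created in initialize() for every record. */
        boolean nextKeyValue() {
            if (index >= lines.size()) {
                return false;
            }
            String line = lines.get(index++);
            key.set(pos);
            value.setLength(0);
            value.append(line);
            pos += line.length() + 1; // +1 for the newline separator
            return true;
        }

        long currentKey() { return key.get(); }
        String currentValue() { return value.toString(); }
    }

    public static void main(String[] args) {
        SketchReader reader = new SketchReader(Arrays.asList("alpha", "beta", "gamma"));
        reader.initialize();
        while (reader.nextKeyValue()) {
            System.out.println(reader.currentKey() + "\t" + reader.currentValue());
        }
    }
}
```

One visible difference from the lazy variant is that the key & value objects exist (and are non-null) before the first call to nextKeyValue(); whether callers rely on the old behaviour would need to be checked separately.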



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-04 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571123#comment-13571123
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~tlipcon]
I tried an estimate locally, with small data, by subtracting the long 
value obtained from System.nanoTime() at the beginning of the method from the 
one obtained at the end.

The average time difference was 200 nanoseconds per individual call made to 
nextKeyValue(), excluding the very first call, since it involves the object 
creation.

The total time difference would be 200 ns * the number of key/value pairs 
generated per map task.
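
The measurement and the estimate described above can be sketched as below. The 200 ns figure is the one reported in this comment; the record count of 1,000,000 and the helper names are assumptions made only for illustration.

```java
public class NanoTimeEstimate {

    /** Total overhead in nanoseconds: per-call cost times number of calls. */
    static long totalNanos(long perCallNanos, long calls) {
        return Math.multiplyExact(perCallNanos, calls);
    }

    /** Average elapsed nanoseconds per run, measured with System.nanoTime(). */
    static double averageNanos(Runnable task, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        // 200 ns per call is the reported figure; 1,000,000 records is an assumed example.
        long total = totalNanos(200, 1_000_000);
        System.out.println(total + " ns = " + (total / 1_000_000.0) + " ms");

        // Measuring an (almost) empty task shows the overhead of the timing itself.
        double avg = averageNanos(() -> { }, 100_000);
        System.out.println("average per empty call: " + avg + " ns");
    }
}
```

With these assumed numbers the total comes to 200,000,000 ns, i.e. 200 ms per map task. System.nanoTime() offers nanosecond precision but not necessarily nanosecond resolution, so a 200 ns difference per call is close to what the timer itself can distinguish and is best treated as a rough estimate.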



[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2013-01-25 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562621#comment-13562621
 ] 

Gelesh commented on MAPREDUCE-4882:
---

Could you please share what impact this has?

 Error in estimating the length of the output file in Spill Phase
 

 Key: MAPREDUCE-4882
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2, 1.0.3
 Environment: Any Environment
Reporter: Lijie Xu
  Labels: patch
   Original Estimate: 1h
  Remaining Estimate: 1h

 The sortAndSpill() method in MapTask.java has an error in estimating the 
 length of the output file. 
 The long size should be (bufvoid - bufstart) + bufend, not (bufvoid - 
 bufend) + bufstart, when bufend < bufstart.
 Here is the original code in MapTask.java.
   private void sortAndSpill() throws IOException, ClassNotFoundException,
                                      InterruptedException {
     //approximate the length of the output file to be the length of the
     //buffer + header lengths for the partitions
     long size = (bufend >= bufstart
         ? bufend - bufstart
         : (bufvoid - bufend) + bufstart) +
         partitions * APPROX_HEADER_LENGTH;
     FSDataOutputStream out = null;
 --
 I ran a test on TeraSort. A snippet from the mapper's log is as follows:
 MapTask: Spilling map output: record full = true
 MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
 MapTask: kvstart = 262142; kvend = 131069; length = 655360
 MapTask: Finished spill 3
 In this case, the spilled bytes should be (199229440 - 157286200) + 10485460 = 
 52428700 (52 MB), because the number of spilled records is 524287 and each 
 record costs 100 B.
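
The arithmetic in the report can be checked with a small stand-alone sketch. It assumes the condition in the quoted code is bufend >= bufstart, plugs in the values from the log snippet, and leaves out the partitions * APPROX_HEADER_LENGTH term because the report does not give those values. The class and method names are hypothetical.

```java
public class SpillSizeCheck {

    /** Buffer length as computed by the quoted sortAndSpill() code (header term omitted). */
    static long asQuoted(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufend) + bufstart;
    }

    /** Buffer length as proposed in the report for the wrapped-around case. */
    static long asProposed(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufstart) + bufend;
    }

    public static void main(String[] args) {
        // Values taken from the mapper log quoted in the report.
        long bufstart = 157286200L;
        long bufend = 10485460L;
        long bufvoid = 199229440L;

        System.out.println("quoted formula:   " + asQuoted(bufstart, bufend, bufvoid));
        System.out.println("proposed formula: " + asProposed(bufstart, bufend, bufvoid));
        System.out.println("records * 100 B:  " + (524287L * 100L));
    }
}
```

For these inputs the proposed formula yields 52428700, which equals 524287 records * 100 B, while the quoted formula yields 346030180, which is larger than bufvoid itself.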



[jira] [Commented] (MAPREDUCE-4519) In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting characte

2012-08-06 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429082#comment-13429082
 ] 

Gelesh commented on MAPREDUCE-4519:
---

I have found a similar bug and a fix, MAPREDUCE-4512. Please refer to the patch 
and kindly incorporate it.
While fixing it I too encountered such a scenario; I think this occurs at the end 
of the buffer, which captures 4096 characters.
My understanding is that the end of one buffer, the beginning of the next, and 
the delimiter indexes are not handled properly.
This results in one bug or another.

I tried solving it, but the fix resulted in some new bugs. Once all the scenarios 
are caught, we can work out a possible fix.
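
To illustrate the buffer-boundary concern, here is a minimal sketch of reading input in fixed-size chunks while still finding delimiters that straddle two chunks. It is not Hadoop's LineReader: the class name, the chunking of an in-memory string, and the sample delimiter are all assumptions. The unconsumed tail of each chunk is kept in a pending buffer, so a delimiter split across a refill is still matched.

```java
import java.util.ArrayList;
import java.util.List;

public class BufferBoundarySketch {

    /**
     * Splits text on delim while consuming it in chunks of bufSize characters,
     * simulating buffer refills. Text left over after the last delimiter in a
     * chunk stays in 'pending' and is searched again after the next refill.
     */
    static List<String> readRecords(String text, String delim, int bufSize) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (int off = 0; off < text.length(); off += bufSize) {
            int end = Math.min(text.length(), off + bufSize);
            pending.append(text, off, end); // one simulated buffer refill
            int at;
            while ((at = pending.indexOf(delim)) >= 0) {
                out.add(pending.substring(0, at));
                pending.delete(0, at + delim.length());
            }
        }
        if (pending.length() > 0) {
            out.add(pending.toString()); // trailing record without a delimiter
        }
        return out;
    }

    public static void main(String[] args) {
        String delim = "</entity>"; // assumed delimiter for the example
        String text = "<entity><id>1</id><name>User1</name></entity>"
                    + "<entity><id>2</id><name>User2</name></entity>";
        // A deliberately tiny buffer forces the delimiter across chunk boundaries.
        for (String record : readRecords(text, delim, 7)) {
            System.out.println(record);
        }
    }
}
```

The sketch rescans the pending text from its start after every refill, which is simple but not efficient; a production reader would track how much of the delimiter has already been matched instead.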

 In TextInputFormat, while specifying textinputformat.record.delimiter the 
 character/character sequences in data file similar to starting 
 character/starting character sequence in delimiter were found missing in 
 certain cases in the Map Output
 -

 Key: MAPREDUCE-4519
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4519
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2
 Environment: Linux- Ubuntu 10.04
Reporter: Arun A K
  Labels: hadoop, mapreduce, textinputformat, 
 textinputformat.record.delimiter
 Fix For: 0.20.2

   Original Estimate: 168h
  Remaining Estimate: 168h

 Set textinputformat.record.delimiter as </entity>
 Suppose the input is a text file with the following content
 <entity><id>1</id><name>User1</name></entity><entity><id>2</id><name>User2</name></entity><entity><id>3</id><name>User3</name></entity><entity><id>4</id><name>User4</name></entity><entity><id>5</id><name>User5</name></entity>
 Mapper was expected to get value as 
 Value 1 - <entity><id>1</id><name>User1</name>
 Value 2 - <entity><id>2</id><name>User2</name>
 Value 3 - <entity><id>3</id><name>User3</name>
 Value 4 - <entity><id>4</id><name>User4</name>
 Value 5 - <entity><id>5</id><name>User5</name>
 According to this bug Mapper gets value
 Value 1 - <entity><id>1</id><name>User1</name>
 Value 2 - <entity><id>2</id><name>User2</name>
 Value 3 - <entity><id>3id><name>User3</name>
 Value 4 - <entity><id>4</id><name>User4name>
 Value 5 - <entity><id>5</id><name>User5</name>
 The pattern shown above need not occur for value 1,2,3 necessarily. The bug 
 occurs at some random positions in the map input.
  





[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence

2012-08-04 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4512:
--

  Description: 
TextInputFormat delimiter bug scenario: a character sequence in the input 
text in which the first character matches the first character of the 
delimiter, and the remaining characters match the entire delimiter 
character sequence from the starting position of the delimiter.

e.g. delimiter = "record";
and Text = "record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location 
Bangalore record 2: name = sdf  ..  location =Bangalorrecord 3: name"   

Here the string "=Bangalorrecord 3:" satisfies two conditions: 
1) it contains the delimiter "record";
2) the character / character sequence immediately before the delimiter (i.e. 
'r') matches the first character (or character sequence) of the delimiter (i.e. 
"=Bangalor" ends with, and the delimiter starts with, the same character/char 
sequence 'r').

Here the delimiter is not detected by the program, resulting in an improper 
value text in the map that still contains the delimiter.

  was:
TextInputFormat delimiter  bug scenario , a character sequence of the input 
text,  in which the first character matches with the first character of 
delimiter, and reaming input text character sequence  matches with the entire 
delimiter character sequence from the  starting position of the delimiter.

eg   delimiter =record;
and Text = record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location 
Bangalore record 2: name = sdf  ..  location =Bangalorrecord 3: name  

Here string =Bangalorrecord 3:  satisfy two condition 
1) contains the delimiter record
2) The character / character sequence immediately b4 the delimiter (ie 'r') 
matches with first character (or character sequence ) of delimiter.  (ie 
=Bangalor ends with and Delimiter starts with same character/char sequence 
'r' ),

Hear the delimiter is skipped

  Environment: Linux  (was: Lynux)
Affects Version/s: 0.20.204.0
   0.21.0
   1.0.3

Test case
input file text
record 1 name: Java Location:UAErecord 2 name:Gelesh Location:Bangalorrecord 3 
name Hadoop Location:Kerala

Delimiter = record

expected values in map
 1 name: Java Location:UAE
 2 name:Gelesh Location:Bangalor
 3 name Hadoop Location:Kerala

Actual values received in map
 1 name: Java Location:UAE
 2 name:Gelesh Location:Bangalorrecord 3 name Hadoop Location:Kerala
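
The test case above can be reproduced with a small stand-alone sketch. splitBuggy is a hypothetical reconstruction of the behaviour described in this issue (it is not the LineReader source): after a partial match fails, the current character is never retried as the start of a new match. splitFixed uses String.indexOf as a reference for the expected output. Empty records are skipped so that the output lines up with the listing above.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterOverlapSketch {

    /** Hypothetical buggy scan: a failed partial match swallows the current character. */
    static List<String> splitBuggy(String text, String delim) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int matched = 0; // number of delimiter characters matched so far
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            cur.append(c);
            if (c == delim.charAt(matched)) {
                matched++;
                if (matched == delim.length()) {
                    cur.setLength(cur.length() - delim.length()); // drop the delimiter
                    if (cur.length() > 0) {
                        out.add(cur.toString());
                    }
                    cur.setLength(0);
                    matched = 0;
                }
            } else {
                matched = 0; // bug: c itself may be the first delimiter character
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    /** Reference behaviour: find every occurrence of the delimiter with indexOf. */
    static List<String> splitFixed(String text, String delim) {
        List<String> out = new ArrayList<>();
        int from = 0;
        while (true) {
            int at = text.indexOf(delim, from);
            String piece = at < 0 ? text.substring(from) : text.substring(from, at);
            if (!piece.isEmpty()) {
                out.add(piece);
            }
            if (at < 0) {
                break;
            }
            from = at + delim.length();
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "record 1 name: Java Location:UAErecord 2 name:Gelesh "
                    + "Location:Bangalorrecord 3 name Hadoop Location:Kerala";
        System.out.println("buggy: " + splitBuggy(text, "record"));
        System.out.println("fixed: " + splitFixed(text, "record"));
    }
}
```

In the buggy scan, the 'r' that ends "Bangalor" starts a partial match, the following 'r' fails against 'e', and the scanner then moves on without treating that second 'r' as a possible delimiter start, so the delimiter is missed. Retrying only the current character would cover this case; delimiters that overlap themselves in longer prefixes would need fuller backtracking.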



 TextInputFormat delimiter  bug:- Input Text portion ends with  Delimiter 
 starts with same char/char sequence
 -

 Key: MAPREDUCE-4512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mumak, mr-am, mrv1, mrv2, task
Affects Versions: 0.20.204.0, 0.21.0, 1.0.3, 2.0.0-alpha
 Environment: Linux
Reporter: Gelesh
  Labels: patch
 Fix For: 0.20.204.0

 Attachments: MAPREDUCE-4512.txt

   Original Estimate: 1m
  Remaining Estimate: 1m





[jira] [Created] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence

2012-08-03 Thread Gelesh (JIRA)
Gelesh created MAPREDUCE-4512:
-

 Summary: TextInputFormat delimiter  bug:- Input Text portion ends 
with  Delimiter starts with same char/char sequence
 Key: MAPREDUCE-4512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mumak, mr-am, mrv1, mrv2, task
Affects Versions: 2.0.0-alpha
 Environment: Lynux
Reporter: Gelesh
 Fix For: 0.20.204.0


TextInputFormat delimiter bug scenario: a character sequence in the input 
text in which the first character matches the first character of the 
delimiter, and the remaining characters match the entire delimiter 
character sequence from the starting position of the delimiter.

e.g. delimiter = "record";
and Text = "record 1:- name = Gelesh e mail = gelesh.had...@gmail.com Location 
Bangalore record 2: name = sdf  ..  location =Bangalorrecord 3: name"  

Here the string "=Bangalorrecord 3:" satisfies two conditions: 
1) it contains the delimiter "record";
2) the character / character sequence immediately before the delimiter (i.e. 'r') 
matches the first character (or character sequence) of the delimiter (i.e. 
"=Bangalor" ends with, and the delimiter starts with, the same character/char 
sequence 'r').

Here the delimiter is skipped.





[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence

2012-08-03 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4512:
--

Status: Patch Available  (was: Open)

Just a one-line code change in LineReader would do. Tested.
If there are any issues, please let me know so that I can help further:
gelesh.had...@gmail.com

 TextInputFormat delimiter  bug:- Input Text portion ends with  Delimiter 
 starts with same char/char sequence
 -

 Key: MAPREDUCE-4512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mumak, mr-am, mrv1, mrv2, task
Affects Versions: 2.0.0-alpha
 Environment: Lynux
Reporter: Gelesh
  Labels: patch
 Fix For: 0.20.204.0

   Original Estimate: 1m
  Remaining Estimate: 1m





[jira] [Updated] (MAPREDUCE-4512) TextInputFormat delimiter bug:- Input Text portion ends with Delimiter starts with same char/char sequence

2012-08-03 Thread Gelesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gelesh updated MAPREDUCE-4512:
--

Attachment: MAPREDUCE-4512.txt

Just a one-line code change at LineRecord. Tested; in case there is any issue, 
please mail me at gelesh.had...@gmail.com

 TextInputFormat delimiter  bug:- Input Text portion ends with  Delimiter 
 starts with same char/char sequence
 -

 Key: MAPREDUCE-4512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4512
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mumak, mr-am, mrv1, mrv2, task
Affects Versions: 2.0.0-alpha
 Environment: Lynux
Reporter: Gelesh
  Labels: patch
 Fix For: 0.20.204.0

 Attachments: MAPREDUCE-4512.txt

   Original Estimate: 1m
  Remaining Estimate: 1m
