[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-03-04 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592160#comment-13592160
 ] 

Arun A K commented on MAPREDUCE-4974:
-

@All, can we mark this issue as resolved?

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
> MAPREDUCE-4974.4.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I found there is scope for optimizing the code in initialize() by
> instantiating compressionCodecs and codec only if the input is compressed.
> Meanwhile, Gelesh George Omathil suggested that we could avoid the null check
> of key and value. This would save time, since the null check is done every
> time the next key/value pair is generated. The intention is to instantiate
> them only once while still avoiding an NPE. We hope both goals can be met by
> initializing key and value in the initialize() method. We have both worked on it.
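The two ideas in the description can be sketched in plain Java: set up decompression only when the input is compressed, and allocate key and value once in initialize() so that nextKeyValue() needs no per-record null checks. This is a hypothetical, self-contained illustration, not the patch and not Hadoop's LineRecordReader. The class names SketchLineRecordReader and OffsetKey, the ".gz" suffix test (standing in for Hadoop's codec lookup) and the use of GZIPInputStream are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/** Hypothetical, simplified reader; NOT Hadoop's LineRecordReader. */
class SketchLineRecordReader {
    /** Hypothetical mutable key, standing in for Hadoop's LongWritable. */
    static final class OffsetKey {
        long offset;
    }

    private BufferedReader in;
    private OffsetKey key;
    private StringBuilder value;
    private long pos;

    void initialize(String fileName, InputStream raw) throws IOException {
        InputStream stream = raw;
        // Idea 1: only set up decompression for compressed input
        // (assumption: a ".gz" suffix marks compressed input).
        if (fileName.endsWith(".gz")) {
            stream = new GZIPInputStream(raw);
        }
        in = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
        // Idea 2: allocate key and value once, here, so nextKeyValue()
        // never has to null-check them for every record.
        key = new OffsetKey();
        value = new StringBuilder();
        pos = 0;
    }

    boolean nextKeyValue() throws IOException {
        String line = in.readLine();
        if (line == null) {
            return false; // end of input
        }
        key.offset = pos;   // reuse the same key object
        value.setLength(0); // reuse the same value buffer
        value.append(line);
        // Character offset in the decompressed text, assuming '\n' line endings.
        pos += line.length() + 1;
        return true;
    }

    OffsetKey getCurrentKey() {
        return key;
    }

    StringBuilder getCurrentValue() {
        return value;
    }

    void close() throws IOException {
        if (in != null) {
            in.close();
        }
        // Drop references in close() rather than in nextKeyValue(),
        // so callers never see a null key/value while records remain.
        key = null;
        value = null;
    }
}
```

Because the same key and value objects are reused for every record, a caller that wants to keep a record must copy it, which matches how Hadoop's Writable-based readers behave.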

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-28 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589434#comment-13589434
 ] 

Arun A K commented on MAPREDUCE-4974:
-

Updated the review request with the latest patch. Please find it at
https://reviews.apache.org/r/9440/


> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
> MAPREDUCE-4974.4.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-25 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585797#comment-13585797
 ] 

Arun A K commented on MAPREDUCE-4974:
-

As [~gelesh] has mentioned, our aim in optimizing the code was to eliminate
repeated null checks. If that is not significant, please go ahead with the
latest available patch containing the rest of the changes.

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
> MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-15 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13579453#comment-13579453
 ] 

Arun A K commented on MAPREDUCE-4974:
-

Please find the review request at https://reviews.apache.org/r/9440/

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
> MAPREDUCE-4974.3.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-11 Thread Arun A K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun A K updated MAPREDUCE-4974:


Attachment: MAPREDUCE-4974.2.patch

The null assignment of key and value in nextKeyValue() has been moved to
close() to avoid an NPE, as per the review comments.

Also, the if (newSize == 0) check has been removed from inside the loop,
since if (newSize < maxLineLength) already covers that case. However, the
if (newSize == 0) condition is still checked outside the while loop. We hope
this will also improve performance.

Combined effort with Gelesh.
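The reasoning about the loop checks can be modelled with a hypothetical stand-in for in.readLine(...). The sketch below compares the loop with and without the inner newSize == 0 check; FakeReader, before, after and gotRecord are illustrative names, not Hadoop code. It assumes maxLineLength > 0, which is what makes 0 < maxLineLength true and the inner check redundant. The outer newSize == 0 test is still needed: if the loop body never runs (pos already at end), newSize stays 0 and no record was read.

```java
/** Hypothetical model of the read loop in nextKeyValue(); not Hadoop code. */
final class LoopSketch {
    /** Stand-in for in.readLine(...): returns successive line sizes, then 0 at end of input. */
    static final class FakeReader {
        private final int[] sizes;
        private int next = 0;

        FakeReader(int... sizes) {
            this.sizes = sizes;
        }

        int readLine() {
            return next < sizes.length ? sizes[next++] : 0;
        }
    }

    /** Loop shape before the patch: two break checks inside the loop. */
    static int before(FakeReader in, int maxLineLength, long end) {
        int newSize = 0;
        long pos = 0;
        while (pos < end) {
            newSize = in.readLine();
            pos += newSize;
            if (newSize == 0) {
                break;
            }
            if (newSize < maxLineLength) {
                break;
            }
            // Line too long: skip it and try again.
        }
        return newSize;
    }

    /** Loop shape after the patch: a single break check. */
    static int after(FakeReader in, int maxLineLength, long end) {
        int newSize = 0;
        long pos = 0;
        while (pos < end) {
            newSize = in.readLine();
            pos += newSize;
            // Assuming maxLineLength > 0, newSize == 0 also satisfies this check.
            if (newSize < maxLineLength) {
                break;
            }
        }
        return newSize;
    }

    /** The check kept outside the loop: was a record actually read? */
    static boolean gotRecord(int newSize) {
        return newSize != 0;
    }
}
```

For any sequence of line sizes, before and after return the same newSize, so the change only removes one comparison per iteration.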

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-08 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574334#comment-13574334
 ] 

Arun A K commented on MAPREDUCE-4974:
-

If someone could add review comments, we can look into making the suggested
changes. https://reviews.apache.org/r/9287/

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Fix For: 0.20.204.0, 0.24.0
>
> Attachments: MAPREDUCE-4974.1.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-06 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572436#comment-13572436
 ] 

Arun A K commented on MAPREDUCE-4974:
-

Kindly advise whether the optimization is worthwhile.

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Fix For: 0.20.204.0, 0.24.0
>
> Attachments: MAPREDUCE-4974.1.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-04 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570161#comment-13570161
 ] 

Arun A K commented on MAPREDUCE-4974:
-

Quoting the review request URL for this issue:
https://reviews.apache.org/r/9287/

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Fix For: 0.20.204.0, 0.24.0
>
> Attachments: MAPREDUCE-4974.1.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-04 Thread Arun A K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun A K updated MAPREDUCE-4974:


Summary: Optimising the LineRecordReader initialize() method  (was: 
optimising the LineRecordReader initialize method)

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Fix For: 0.20.204.0, 0.24.0
>
> Attachments: MAPREDUCE-4974.1.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h


[jira] [Created] (MAPREDUCE-4974) optimising the LineRecordReader initialize method

2013-02-04 Thread Arun A K (JIRA)
Arun A K created MAPREDUCE-4974:
---

 Summary: optimising the LineRecordReader initialize method
 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 0.23.5, 2.0.2-alpha
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Arun A K
 Fix For: 0.20.204.0, 0.24.0


I found there is scope for optimizing the code in initialize() by instantiating
compressionCodecs and codec only if the input is compressed.
Meanwhile, Gelesh George Omathil suggested that we could avoid the null check of
key and value. This would save time, since the null check is done every time the
next key/value pair is generated. The intention is to instantiate them only once
while still avoiding an NPE. We hope both goals can be met by initializing key
and value in the initialize() method. We have both worked on it.



[jira] [Commented] (MAPREDUCE-4709) Counters that track max values

2013-01-25 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562567#comment-13562567
 ] 

Arun A K commented on MAPREDUCE-4709:
-

@Jeremy Lewi, 
Could you please elaborate on the problem with an example? 

> Counters that track max values
> --
>
> Key: MAPREDUCE-4709
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4709
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jeremy Lewi
>Priority: Minor
>
> A nice feature to help monitor MR jobs would be MapReduce counters that track
> the maximum of some metric across all workers. These trackers would work just
> like regular counters, except that they would track the maximum of all values
> passed to the "increment" function instead of summing them.
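A minimal sketch of the requested semantics, assuming nothing about Hadoop's Counters API: MaxCounter is a hypothetical class whose increment(v) keeps the largest value seen, and combining per-task counters is itself a max, where regular counters would be summed.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical max-tracking counter; not part of Hadoop's Counters API. */
final class MaxCounter {
    // Long.MIN_VALUE means "no value recorded yet".
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    /** Records v, keeping the maximum instead of adding (thread-safe). */
    void increment(long v) {
        max.accumulateAndGet(v, Math::max);
    }

    long getValue() {
        return max.get();
    }

    /** Aggregating counters from several tasks is also a max. */
    void merge(MaxCounter other) {
        increment(other.getValue());
    }
}
```

For example, task counters that saw 3, 7, 5 and 4 would merge to a job-level value of 7.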



[jira] [Commented] (MAPREDUCE-4770) Hadoop jobs failing with FileNotFound Exception while the job is still running

2013-01-24 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562387#comment-13562387
 ] 

Arun A K commented on MAPREDUCE-4770:
-

Not sure if this is the solution, but it may help:

IsolationRunner is a utility to help debug MapReduce programs.

To use the IsolationRunner, first set keep.failed.task.files to true (also see 
keep.task.files.pattern).

Next, go to the node on which the failed task ran and go to the TaskTracker's 
local directory and run the IsolationRunner:
$ cd /taskTracker/${taskid}/work
$ bin/hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml

IsolationRunner will run the failed task in a single JVM, which can be run in a
debugger, over precisely the same input.

Note that currently IsolationRunner will only re-run map tasks.

Reference : http://hadoop.apache.org/docs/r1.1.1/mapred_tutorial.html

> Hadoop jobs failing with FileNotFound Exception while the job is still running
> --
>
> Key: MAPREDUCE-4770
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4770
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Jaikannan Ramamoorthy
>
> We are having a strange issue in our Hadoop cluster. We have noticed that
> some of our jobs fail with a file not found exception [see below].
> Basically, the files in the "attempt_*" directory, and the directory itself,
> are getting deleted while the task is still running on the host. Looking
> through some of the Hadoop documentation, I see that the job directory gets
> wiped out when it receives a KillJobAction; however, I am not sure why it gets
> wiped out while the job is still running.
> My question is: what could be deleting it while the job is running? Any
> thoughts or pointers on how to debug this would be helpful.
> Thanks!
> java.io.FileNotFoundException: /hadoop/mapred/local_data/taskTracker//jobcache/job_201211030344_15383/attempt_201211030344_15383_m_000169_0/output/spill29.out (Permission denied)
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.(FileInputStream.java:120)
>   at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.(RawLocalFileSystem.java:71)
>   at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.(RawLocalFileSystem.java:107)
>   at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:177)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:400)
>   at org.apache.hadoop.mapred.Merger$Segment.init(Merger.java:205)
>   at org.apache.hadoop.mapred.Merger$Segment.access$100(Merger.java:165)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:418)
>   at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>   at org.apache.hadoop.mapred.Merger.merge(Merger.java:77)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1692)
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1322)
>   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>   at org.apache.hadoop.mapred.Child.main(Child.java:253)



[jira] [Created] (MAPREDUCE-4519) In TextInputFormat, while specifying textinputformat.record.delimiter the character/character sequences in data file similar to starting character/starting character

2012-08-06 Thread Arun A K (JIRA)
Arun A K created MAPREDUCE-4519:
---

 Summary: In TextInputFormat, while specifying 
textinputformat.record.delimiter the character/character sequences in data file 
similar to starting character/starting character sequence in delimiter were 
found missing in certain cases in the Map Output
 Key: MAPREDUCE-4519
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4519
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2
 Environment: Linux- Ubuntu 10.04
Reporter: Arun A K
 Fix For: 0.20.2


Set textinputformat.record.delimiter as ""

Suppose the input is a text file with the following content
1User12User23User34User45User5

The Mapper was expected to get these values:

Value 1 - 1User1
Value 2 - 2User2
Value 3 - 3User3
Value 4 - 4User4
Value 5 - 5User5

Because of this bug, the Mapper gets these values instead:

Value 1 - entity>1User1
Value 2 - id>2User2
Value 3 - 3id>User3
Value 4 - 4User4name>
Value 5 - 5User5

The pattern shown above does not necessarily occur for values 1, 2 and 3; the
bug occurs at random positions in the map input.
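To work out the values the Mapper should receive, a slow but simple reference splitter can help. The sketch below is hypothetical and is not Hadoop's LineReader; the delimiter `|#|` is an illustrative stand-in. Its key property is that bytes which only partially match the delimiter are kept as record data, and the matching state lives in the record buffer rather than the I/O buffer, so reading through a tiny BufferedInputStream gives the same result.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reference splitter for a multi-byte delimiter; not Hadoop's LineReader. */
final class DelimiterSplitSketch {
    static List<String> split(InputStream in, byte[] delimiter) throws IOException {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        byte[] buf = new byte[64];
        int len = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (len == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // grow the record buffer
            }
            buf[len++] = (byte) b;
            // A partial delimiter match simply stays in buf as record data;
            // nothing is discarded when the match fails later.
            if (endsWith(buf, len, delimiter)) {
                records.add(new String(buf, 0, len - delimiter.length, StandardCharsets.UTF_8));
                len = 0;
            }
        }
        if (len > 0) {
            // Trailing record that is not followed by a delimiter.
            records.add(new String(buf, 0, len, StandardCharsets.UTF_8));
        }
        return records;
    }

    private static boolean endsWith(byte[] buf, int len, byte[] suffix) {
        if (len < suffix.length) {
            return false;
        }
        for (int i = 0; i < suffix.length; i++) {
            if (buf[len - suffix.length + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The per-byte suffix check is quadratic in the worst case, which is acceptable for an oracle used to check expected values on small test files.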
 
