[jira] Updated: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.

2009-11-01 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-1153:
--

   Resolution: Fixed
Fix Version/s: 0.22.0
 Release Note: Update the number of trackers and blacklisted trackers 
metrics when trackers are decommissioned.
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I just committed this.
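
For readers following along, a minimal sketch of the shape of the fix (method 
and counter names here are hypothetical; 1153.patch is the authoritative 
change):

{code}
// Hypothetical sketch: when a tracker is removed by decommissioning,
// decrement whichever gauge was counting it, then count the decommission.
void trackerDecommissioned(String trackerName) {
  if (isBlacklisted(trackerName)) {
    instrumentation.decBlackListedTrackers(1); // was never decremented before
  } else {
    instrumentation.decTrackers(1);            // keep the live-tracker gauge in sync
  }
  instrumentation.addDecommissionedTrackers(1);
}
{code}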

> Metrics counting tasktrackers and blacklisted tasktrackers are not updated 
> when trackers are decommissioned.
> 
>
> Key: MAPREDUCE-1153
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.22.0
>Reporter: Hemanth Yamijala
>Assignee: Sharad Agarwal
> Fix For: 0.22.0
>
> Attachments: 1153.patch
>
>
> MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of 
> actual, blacklisted and decommissioned tasktrackers. When a tracker is 
> decommissioned, the tasktracker count or the blacklisted tracker count is not 
> decremented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.

2009-11-01 Thread Sharad Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772447#action_12772447
 ] 

Sharad Agarwal commented on MAPREDUCE-1153:
---

Test failure is due to MAPREDUCE-1124

> Metrics counting tasktrackers and blacklisted tasktrackers are not updated 
> when trackers are decommissioned.
> 
>
> Key: MAPREDUCE-1153
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.22.0
>Reporter: Hemanth Yamijala
>Assignee: Sharad Agarwal
> Attachments: 1153.patch
>
>
> MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of 
> actual, blacklisted and decommissioned tasktrackers. When a tracker is 
> decommissioned, the tasktracker count or the blacklisted tracker count is not 
> decremented.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures

2009-11-01 Thread Christian Kunz (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772443#action_12772443
 ] 

Christian Kunz commented on MAPREDUCE-1171:
---

Yes, in the absence of MAPREDUCE-318, MAPREDUCE-353 should be sufficient to work 
around the issue.

> Lots of fetch failures
> --
>
> Key: MAPREDUCE-1171
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Christian Kunz
>
> Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map 
> task failures because of 'Too many fetch-failures'.
> One of our jobs makes hardly any progress, because of 3000 reduces not able 
> to get map output of 2 trailing maps (with about 80GB output each), which 
> repeatedly are marked as failures because of reduces not being able to get 
> their map output.
> One difference to hadoop-0.18.3 seems to be that reduce tasks report a failed 
> map-output fetch even after a single try when it was a read error 
> (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
> a good idea, as trailing map tasks will be attacked by all reduces 
> simultaneously.
> Here is a log output of a reduce task:
> {noformat}
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
> attempt_200910281903_0028_r_00_0 copy failed: 
> attempt_200910281903_0028_m_002781_1 from some host
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
> attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
> attempt_200910281903_0028_m_002781_1
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
> fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
> MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
> the JobTracker.
> {noformat}
> Also I saw a few log messages which look suspicious as if successfully 
> fetched map output is discarded because of the map being marked as failed 
> (because of too many fetch failures). This would make the situation even 
> worse.
> {noformat}
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: 
> attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed 
> len: 23967845
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
> 23967845 bytes (21882555 raw bytes) into RAM from 
> attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 
> 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
> attempt_200910281903_0028_m_001076_0 -> (20, 39772) from some host
> ...
> 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring 
> obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1177) TestTaskTrackerMemoryManager retries a task for more than 100 times.

2009-11-01 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1177:
---

Attachment: TEST-org.apache.hadoop.mapred.TestTaskTrackerMemoryManager.txt

Attaching the complete test log.

> TestTaskTrackerMemoryManager retries a task for more than 100 times.
> 
>
> Key: MAPREDUCE-1177
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1177
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker, test
>Reporter: Amareshwari Sriramadasu
> Attachments: 
> TEST-org.apache.hadoop.mapred.TestTaskTrackerMemoryManager.txt
>
>
> TestTaskTrackerMemoryManager retries a task for more than 100 times.
> The logs show this:
> {noformat}
> 2009-11-02 12:41:20,489 INFO  mapred.JobInProgress 
> (JobInProgress.java:completedTask(2530)) - Task 
> 'attempt_20091102123356106_0001_m_02_145' has completed 
> task_20091102123356106_0001_m_02 successfully.
> {noformat}
> Sometimes the test also times out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1177) TestTaskTrackerMemoryManager retries a task for more than 100 times.

2009-11-01 Thread Amareshwari Sriramadasu (JIRA)
TestTaskTrackerMemoryManager retries a task for more than 100 times.


 Key: MAPREDUCE-1177
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1177
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker, test
Reporter: Amareshwari Sriramadasu


TestTaskTrackerMemoryManager retries a task for more than 100 times.
The logs show this:
{noformat}
2009-11-02 12:41:20,489 INFO  mapred.JobInProgress 
(JobInProgress.java:completedTask(2530)) - Task 
'attempt_20091102123356106_0001_m_02_145' has completed 
task_20091102123356106_0001_m_02 successfully.
{noformat}

Sometimes the test also times out.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed

2009-11-01 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772432#action_12772432
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-323:
---

bq. Nick Rettinghouse, Tim Williamson, and Rajiv Chittajallu all suggested a 
preference for per-hour directories, in particular, USER/YYYY/MM/DD/HH, an 
option you did not list. Should we perhaps err on the side of a deeper 
structure, to ensure that we don't have to re-structure things again?
Per-hour directories look like overkill: on average, each user would have only 
about 10 finished jobs per hour.

bq. However implementing Cluster.getJobHistoryUrl() would be expensive for 
archived jobs, since the jobtracker must search the entire directory tree.
Here, the JobTracker need not search the entire directory tree. If the 
JobTracker does not have it in its cache, the JobClient itself can do the 
search.

bq. Perhaps the directory structure should instead be based purely on the job 
ID? E.g., something like: jobtrackerstarttime/00/00/00
This looks fine, but once we have permissions in place, inserting the user 
becomes difficult.
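
To make the layouts concrete, here is a tiny sketch of the per-user, per-day 
option (all names below are hypothetical and purely illustrative, not from any 
patch):

{code}
import java.text.SimpleDateFormat;
import java.util.Date;

// Illustration only: a per-user, per-day history directory such as
// amar/2009/11/01. A per-hour layout would merely append "/HH", at the
// cost of many nearly-empty directories.
public class HistoryDirLayout {
  static String dirFor(String user, long jobFinishTime) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd");
    return user + "/" + fmt.format(new Date(jobFinishTime));
  }
}
{code}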

> Improve the way job history files are managed
> -
>
> Key: MAPREDUCE-323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Amar Kamat
>Assignee: Amareshwari Sriramadasu
>Priority: Critical
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This 
> can cause problems when there is a need to search the history folder 
> (job-recovery etc). It would be nice if we group all the jobs under a _user_ 
> folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. 
> Jobs can be categorized using various features like _jobid, date, jobname_, 
> etc., but using _username_ will make the search much more efficient and also 
> will not result in a namespace explosion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-962) NPE in ProcfsBasedProcessTree.destroy()

2009-11-01 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-962:
---

Release Note: Fixes an NPE in ProcfsBasedProcessTree that occurred in a 
corner case.
  Status: Patch Available  (was: Open)

Submitting so the patch goes through Hudson.

> NPE in ProcfsBasedProcessTree.destroy()
> ---
>
> Key: MAPREDUCE-962
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-962
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: HADOOP-6232.patch, MR-962.patch, MR-962.v1.1.patch, 
> MR-962.v1.patch
>
>
> This causes the following exception in TaskMemoryManagerThread. I observed 
> this while running TestTaskTrackerMemoryManager.
> {code}
> 2009-09-02 12:08:25,835 WARN  mapred.TaskMemoryManagerThread 
> (TaskMemoryManagerThread.java:run(239)) - \
> Uncaught exception in TaskMemoryManager while managing memory of 
> attempt_20090902120812252_0001_m_03_0 : \
> java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.assertPidPgrpidForMatch(ProcfsBasedProcessTree.java:234)
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.assertAndDestroyProcessGroup(ProcfsBasedProcessTree.java:257)
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.destroy(ProcfsBasedProcessTree.java:286)
> at 
> org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:229)
> {code}
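
For context, the corner case appears to be a process tree whose root has 
already exited by the time destroy() runs. A speculative sketch of the kind of 
guard involved, with the map name assumed from ProcfsBasedProcessTree's 
pid-to-ProcessInfo bookkeeping (the attached patches are authoritative):

{code}
// Speculative sketch only, not the committed fix: if the root process
// exited before destroy() ran, the tree lookup can return null.
ProcessInfo rootInfo = processTree.get(pid);
if (rootInfo == null) {
  return; // process tree is already gone; nothing to verify or kill
}
{code}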

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-962) NPE in ProcfsBasedProcessTree.destroy()

2009-11-01 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-962:
---

Attachment: MR-962.v1.1.patch

Attaching a new patch that removes the unnecessary array.

> NPE in ProcfsBasedProcessTree.destroy()
> ---
>
> Key: MAPREDUCE-962
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-962
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: HADOOP-6232.patch, MR-962.patch, MR-962.v1.1.patch, 
> MR-962.v1.patch
>
>
> This causes the following exception in TaskMemoryManagerThread. I observed 
> this while running TestTaskTrackerMemoryManager.
> {code}
> 2009-09-02 12:08:25,835 WARN  mapred.TaskMemoryManagerThread 
> (TaskMemoryManagerThread.java:run(239)) - \
> Uncaught exception in TaskMemoryManager while managing memory of 
> attempt_20090902120812252_0001_m_03_0 : \
> java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.assertPidPgrpidForMatch(ProcfsBasedProcessTree.java:234)
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.assertAndDestroyProcessGroup(ProcfsBasedProcessTree.java:257)
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.destroy(ProcfsBasedProcessTree.java:286)
> at 
> org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:229)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures

2009-11-01 Thread Christian Kunz (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772426#action_12772426
 ] 

Christian Kunz commented on MAPREDUCE-1171:
---

Just for the record, we use a 0.20.1 Yahoo! release.
I checked that Cloudera releases contain HADOOP-3327 as early as 
hadoop-0.20.0+61.

> Lots of fetch failures
> --
>
> Key: MAPREDUCE-1171
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Christian Kunz
>
> Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map 
> task failures because of 'Too many fetch-failures'.
> One of our jobs makes hardly any progress, because of 3000 reduces not able 
> to get map output of 2 trailing maps (with about 80GB output each), which 
> repeatedly are marked as failures because of reduces not being able to get 
> their map output.
> One difference to hadoop-0.18.3 seems to be that reduce tasks report a failed 
> map-output fetch even after a single try when it was a read error 
> (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
> a good idea, as trailing map tasks will be attacked by all reduces 
> simultaneously.
> Here is a log output of a reduce task:
> {noformat}
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
> attempt_200910281903_0028_r_00_0 copy failed: 
> attempt_200910281903_0028_m_002781_1 from some host
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
> attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
> attempt_200910281903_0028_m_002781_1
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
> fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
> MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
> the JobTracker.
> {noformat}
> Also I saw a few log messages which look suspicious as if successfully 
> fetched map output is discarded because of the map being marked as failed 
> (because of too many fetch failures). This would make the situation even 
> worse.
> {noformat}
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: 
> attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed 
> len: 23967845
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
> 23967845 bytes (21882555 raw bytes) into RAM from 
> attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 
> 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
> attempt_200910281903_0028_m_001076_0 -> (20, 39772) from some host
> ...
> 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring 
> obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures

2009-11-01 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772424#action_12772424
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1171:


Christian, are you using the Yahoo! distribution of 0.20? 

In branch 0.21, MAPREDUCE-353 makes the connect and read timeouts configurable 
per job. Moreover, the shuffle is simplified by MAPREDUCE-318; essentially, 
HADOOP-3327 is no longer there. 

Christian, making the connect and read timeouts configurable should address 
this issue, right?
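
For concreteness, per-job configuration would look roughly like this (the 
property names are as I recall them from branch 0.21's mapred-default.xml, so 
treat them as an assumption; values are in milliseconds):

{code}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// Assumed property names from MAPREDUCE-353:
conf.setInt("mapreduce.reduce.shuffle.connect.timeout", 180000);
conf.setInt("mapreduce.reduce.shuffle.read.timeout", 180000);
{code}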

> Lots of fetch failures
> --
>
> Key: MAPREDUCE-1171
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Christian Kunz
>
> Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map 
> task failures because of 'Too many fetch-failures'.
> One of our jobs makes hardly any progress, because of 3000 reduces not able 
> to get map output of 2 trailing maps (with about 80GB output each), which 
> repeatedly are marked as failures because of reduces not being able to get 
> their map output.
> One difference to hadoop-0.18.3 seems to be that reduce tasks report a failed 
> map-output fetch even after a single try when it was a read error 
> (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
> a good idea, as trailing map tasks will be attacked by all reduces 
> simultaneously.
> Here is a log output of a reduce task:
> {noformat}
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
> attempt_200910281903_0028_r_00_0 copy failed: 
> attempt_200910281903_0028_m_002781_1 from some host
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
> attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
> attempt_200910281903_0028_m_002781_1
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
> fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
> MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
> the JobTracker.
> {noformat}
> Also I saw a few log messages which look suspicious as if successfully 
> fetched map output is discarded because of the map being marked as failed 
> (because of too many fetch failures). This would make the situation even 
> worse.
> {noformat}
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: 
> attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed 
> len: 23967845
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
> 23967845 bytes (21882555 raw bytes) into RAM from 
> attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 
> 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
> attempt_200910281903_0028_m_001076_0 -> (20, 39772) from some host
> ...
> 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring 
> obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.

2009-11-01 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1143:
-

Status: Open  (was: Patch Available)

> runningMapTasks counter is not properly decremented in case of failed Tasks.
> 
>
> Key: MAPREDUCE-1143
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: rahul k singh
>Priority: Blocker
> Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, 
> MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, 
> MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, 
> MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, 
> MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.

2009-11-01 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1143:
-

Status: Patch Available  (was: Open)

> runningMapTasks counter is not properly decremented in case of failed Tasks.
> 
>
> Key: MAPREDUCE-1143
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: rahul k singh
>Priority: Blocker
> Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, 
> MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, 
> MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, 
> MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, 
> MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-962) NPE in ProcfsBasedProcessTree.destroy()

2009-11-01 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-962:
---

Status: Open  (was: Patch Available)

Apologies for looking at this late. The patch looks fine overall. One minor 
nit: the test case testDestroyProcessTree initializes an array, procInfos, 
without any need for it. Can this be removed and the patch run through Hudson 
again, so I can commit it?

> NPE in ProcfsBasedProcessTree.destroy()
> ---
>
> Key: MAPREDUCE-962
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-962
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Reporter: Vinod K V
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: HADOOP-6232.patch, MR-962.patch, MR-962.v1.patch
>
>
> This causes the following exception in TaskMemoryManagerThread. I observed 
> this while running TestTaskTrackerMemoryManager.
> {code}
> 2009-09-02 12:08:25,835 WARN  mapred.TaskMemoryManagerThread 
> (TaskMemoryManagerThread.java:run(239)) - \
> Uncaught exception in TaskMemoryManager while managing memory of 
> attempt_20090902120812252_0001_m_03_0 : \
> java.lang.NullPointerException
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.assertPidPgrpidForMatch(ProcfsBasedProcessTree.java:234)
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.assertAndDestroyProcessGroup(ProcfsBasedProcessTree.java:257)
> at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.destroy(ProcfsBasedProcessTree.java:286)
> at 
> org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:229)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1171) Lots of fetch failures

2009-11-01 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1171:
---

Affects Version/s: (was: 0.20.1)
   0.21.0

HADOOP-3327 went into branch 0.21

> Lots of fetch failures
> --
>
> Key: MAPREDUCE-1171
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.21.0
>Reporter: Christian Kunz
>
> Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map 
> task failures because of 'Too many fetch-failures'.
> One of our jobs makes hardly any progress, because of 3000 reduces not able 
> to get map output of 2 trailing maps (with about 80GB output each), which 
> repeatedly are marked as failures because of reduces not being able to get 
> their map output.
> One difference to hadoop-0.18.3 seems to be that reduce tasks report a failed 
> map-output fetch even after a single try when it was a read error 
> (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
> a good idea, as trailing map tasks will be attacked by all reduces 
> simultaneously.
> Here is a log output of a reduce task:
> {noformat}
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
> attempt_200910281903_0028_r_00_0 copy failed: 
> attempt_200910281903_0028_m_002781_1 from some host
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:129)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
> at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
> attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
> attempt_200910281903_0028_m_002781_1
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
> fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
> MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
> the JobTracker.
> {noformat}
> Also I saw a few log messages which look suspicious as if successfully 
> fetched map output is discarded because of the map being marked as failed 
> (because of too many fetch failures). This would make the situation even 
> worse.
> {noformat}
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: 
> attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed 
> len: 23967845
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
> 23967845 bytes (21882555 raw bytes) into RAM from 
> attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 
> 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
> attempt_200910281903_0028_m_001076_0 -> (20, 39772) from some host
> ...
> 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring 
> obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1136) ConcurrentModificationException when tasktracker updates task status to jobtracker

2009-11-01 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772416#action_12772416
 ] 

Vinod K V commented on MAPREDUCE-1136:
--

bq. Is this because the access to jobs is not synchronized?
Yes.

bq. Wouldn't this result in losing a heartbeat? 
No, the RPC in question is getAllJobs(), so no heartbeat is missed. At least 
not in this case.
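
For reference, the straightforward shape of a fix is to snapshot the map while 
holding the JobTracker lock (a sketch of the idea, not the committed change):

{code}
// Sketch: hold the JobTracker lock while iterating the jobs TreeMap, so a
// concurrent job submission or retirement cannot invalidate the iterator.
public synchronized JobStatus[] getAllJobs() {
  java.util.List<JobStatus> statuses = new java.util.ArrayList<JobStatus>();
  for (JobInProgress jip : jobs.values()) {
    statuses.add(jip.getStatus());
  }
  return statuses.toArray(new JobStatus[statuses.size()]);
}
{code}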


> ConcurrentModificationException when tasktracker updates task status to 
> jobtracker
> --
>
> Key: MAPREDUCE-1136
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1136
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.2, 0.21.0, 0.22.0
>Reporter: Qi Liu
>
> In Hadoop 0.18.3, the following exception happened during a job execution. It 
> does not happen often.
> Here is the stack trace of the exception.
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
> at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
> at 
> org.apache.hadoop.mapred.JobTracker.getAllJobs(JobTracker.java:2376)
> at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
> at org.apache.hadoop.ipc.Client.call(Client.java:716)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1027) jobtracker.jsp can have an html text block for announcements by admins.

2009-11-01 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772415#action_12772415
 ] 

Vinod K V commented on MAPREDUCE-1027:
--

+1 for the implementation details in general.

> jobtracker.jsp can have an html text block for announcements by admins.
> ---
>
> Key: MAPREDUCE-1027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1027
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Reporter: Vinod K V
>
> jobtracker.jsp is the first page for users of Map/Reduce clusters and can be 
> used for sending information across to all users. It will be useful to have a 
> text block on this page where administrators can put the latest 
> notices/announcements from time to time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772411#action_12772411
 ] 

Hadoop QA commented on MAPREDUCE-967:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423791/mapreduce-967.txt
  against trunk revision 831037.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/116/testReport/
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/116/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/116/console

This message is automatically generated.

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Status: Open  (was: Patch Available)

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Status: Patch Available  (was: Open)

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-11-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Attachment: mapreduce-967.txt

Small update to previous patch - forgot to change references to RunJar to point 
to its new location.

> TaskTracker does not need to fully unjar job jars
> -
>
> Key: MAPREDUCE-967
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.21.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
> mapreduce-967.txt
>
>
> In practice we have seen some users submitting job jars that consist of 
> 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
> up after them has a significant cost (both in wall clock and in unnecessary 
> heavy disk utilization). This cost can be easily avoided.
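
A rough illustration of the alternative: serve classes directly from the job 
jar via the standard JDK mechanism instead of unpacking it. This is the general 
idea only; the attached patches are authoritative.

{code}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: task classes load straight from the job jar, so only non-class
// resources would ever need extraction into mapred.local.dir.
public class JobJarLoader {
  static ClassLoader forJobJar(File jobJar, ClassLoader parent) throws Exception {
    return new URLClassLoader(new URL[] { jobJar.toURI().toURL() }, parent);
  }
}
{code}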

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1170) MultipleInputs doesn't work with new API in 0.20 branch

2009-11-01 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated MAPREDUCE-1170:
-

Status: Open  (was: Patch Available)

Cancelling the patch until this fully works.

> MultipleInputs doesn't work with new API in 0.20 branch
> ---
>
> Key: MAPREDUCE-1170
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1170
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1
>Reporter: Jay Booth
> Fix For: 0.20.2
>
> Attachments: multipleInputs.patch
>
>
> This patch adds support for MultipleInputs (and KeyValueTextInputFormat) in 
> o.a.h.mapreduce.lib.input, working with the new API. A passing unit test is 
> included. Include in 0.20.2?
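
For readers unfamiliar with MultipleInputs, this is the kind of usage the 
patch enables against the new API (a sketch: the mapper classes are 
placeholders, and the method shape mirrors the old-API 
org.apache.hadoop.mapred.lib.MultipleInputs):

{code}
// job is an org.apache.hadoop.mapreduce.Job; KvMapper and TextMapper are
// placeholder mapper classes for illustration.
MultipleInputs.addInputPath(job, new Path("/data/kv"),
    KeyValueTextInputFormat.class, KvMapper.class);
MultipleInputs.addInputPath(job, new Path("/data/text"),
    TextInputFormat.class, TextMapper.class);
{code}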

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772397#action_12772397
 ] 

Todd Lipcon commented on MAPREDUCE-1176:


Hi,

Could you please post this as a patch file against the hadoop-mapreduce trunk? 
This will allow Hudson to automatically test the change.

Also, a couple of notes:
- Please include the Apache license header at the top of these files.
- @author tags are discouraged in Apache projects.
- Please include unit tests for this new code.

Thanks for the contribution - looking forward to seeing this in trunk!

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, FixedLengthRecordReader.java
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into 
> the mapreduce.lib.input package. These two classes can be used when you need 
> to read data from files containing fixed length (fixed width) records. Such 
> files have no CR/LF (or any combination thereof), no delimiters etc, but each 
> record is a fixed length, and extra data is padded with spaces. The data is 
> one gigantic line within a file.
> Provided are two classes: the first is FixedLengthInputFormat, with its 
> corresponding FixedLengthRecordReader. When creating a job that specifies 
> this input format, the job must have the 
> "mapreduce.input.fixedlengthinputformat.record.length" property set as follows:
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
> [myFixedRecordLength]);
> This input format overrides computeSplitSize() in order to ensure that 
> InputSplits do not contain any partial records since with fixed records there 
> is no way to determine where a record begins if that were to occur. Each 
> InputSplit passed to the FixedLengthRecordReader will start at the beginning 
> of a record, and the last byte in the InputSplit will be the last byte of a 
> record. The override of computeSplitSize() delegates to FileInputFormat's 
> compute method, and then adjusts the returned split size by doing the 
> following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
> * fixedRecordLength)
> This suite of fixed-length input format classes does not support compressed 
> files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)

2009-11-01 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772390#action_12772390
 ] 

Matei Zaharia commented on MAPREDUCE-961:
-

Hi Scott and Dhruba,

I've looked at the patch a little bit and have a few comments:
# I agree with Dhruba that it would be good to have the option of running 
multiple Hadoop clusters in parallel. It's also good design to allow the 
metrics data to be consumed by multiple sources.
# In MemBasedLoadManager.canLaunchTask, you are returning true in some cases 
and saying that this is "equivalent to the case of using only 
CapBasedLoadManager". How is that happening? I think you would need to return 
super.canLaunchTask(...), not true (see the sketch after this list). The Fair 
Scheduler itself doesn't look at slot counts.
# It might be useful to use the max map slots / max reduce slots settings as 
upper bounds on the total number of tasks on each node, to limit the number of 
processes launched. In this case an administrator could configure the slots 
higher (e.g. 20 map slots and 10 reduce slots), and the node utilization would 
be used to determine when fewer than this number of tasks should be launched. 
Otherwise, a job with very low-utilization tasks could cause hundreds of 
processes to be launched on each node.
# Have you thought in detail about how the MemBasedLoadManager will work when 
the scheduler tries to launch multiple tasks per heartbeat (part of 
MAPREDUCE-706)? I think there are two questions:
#* First, you will need to cap the number of tasks launched per heartbeat based 
on free memory on the node, so that we don't end up launching too many tasks 
and overcommitting memory. One way to do this might be to count tasks we 
schedule against the free memory on the node, and conservatively estimate them 
to each use 2 GB or something (admin-configurable).
#* Second, it's important to launch both reduces and maps if both types of 
tasks are available. The current multiple-task-per-heartbeat code in 
MAPREDUCE-706 (and in all the other schedulers as far as I know) will first try 
to launch map tasks until canLaunchTask(TaskType.MAP) returns false (or until 
there are no pending map tasks), and will then look for pending reduce tasks. 
With the current MemBasedLoadManager, this would starve reduces whenever there 
are pending maps. It would be better to alternate between the two task types if 
both are available.
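
To make point 2 concrete, the delegation I have in mind looks roughly like 
this (the signature is abbreviated and the memory-check helpers are 
hypothetical):

{code}
public class MemBasedLoadManager extends CapBasedLoadManager {
  @Override
  public boolean canLaunchTask(TaskTrackerStatus tracker, JobInProgress job,
                               TaskType type) {
    if (!memoryStatsAvailable(tracker)) {    // hypothetical helper
      // Fall back to slot-based admission instead of returning true:
      // the fair scheduler itself never looks at slot counts.
      return super.canLaunchTask(tracker, job, type);
    }
    return enoughMemoryFree(tracker, type);  // hypothetical helper
  }
}
{code}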

> ResourceAwareLoadManager to dynamically decide new tasks based on current 
> CPU/memory load on TaskTracker(s)
> ---
>
> Key: MAPREDUCE-961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-961
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/fair-share
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch
>
>
> Design and develop a ResourceAwareLoadManager for the FairShare scheduler that 
> dynamically decides how many maps/reduces to run on a particular machine 
> based on the CPU/Memory/diskIO/network usage in that machine.  The amount of 
> resources currently used on each task tracker is being fed into the 
> ResourceAwareLoadManager in real-time via an entity that is external to 
> Hadoop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-01 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Attachment: FixedLengthRecordReader.java
FixedLengthInputFormat.java

Attached source files

> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> 
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 0.20.1
> Environment: Any
>Reporter: BitsOfInfo
>Priority: Minor
> Attachments: FixedLengthInputFormat.java, FixedLengthRecordReader.java
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into 
> the mapreduce.lib.input package. These two classes can be used when you need 
> to read data from files containing fixed length (fixed width) records. Such 
> files have no CR/LF (or any combination thereof), no delimiters etc, but each 
> record is a fixed length, and extra data is padded with spaces. The data is 
> one gigantic line within a file.
> Provided are two classes first is the FixedLengthInputFormat and its 
> corresponding FixedLengthRecordReader. When creating a job that specifies 
> this input format, the job must have the 
> "mapreduce.input.fixedlengthinputformat.record.length" property set as follows
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
> [myFixedRecordLength]);
> This input format overrides computeSplitSize() in order to ensure that 
> InputSplits do not contain any partial records since with fixed records there 
> is no way to determine where a record begins if that were to occur. Each 
> InputSplit passed to the FixedLengthRecordReader will start at the beginning 
> of a record, and the last byte in the InputSplit will be the last byte of a 
> record. The override of computeSplitSize() delegates to FileInputFormat's 
> compute method, and then adjusts the returned split size by doing the 
> following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
> * fixedRecordLength)
> This suite of fixed length input format classes, does not support compressed 
> files. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-11-01 Thread BitsOfInfo (JIRA)
Contribution: FixedLengthInputFormat and FixedLengthRecordReader


 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor


Hello,
I would like to contribute the following two classes for incorporation into the 
mapreduce.lib.input package. These two classes can be used when you need to 
read data from files containing fixed length (fixed width) records. Such files 
have no CR/LF (or any combination thereof), no delimiters etc, but each record 
is a fixed length, and extra data is padded with spaces. The data is one 
gigantic line within a file.

Provided are two classes: the first is FixedLengthInputFormat, with its 
corresponding FixedLengthRecordReader. When creating a job that specifies this 
input format, the job must have the 
"mapreduce.input.fixedlengthinputformat.record.length" property set as follows:

myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);

OR

myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
[myFixedRecordLength]);

This input format overrides computeSplitSize() in order to ensure that 
InputSplits do not contain any partial records since with fixed records there 
is no way to determine where a record begins if that were to occur. Each 
InputSplit passed to the FixedLengthRecordReader will start at the beginning of 
a record, and the last byte in the InputSplit will be the last byte of a 
record. The override of computeSplitSize() delegates to FileInputFormat's 
compute method, and then adjusts the returned split size by doing the 
following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * 
fixedRecordLength)
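
For clarity, the adjustment above expressed as code (a sketch consistent with 
the description; the attached FixedLengthInputFormat.java is the authoritative 
version, and recordLength is assumed to hold the configured record length):

{code}
@Override
protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
  long defaultSplitSize = super.computeSplitSize(blockSize, minSize, maxSize);
  // Long division floors, so every split holds a whole number of records
  // and no split ever ends in the middle of a record.
  return (defaultSplitSize / recordLength) * recordLength;
}
{code}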

This suite of fixed-length input format classes does not support compressed 
files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.