[jira] Updated: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated MAPREDUCE-1153:
---------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.22.0
     Release Note: Update the number of trackers and blacklisted trackers metrics when trackers are decommissioned.
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this.

> Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.
> -------------------------------------------------------------------------------------------------------------
>
>                Key: MAPREDUCE-1153
>                URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153
>            Project: Hadoop Map/Reduce
>         Issue Type: Bug
>         Components: jobtracker
>   Affects Versions: 0.22.0
>           Reporter: Hemanth Yamijala
>           Assignee: Sharad Agarwal
>            Fix For: 0.22.0
>
>        Attachments: 1153.patch
>
> MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of actual, blacklisted and decommissioned tasktrackers. When a tracker is decommissioned, the tasktracker count or the blacklisted tracker count is not decremented.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772447#action_12772447 ]

Sharad Agarwal commented on MAPREDUCE-1153:
--------------------------------------------
Test failure is due to MAPREDUCE-1124.
[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772443#action_12772443 ]

Christian Kunz commented on MAPREDUCE-1171:
--------------------------------------------
Yes, in the absence of MAPREDUCE-318, MAPREDUCE-353 should be sufficient to work around the issue.

> Lots of fetch failures
> -----------------------
>
>                Key: MAPREDUCE-1171
>                URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
>            Project: Hadoop Map/Reduce
>         Issue Type: Bug
>         Components: task
>   Affects Versions: 0.21.0
>           Reporter: Christian Kunz
>
> Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map task failures because of 'Too many fetch-failures'.
> One of our jobs makes hardly any progress, because 3000 reduces are not able to get the map output of 2 trailing maps (with about 80GB output each), which repeatedly are marked as failed because reduces are not able to get their map output.
> One difference from hadoop-0.18.3 seems to be that reduce tasks report a failed map-output fetch even after a single try when it was a read error (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is a good idea, as trailing map tasks will be attacked by all reduces simultaneously.
> Here is a log output of a reduce task:
> {noformat}
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200910281903_0028_r_00_0 copy failed: attempt_200910281903_0028_m_002781_1 from some host
> 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.read(SocketInputStream.java:129)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_200910281903_0028_r_00_0: Failed fetch #1 from attempt_200910281903_0028_m_002781_1
> 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output from attempt_200910281903_0028_m_002781_1 even after MAX_FETCH_RETRIES_PER_MAP retries... or it is a read error, reporting to the JobTracker.
> {noformat}
> Also I saw a few log messages which look suspicious, as if successfully fetched map output is discarded because of the map being marked as failed (because of too many fetch failures). This would make the situation even worse.
> {noformat}
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed len: 23967845
> 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 23967845 bytes (21882555 raw bytes) into RAM from attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
> 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200910281903_0028_m_001076_0 -> (20, 39772) from some host
> ...
> 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
> {noformat}
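The single-try read-error reporting described above can be modeled with a tiny sketch. This is a hypothetical helper for illustration only, not the actual ReduceTask logic: the read-error flag short-circuits the per-map retry budget, so one failed fetch is enough to report the map to the JobTracker.

```java
public class FetchFailurePolicy {
    // Hypothetical model of the behavior described in MAPREDUCE-1171,
    // NOT the real ReduceTask code: a read error bypasses the retry budget,
    // so every reduce reports the trailing map after its first failed fetch.
    static boolean shouldReportFetchFailure(int failedFetches,
                                            int maxFetchRetriesPerMap,
                                            boolean readError) {
        return readError || failedFetches >= maxFetchRetriesPerMap;
    }

    public static void main(String[] args) {
        // One read error: reported immediately, even with retries left.
        System.out.println(shouldReportFetchFailure(1, 6, true));
        // A plain timeout only reports once the retry budget is spent.
        System.out.println(shouldReportFetchFailure(1, 6, false));
    }
}
```

With 3000 reduces each applying this rule against 2 trailing maps, a single transient read error per reduce is enough to flood the JobTracker with fetch-failure reports, which is the pile-on the reporter objects to.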
[jira] Updated: (MAPREDUCE-1177) TestTaskTrackerMemoryManager retries a task for more than 100 times.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1177:
------------------------------------------------
    Attachment: TEST-org.apache.hadoop.mapred.TestTaskTrackerMemoryManager.txt

Attaching the complete test log.

> TestTaskTrackerMemoryManager retries a task for more than 100 times.
> ----------------------------------------------------------------------
>
>                Key: MAPREDUCE-1177
>                URL: https://issues.apache.org/jira/browse/MAPREDUCE-1177
>            Project: Hadoop Map/Reduce
>         Issue Type: Bug
>         Components: tasktracker, test
>           Reporter: Amareshwari Sriramadasu
>        Attachments: TEST-org.apache.hadoop.mapred.TestTaskTrackerMemoryManager.txt
>
> TestTaskTrackerMemoryManager retries a task for more than 100 times. The logs show the same:
> {noformat}
> 2009-11-02 12:41:20,489 INFO mapred.JobInProgress (JobInProgress.java:completedTask(2530)) - Task 'attempt_20091102123356106_0001_m_02_145' has completed task_20091102123356106_0001_m_02 successfully.
> {noformat}
> Sometimes the test times out as well.
[jira] Created: (MAPREDUCE-1177) TestTaskTrackerMemoryManager retries a task for more than 100 times.
TestTaskTrackerMemoryManager retries a task for more than 100 times.
----------------------------------------------------------------------

                Key: MAPREDUCE-1177
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-1177
            Project: Hadoop Map/Reduce
         Issue Type: Bug
         Components: tasktracker, test
           Reporter: Amareshwari Sriramadasu

TestTaskTrackerMemoryManager retries a task for more than 100 times. The logs show the same:

{noformat}
2009-11-02 12:41:20,489 INFO mapred.JobInProgress (JobInProgress.java:completedTask(2530)) - Task 'attempt_20091102123356106_0001_m_02_145' has completed task_20091102123356106_0001_m_02 successfully.
{noformat}

Sometimes the test times out as well.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772432#action_12772432 ]

Amareshwari Sriramadasu commented on MAPREDUCE-323:
-----------------------------------------------------
bq. Nick Rettinghouse, Tim Williamson, and Rajiv Chittajallu all suggested a preference for per-hour directories, in particular, USER//MM/DD/HH, an option you did not list. Should we perhaps err on the side of a deeper structure, to ensure that we don't have to re-structure things again?

Per-hour directories look like overkill. On average, each user would have about 10 jobs finish in an hour.

bq. However implementing Cluster.getJobHistoryUrl() would be expensive for archived jobs, since the jobtracker must search the entire directory tree.

Here, the JobTracker need not search the entire directory tree. If the JobTracker does not have it in its cache, the JobClient itself can do the search.

bq. Perhaps the directory structure should instead be based purely on the job ID? E.g., something like: jobtrackerstarttime/00/00/00

This looks fine. But when we have permissions in place, inserting the user becomes difficult.

> Improve the way job history files are managed
> -----------------------------------------------
>
>                Key: MAPREDUCE-323
>                URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>            Project: Hadoop Map/Reduce
>         Issue Type: Bug
>         Components: jobtracker
>   Affects Versions: 0.21.0, 0.22.0
>           Reporter: Amar Kamat
>           Assignee: Amareshwari Sriramadasu
>           Priority: Critical
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This can cause problems when there is a need to search the history folder (job-recovery etc). It would be nice if we grouped all the jobs under a _user_ folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid, date, jobname_ etc, but using _username_ will make the search much more efficient and will also not result in a namespace explosion.
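To make the layouts under discussion concrete, here is a hypothetical helper (the names and the per-user, per-day scheme are invented for illustration; this is not JobTracker code) that builds a history path of the form discussed in the comment above:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class HistoryLayout {
    // Illustrative sketch of a per-user, per-day history directory layout,
    // e.g. /history/amar/2009/11/02/. Names are hypothetical, not the
    // actual JobTracker history code.
    static String historyDir(String historyRoot, String user, Date doneTime) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // deterministic output
        return historyRoot + "/" + user + "/" + fmt.format(doneTime);
    }

    public static void main(String[] args) {
        System.out.println(historyDir("/history", "amar", new Date()));
    }
}
```

A per-hour layer would just extend the pattern to "yyyy/MM/dd/HH"; the trade-off in the thread is directory depth versus directory fan-out per level.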
[jira] Updated: (MAPREDUCE-962) NPE in ProcfsBasedProcessTree.destroy()
[ https://issues.apache.org/jira/browse/MAPREDUCE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-962:
------------------------------------
    Release Note: Fixes an issue of NPE in ProcfsBasedProcessTree in a corner case.
          Status: Patch Available  (was: Open)

Allowing it to go through Hudson.

> NPE in ProcfsBasedProcessTree.destroy()
> -----------------------------------------
>
>                Key: MAPREDUCE-962
>                URL: https://issues.apache.org/jira/browse/MAPREDUCE-962
>            Project: Hadoop Map/Reduce
>         Issue Type: Bug
>         Components: tasktracker
>           Reporter: Vinod K V
>           Assignee: Ravi Gummadi
>            Fix For: 0.21.0
>
>        Attachments: HADOOP-6232.patch, MR-962.patch, MR-962.v1.1.patch, MR-962.v1.patch
>
> This causes the following exception in TaskMemoryManagerThread. I observed this while running TestTaskTrackerMemoryManager.
> {code}
> 2009-09-02 12:08:25,835 WARN mapred.TaskMemoryManagerThread (TaskMemoryManagerThread.java:run(239)) - \
> Uncaught exception in TaskMemoryManager while managing memory of attempt_20090902120812252_0001_m_03_0 : \
> java.lang.NullPointerException
>     at org.apache.hadoop.util.ProcfsBasedProcessTree.assertPidPgrpidForMatch(ProcfsBasedProcessTree.java:234)
>     at org.apache.hadoop.util.ProcfsBasedProcessTree.assertAndDestroyProcessGroup(ProcfsBasedProcessTree.java:257)
>     at org.apache.hadoop.util.ProcfsBasedProcessTree.destroy(ProcfsBasedProcessTree.java:286)
>     at org.apache.hadoop.mapred.TaskMemoryManagerThread.run(TaskMemoryManagerThread.java:229)
> {code}
[jira] Updated: (MAPREDUCE-962) NPE in ProcfsBasedProcessTree.destroy()
[ https://issues.apache.org/jira/browse/MAPREDUCE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-962:
------------------------------------
    Attachment: MR-962.v1.1.patch

Attaching a patch that removes the unnecessary array.
[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772426#action_12772426 ]

Christian Kunz commented on MAPREDUCE-1171:
--------------------------------------------
Just for the record, we use a 0.20.1 Yahoo! release. I checked that Cloudera releases contain HADOOP-3327 as early as hadoop-0.20.0+61.
[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772424#action_12772424 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1171:
------------------------------------------------------
Christian, are you using the Yahoo! distribution of 0.20? In branch 0.21, MAPREDUCE-353 makes the connect and read timeouts configurable for a job. Moreover, the shuffle is simplified by MAPREDUCE-318; essentially, HADOOP-3327 is no longer there.

Christian, making the connect and read timeouts configurable should address this issue, right?
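For reference, a minimal sketch of the per-job timeout configuration that MAPREDUCE-353 enables. The property names below are assumed from 0.21-era defaults and should be verified against the release actually in use:

```xml
<!-- mapred-site.xml or per-job configuration; names assumed, verify per release -->
<property>
  <name>mapreduce.reduce.shuffle.connect.timeout</name>
  <value>180000</value> <!-- ms to wait while connecting to a map-output server -->
</property>
<property>
  <name>mapreduce.reduce.shuffle.read.timeout</name>
  <value>180000</value> <!-- ms to wait for map-output data on an open connection -->
</property>
```

Raising these per job lets the 3000 reduces tolerate slow responses from the two trailing maps instead of declaring fetch failures.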
[jira] Updated: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rahul k singh updated MAPREDUCE-1143:
--------------------------------------
    Status: Open  (was: Patch Available)

> runningMapTasks counter is not properly decremented in case of failed Tasks.
> ------------------------------------------------------------------------------
>
>                Key: MAPREDUCE-1143
>                URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143
>            Project: Hadoop Map/Reduce
>         Issue Type: Bug
>           Reporter: rahul k singh
>           Priority: Blocker
>        Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch
[jira] Updated: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rahul k singh updated MAPREDUCE-1143:
--------------------------------------
    Status: Patch Available  (was: Open)
[jira] Updated: (MAPREDUCE-962) NPE in ProcfsBasedProcessTree.destroy()
[ https://issues.apache.org/jira/browse/MAPREDUCE-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated MAPREDUCE-962:
-----------------------------------------
    Status: Open  (was: Patch Available)

Apologies for looking at this late. The patch looks fine overall. One minor nit: the test case testDestroyProcessTree initializes an array procInfos without any need for it. Can this be removed and run through Hudson again, so I can commit it?
[jira] Updated: (MAPREDUCE-1171) Lots of fetch failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1171:
------------------------------------------------
    Affects Version/s:     (was: 0.20.1)
                       0.21.0

HADOOP-3327 went into branch 0.21.
[jira] Commented: (MAPREDUCE-1136) ConcurrentModificationException when tasktracker updates task status to jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772416#action_12772416 ]

Vinod K V commented on MAPREDUCE-1136:
----------------------------------------
bq. Is this because the access to jobs is not synchronized?

Yes.

bq. Wouldn't this result in losing a heartbeat?

No, the RPC in question is getAllJobs(), so there is no case of a missing heartbeat. At least not in this case.

> ConcurrentModificationException when tasktracker updates task status to jobtracker
> -------------------------------------------------------------------------------------
>
>                Key: MAPREDUCE-1136
>                URL: https://issues.apache.org/jira/browse/MAPREDUCE-1136
>            Project: Hadoop Map/Reduce
>         Issue Type: Bug
>         Components: tasktracker
>   Affects Versions: 0.20.2, 0.21.0, 0.22.0
>           Reporter: Qi Liu
>
> In Hadoop 0.18.3, the following exception happened during a job execution. It does not happen often. Here is the stack trace of the exception.
> {noformat}
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.util.ConcurrentModificationException
>     at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
>     at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
>     at org.apache.hadoop.mapred.JobTracker.getAllJobs(JobTracker.java:2376)
>     at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
> org.apache.hadoop.ipc.Client.call(Client.java:716)
> {noformat}
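The failure mode in the stack trace is the standard fail-fast behavior of TreeMap iterators: any structural modification while another handler iterates the jobs map throws ConcurrentModificationException. A minimal standalone reproduction (illustrative only, not JobTracker code):

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

public class CmeDemo {
    // Returns true if mutating a TreeMap while iterating it triggers the
    // fail-fast ConcurrentModificationException, as in getAllJobs().
    static boolean triggersCme() {
        Map<String, String> jobs = new TreeMap<>();
        jobs.put("job_0001", "RUNNING");
        jobs.put("job_0002", "RUNNING");
        try {
            for (String status : jobs.values()) {
                // Simulates another RPC handler adding a job mid-iteration.
                jobs.put("job_0003", "PREP");
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME triggered: " + triggersCme());
        // A common fix: synchronize on a shared lock and iterate a snapshot,
        // so the live map is never walked while other handlers mutate it.
        Map<String, String> jobs = new TreeMap<>();
        jobs.put("job_0001", "RUNNING");
        synchronized (jobs) {
            for (String status : new java.util.ArrayList<>(jobs.values())) {
                // safe: iterating a private copy taken under the lock
            }
        }
    }
}
```

The snapshot-under-lock pattern is one way to fix it; guarding every reader and writer of the map with the same lock is the other.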
[jira] Commented: (MAPREDUCE-1027) jobtracker.jsp can have an html text block for announcements by admins.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772415#action_12772415 ] Vinod K V commented on MAPREDUCE-1027: -- +1 for the implementation details in general. > jobtracker.jsp can have an html text block for announcements by admins. > --- > > Key: MAPREDUCE-1027 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1027 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Vinod K V > > jobtracker.jsp is the first page for users of Map/Reduce clusters and can be > used for sending information across to all users. It will be useful to have a > text block on this page where administrators can put any latest > notices/announcements time to time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772411#action_12772411 ] Hadoop QA commented on MAPREDUCE-967: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423791/mapreduce-967.txt against trunk revision 831037. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/116/testReport/ Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/116/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/116/console This message is automatically generated. > TaskTracker does not need to fully unjar job jars > - > > Key: MAPREDUCE-967 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-967 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.21.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, > mapreduce-967.txt > > > In practice we have seen some users submitting job jars that consist of > 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning > up after them has a significant cost (both in wall clock and in unnecessary > heavy disk utilization). 
This cost can be easily avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
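[Editor's note] The core idea of MAPREDUCE-967 — serving a job jar's contents without exploding it onto disk — can be sketched with a plain URLClassLoader, which reads classes and resources directly out of a jar. This is an illustrative demo, not the patch itself: it builds a throwaway jar with one resource entry and reads it back through the classloader without unpacking anything.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Sketch: a URLClassLoader serves entries straight out of a jar, so a
// TaskTracker-like process need not unpack 10,000+ class files into
// mapred.local.dir just to load them.
public class NoUnjarDemo {
    public static String loadFromJar() throws Exception {
        // Build a tiny jar on the fly with a single resource entry.
        File jar = File.createTempFile("job", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream jos =
                 new JarOutputStream(Files.newOutputStream(jar.toPath()))) {
            jos.putNextEntry(new JarEntry("conf/greeting.txt"));
            jos.write("hello from inside the jar".getBytes("UTF-8"));
            jos.closeEntry();
        }
        // Read the entry back through a classloader -- no unjar step.
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { jar.toURI().toURL() }, null)) {
            try (InputStream in =
                     loader.getResourceAsStream("conf/greeting.txt")) {
                return new String(in.readAllBytes(), "UTF-8");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadFromJar());
    }
}
```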
[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-967: -- Status: Open (was: Patch Available) > TaskTracker does not need to fully unjar job jars > - > > Key: MAPREDUCE-967 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-967 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.21.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, > mapreduce-967.txt > > > In practice we have seen some users submitting job jars that consist of > 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning > up after them has a significant cost (both in wall clock and in unnecessary > heavy disk utilization). This cost can be easily avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-967: -- Status: Patch Available (was: Open) > TaskTracker does not need to fully unjar job jars > - > > Key: MAPREDUCE-967 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-967 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.21.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, > mapreduce-967.txt > > > In practice we have seen some users submitting job jars that consist of > 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning > up after them has a significant cost (both in wall clock and in unnecessary > heavy disk utilization). This cost can be easily avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-967: -- Attachment: mapreduce-967.txt Small update to previous patch - forgot to change references to RunJar to point to its new location. > TaskTracker does not need to fully unjar job jars > - > > Key: MAPREDUCE-967 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-967 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.21.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, > mapreduce-967.txt > > > In practice we have seen some users submitting job jars that consist of > 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning > up after them has a significant cost (both in wall clock and in unnecessary > heavy disk utilization). This cost can be easily avoided -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1170) MultipleInputs doesn't work with new API in 0.20 branch
[ https://issues.apache.org/jira/browse/MAPREDUCE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Booth updated MAPREDUCE-1170: - Status: Open (was: Patch Available) Cancelling patch until this fully works > MultipleInputs doesn't work with new API in 0.20 branch > --- > > Key: MAPREDUCE-1170 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1170 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 0.20.1 >Reporter: Jay Booth > Fix For: 0.20.2 > > Attachments: multipleInputs.patch > > > This patch adds support for MultipleInputs (and KeyValueTextInputFormat) in > o.a.h.mapreduce.lib.input, working with the new API. Included passing unit > test. Include for 0.20.2? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772397#action_12772397 ] Todd Lipcon commented on MAPREDUCE-1176: Hi, Could you please post this as a patch file against the hadoop-mapreduce trunk? This will allow Hudson to automatically test the change. Also, a couple notes: - please include the Apache license header at the top of these files. - @author tags are discouraged in Apache projects - Please include unit tests for this new code. Thanks for the contribution - look forward to seeing this in trunk! > Contribution: FixedLengthInputFormat and FixedLengthRecordReader > > > Key: MAPREDUCE-1176 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 0.20.1 > Environment: Any >Reporter: BitsOfInfo >Priority: Minor > Attachments: FixedLengthInputFormat.java, FixedLengthRecordReader.java > > > Hello, > I would like to contribute the following two classes for incorporation into > the mapreduce.lib.input package. These two classes can be used when you need > to read data from files containing fixed length (fixed width) records. Such > files have no CR/LF (or any combination thereof), no delimiters etc, but each > record is a fixed length, and extra data is padded with spaces. The data is > one gigantic line within a file. > Provided are two classes first is the FixedLengthInputFormat and its > corresponding FixedLengthRecordReader. 
When creating a job that specifies > this input format, the job must have the > "mapreduce.input.fixedlengthinputformat.record.length" property set as follows > myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]); > OR > myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, > [myFixedRecordLength]); > This input format overrides computeSplitSize() in order to ensure that > InputSplits do not contain any partial records since with fixed records there > is no way to determine where a record begins if that were to occur. Each > InputSplit passed to the FixedLengthRecordReader will start at the beginning > of a record, and the last byte in the InputSplit will be the last byte of a > record. The override of computeSplitSize() delegates to FileInputFormat's > compute method, and then adjusts the returned split size by doing the > following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) > * fixedRecordLength) > This suite of fixed length input format classes, does not support compressed > files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
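[Editor's note] The computeSplitSize() adjustment described in the issue is just integer rounding: shrink the split size FileInputFormat computed down to the nearest whole multiple of the fixed record length, so no split ends in a partial record. The class and method names below are illustrative, not the contributed code.

```java
// Sketch of the split-size adjustment from MAPREDUCE-1176:
// floor(computedSplitSize / fixedRecordLength) * fixedRecordLength.
public class FixedLengthSplitMath {
    public static long adjustSplitSize(long computedSplitSize,
                                       long fixedRecordLength) {
        // Integer division already floors for non-negative inputs.
        return (computedSplitSize / fixedRecordLength) * fixedRecordLength;
    }

    public static void main(String[] args) {
        // A 64 MB default split with 300-byte records shrinks to the
        // largest multiple of 300 that fits: 67108800.
        System.out.println(adjustSplitSize(67108864L, 300L));
    }
}
```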
[jira] Commented: (MAPREDUCE-961) ResourceAwareLoadManager to dynamically decide new tasks based on current CPU/memory load on TaskTracker(s)
[ https://issues.apache.org/jira/browse/MAPREDUCE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772390#action_12772390 ] Matei Zaharia commented on MAPREDUCE-961: - Hi Scott and Dhruba, I've looked at the patch a little bit and have a few comments: # I agree with Dhruba that it would be good to have the option of running multiple Hadoop clusters in parallel. It's also good design to allow the metrics data to be consumed by multiple sources. # In MemBasedLoadManager.canLaunchTask, you are returning true in some cases and saying that this is "equivalent to the case of using only CapBasedLoadManager". How is that happening? I think you would need to return super.canLaunchTask(...), not true. The Fair Scheduler itself doesn't look at slot counts. # It might be useful to use the max map slots / max reduce slots settings as upper bounds on the total number of tasks on each node, to limit the number of processes launched. In this case an administrator could configure the slots higher (e.g. 20 map slots and 10 reduce slots), and the node utilization would be used to determine when fewer than this number of tasks should be launched. Otherwise, a job with very low-utilization tasks could cause hundreds of processes to be launched on each node. # Have you thought in detail about how the MemBasedLoadManager will work when the scheduler tries to launch multiple tasks per heartbeat (part of MAPREDUCE-706)? I think there are two questions: #* First, you will need to cap the number of tasks launched per heartbeat based on free memory on the node, so that we don't end up launching too many tasks and overcommitting memory. One way to do this might be to count tasks we schedule against the free memory on the node, and conservatively estimate them to each use 2 GB or something (admin-configurable). #* Second, it's important to launch both reduces and maps if both types of tasks are available. 
The current multiple-task-per-heartbeat code in MAPREDUCE-706 (and in all the other schedulers as far as I know) will first try to launch map tasks until canLaunchTask(TaskType.MAP) returns false (or until there are no pending map tasks), and will then look for pending reduce tasks. With the current MemBasedLoadManager, this would starve reduces whenever there are pending maps. It would be better to alternate between the two task types if both are available. > ResourceAwareLoadManager to dynamically decide new tasks based on current > CPU/memory load on TaskTracker(s) > --- > > Key: MAPREDUCE-961 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-961 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/fair-share >Reporter: dhruba borthakur >Assignee: dhruba borthakur > Attachments: HIVE-961.patch, MAPREDUCE-961-v2.patch > > > Design and develop a ResourceAwareLoadManager for the FairShare scheduler that > dynamically decides how many maps/reduces to run on a particular machine > based on the CPU/Memory/diskIO/network usage in that machine. The amount of > resources currently used on each task tracker is being fed into the > ResourceAwareLoadManager in real-time via an entity that is external to > Hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
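[Editor's note] Matei's point 2 above — that a memory-aware load manager should delegate to super.canLaunchTask(...) rather than return true — can be sketched in a few lines. This is a simplified stand-in, not the fair scheduler's actual LoadManager API: when memory is sufficient, the subclass still defers to the slot-based check so the slot caps keep applying.

```java
// Illustrative sketch of the review point on MemBasedLoadManager:
// a memory check should gate, not replace, the slot-based check.
public class LoadManagerSketch {
    static class CapBasedLoadManager {
        boolean canLaunchTask(int runningTasks, int maxSlots) {
            return runningTasks < maxSlots;
        }
    }

    static class MemBasedLoadManager extends CapBasedLoadManager {
        private final long freeMemMb;
        private final long estimatedTaskMemMb; // e.g. an admin-configured 2 GB

        MemBasedLoadManager(long freeMemMb, long estimatedTaskMemMb) {
            this.freeMemMb = freeMemMb;
            this.estimatedTaskMemMb = estimatedTaskMemMb;
        }

        @Override
        boolean canLaunchTask(int runningTasks, int maxSlots) {
            if (freeMemMb < estimatedTaskMemMb) {
                return false; // not enough memory, regardless of slots
            }
            // Memory is fine: still respect the slot cap instead of
            // unconditionally returning true.
            return super.canLaunchTask(runningTasks, maxSlots);
        }
    }

    public static void main(String[] args) {
        MemBasedLoadManager mgr = new MemBasedLoadManager(4096, 2048);
        // Plenty of memory, but all 20 slots are busy: slot cap holds.
        System.out.println(mgr.canLaunchTask(20, 20));
    }
}
```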
[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BitsOfInfo updated MAPREDUCE-1176: -- Attachment: FixedLengthRecordReader.java FixedLengthInputFormat.java Attached source files > Contribution: FixedLengthInputFormat and FixedLengthRecordReader > > > Key: MAPREDUCE-1176 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 0.20.1 > Environment: Any >Reporter: BitsOfInfo >Priority: Minor > Attachments: FixedLengthInputFormat.java, FixedLengthRecordReader.java > > > Hello, > I would like to contribute the following two classes for incorporation into > the mapreduce.lib.input package. These two classes can be used when you need > to read data from files containing fixed length (fixed width) records. Such > files have no CR/LF (or any combination thereof), no delimiters etc, but each > record is a fixed length, and extra data is padded with spaces. The data is > one gigantic line within a file. > Provided are two classes first is the FixedLengthInputFormat and its > corresponding FixedLengthRecordReader. When creating a job that specifies > this input format, the job must have the > "mapreduce.input.fixedlengthinputformat.record.length" property set as follows > myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]); > OR > myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, > [myFixedRecordLength]); > This input format overrides computeSplitSize() in order to ensure that > InputSplits do not contain any partial records since with fixed records there > is no way to determine where a record begins if that were to occur. Each > InputSplit passed to the FixedLengthRecordReader will start at the beginning > of a record, and the last byte in the InputSplit will be the last byte of a > record. 
The override of computeSplitSize() delegates to FileInputFormat's > compute method, and then adjusts the returned split size by doing the > following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) > * fixedRecordLength) > This suite of fixed length input format classes, does not support compressed > files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader
Contribution: FixedLengthInputFormat and FixedLengthRecordReader Key: MAPREDUCE-1176 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 0.20.1 Environment: Any Reporter: BitsOfInfo Priority: Minor Hello, I would like to contribute the following two classes for incorporation into the mapreduce.lib.input package. These two classes can be used when you need to read data from files containing fixed length (fixed width) records. Such files have no CR/LF (or any combination thereof), no delimiters etc, but each record is a fixed length, and extra data is padded with spaces. The data is one gigantic line within a file. Provided are two classes first is the FixedLengthInputFormat and its corresponding FixedLengthRecordReader. When creating a job that specifies this input format, the job must have the "mapreduce.input.fixedlengthinputformat.record.length" property set as follows myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]); OR myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]); This input format overrides computeSplitSize() in order to ensure that InputSplits do not contain any partial records since with fixed records there is no way to determine where a record begins if that were to occur. Each InputSplit passed to the FixedLengthRecordReader will start at the beginning of a record, and the last byte in the InputSplit will be the last byte of a record. The override of computeSplitSize() delegates to FileInputFormat's compute method, and then adjusts the returned split size by doing the following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength) This suite of fixed length input format classes, does not support compressed files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.