[jira] Updated: (MAPREDUCE-2041) TaskRunner logDir race condition leads to crash on job-acl.xml creation
[ https://issues.apache.org/jira/browse/MAPREDUCE-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Roelofs updated MAPREDUCE-2041:
------------------------------------

    Attachment: MR-2041.v1.trunk-hadoop-mapreduce.patch

Patch that improves TaskRunner's error-checking. This makes the failure mechanism more obvious but does not address the nondeterministic behavior of TestTrackerBlacklistAcrossJobs. (A minor tweak - removing the "throw ie;" line - _does_ fix the test. However, I'm assuming we don't want to ignore the failure to create job-acl.xml in the general case.)

> TaskRunner logDir race condition leads to crash on job-acl.xml creation
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2041
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2041
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.22.0
>         Environment: Linux/x86-64, 32-bit Java, NFS source tree
>            Reporter: Greg Roelofs
>         Attachments: MR-2041.v1.trunk-hadoop-mapreduce.patch
>
> TaskRunner's prepareLogFiles() warns on mkdirs() failures but ignores them. It also fails even to check the return value of setPermissions(). Either one can fail (e.g., on NFS, where there appears to be a TOCTOU-style race, except with C = "creation"), in which case the subsequent creation of job-acl.xml in writeJobACLs() will also fail, killing the task:
> {noformat}
> 2010-08-26 20:18:10,334 INFO mapred.TaskInProgress (TaskInProgress.java:updateStatus(591)) - Error from attempt_20100826201758813_0001_m_01_0 on tracker_host2.rack.com:rh45-64/127.0.0.1:35112: java.lang.Throwable: Child Error
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
> Caused by: java.io.FileNotFoundException: /home//grid/trunk/hadoop-mapreduce/build/test/logs/userlogs/job_20100826201758813_0001/attempt_20100826201758813_0001_m_01_0/job-acl.xml (No such file or directory)
>     at java.io.FileOutputStream.open(Native Method)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
>     at org.apache.hadoop.mapred.TaskRunner.writeJobACLs(TaskRunner.java:307)
>     at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:290)
>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:199)
> {noformat}
> This in turn causes TestTrackerBlacklistAcrossJobs to fail sporadically; the job-acl.xml failure always seems to affect host2 - and to do so more quickly than the intentional exception on host1 - which triggers an assertion failure due to the wrong host being job-blacklisted.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
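The fail-fast behavior the patch argues for can be illustrated in miniature. This is a hedged sketch using plain java.io.File (the real TaskRunner uses its own helpers and paths): instead of merely warning when mkdirs() returns false, throw, so the failure surfaces here rather than as a confusing FileNotFoundException in writeJobACLs() later.

```java
import java.io.File;
import java.io.IOException;

public class LogDirPrep {
    /**
     * Create an attempt's log directory, failing loudly instead of only
     * warning, so a later job-acl.xml write cannot hit a
     * FileNotFoundException on a directory that was never created.
     */
    static File prepareLogDir(File base, String attemptId) throws IOException {
        File logDir = new File(base, attemptId);
        // mkdirs() returns false on failure; an already-existing dir is fine.
        if (!logDir.mkdirs() && !logDir.isDirectory()) {
            throw new IOException("Cannot create log directory " + logDir);
        }
        // Likewise check the permission change instead of ignoring its result.
        if (!logDir.setWritable(true, true)) {
            throw new IOException("Cannot set permissions on " + logDir);
        }
        return logDir;
    }

    public static void main(String[] args) throws IOException {
        File base = new File(System.getProperty("java.io.tmpdir"), "userlogs-demo");
        File dir = prepareLogDir(base, "attempt_0001_m_000001_0");
        System.out.println(dir.isDirectory());
    }
}
```

On NFS, where the directory can appear to vanish between creation and use, this converts a mysterious downstream crash into an immediate, attributable IOException.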
[jira] Created: (MAPREDUCE-2041) TaskRunner logDir race condition leads to crash on job-acl.xml creation
TaskRunner logDir race condition leads to crash on job-acl.xml creation
-----------------------------------------------------------------------

                 Key: MAPREDUCE-2041
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2041
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: task
    Affects Versions: 0.22.0
         Environment: Linux/x86-64, 32-bit Java, NFS source tree
            Reporter: Greg Roelofs


TaskRunner's prepareLogFiles() warns on mkdirs() failures but ignores them. It also fails even to check the return value of setPermissions(). Either one can fail (e.g., on NFS, where there appears to be a TOCTOU-style race, except with C = "creation"), in which case the subsequent creation of job-acl.xml in writeJobACLs() will also fail, killing the task:

{noformat}
2010-08-26 20:18:10,334 INFO mapred.TaskInProgress (TaskInProgress.java:updateStatus(591)) - Error from attempt_20100826201758813_0001_m_01_0 on tracker_host2.rack.com:rh45-64/127.0.0.1:35112: java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)
Caused by: java.io.FileNotFoundException: /home//grid/trunk/hadoop-mapreduce/build/test/logs/userlogs/job_20100826201758813_0001/attempt_20100826201758813_0001_m_01_0/job-acl.xml (No such file or directory)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
    at org.apache.hadoop.mapred.TaskRunner.writeJobACLs(TaskRunner.java:307)
    at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:290)
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:199)
{noformat}

This in turn causes TestTrackerBlacklistAcrossJobs to fail sporadically; the job-acl.xml failure always seems to affect host2 - and to do so more quickly than the intentional exception on host1 - which triggers an assertion failure due to the wrong host being job-blacklisted.
[jira] Updated: (MAPREDUCE-2040) Forrest Documentation for Dynamic Priority Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Sandholm updated MAPREDUCE-2040:
---------------------------------------

          Status: Patch Available  (was: Open)
    Release Note: Forrest Documentation for Dynamic Priority Scheduler

> Forrest Documentation for Dynamic Priority Scheduler
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-2040
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2040
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/dynamic-scheduler
>    Affects Versions: 0.21.0
>            Reporter: Thomas Sandholm
>            Assignee: Thomas Sandholm
>            Priority: Minor
>             Fix For: 0.21.1
>
>         Attachments: MAPREDUCE-2040.patch
>
> New Forrest documentation for dynamic priority scheduler
[jira] Updated: (MAPREDUCE-2040) Forrest Documentation for Dynamic Priority Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Sandholm updated MAPREDUCE-2040:
---------------------------------------

    Affects Version/s: 0.21.0  (was: 0.20.1)

> Forrest Documentation for Dynamic Priority Scheduler
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-2040
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2040
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/dynamic-scheduler
>    Affects Versions: 0.21.0
>            Reporter: Thomas Sandholm
>            Assignee: Thomas Sandholm
>            Priority: Minor
>             Fix For: 0.21.1
>
>         Attachments: MAPREDUCE-2040.patch
>
> New Forrest documentation for dynamic priority scheduler
[jira] Updated: (MAPREDUCE-2040) Forrest Documentation for Dynamic Priority Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Sandholm updated MAPREDUCE-2040:
---------------------------------------

    Attachment: MAPREDUCE-2040.patch

xdoc file and a link from the scheduler menu

> Forrest Documentation for Dynamic Priority Scheduler
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-2040
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2040
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/dynamic-scheduler
>    Affects Versions: 0.20.1
>            Reporter: Thomas Sandholm
>            Assignee: Thomas Sandholm
>            Priority: Minor
>             Fix For: 0.21.1
>
>         Attachments: MAPREDUCE-2040.patch
>
> New Forrest documentation for dynamic priority scheduler
[jira] Created: (MAPREDUCE-2040) Forrest Documentation for Dynamic Priority Scheduler
Forrest Documentation for Dynamic Priority Scheduler
----------------------------------------------------

                 Key: MAPREDUCE-2040
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2040
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: contrib/dynamic-scheduler
    Affects Versions: 0.20.1
            Reporter: Thomas Sandholm
            Assignee: Thomas Sandholm
            Priority: Minor
             Fix For: 0.21.1


New Forrest documentation for dynamic priority scheduler
[jira] Commented: (MAPREDUCE-2039) Improve speculative execution
[ https://issues.apache.org/jira/browse/MAPREDUCE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903717#action_12903717 ]

Dick King commented on MAPREDUCE-2039:
--------------------------------------

The runtime space requirements for this will be noticeable but modest. Each task in progress will need a {{float}} or two for the exponentially smoothed value, plus an {{int}} for the most recent update time [needed for the exponential smoothing calculation]. Although we internally represent times as a {{long}}, an {{int}} is enough here because the wrap-around time is 47 days. Jobs, and therefore tasks, can't run this long for other reasons.

> Improve speculative execution
> -----------------------------
>
>                 Key: MAPREDUCE-2039
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2039
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>
> In speculation, the framework issues a second task attempt on a task where one attempt is already running. This is useful if the running attempt is bogged down for reasons outside of the task's code, so a second attempt finishes ahead of the existing attempt, even though the first attempt has a head start.
> Early versions of speculation had the weakness that an attempt that starts out well but breaks down near the end would never get speculated. That got fixed in HADOOP:2141, but in the fix the speculation wouldn't engage until the performance of the old attempt, _even counting the early portion where it progressed normally_, was significantly worse than average.
> I want to fix that by overweighting the more recent progress increments. In particular, I would like to use exponential smoothing with a lambda of approximately 1/minute [which is the time scale of speculative execution] to measure progress per unit time. This affects the speculation code in two places:
> * It affects the set of task attempts we consider to be underperforming
> * It affects our estimates of when we expect tasks to finish. This could be hugely important; speculation's main benefit is that it gets a single outlier task finished earlier than otherwise possible, and we need to know which task is the outlier as accurately as possible.
> I would like a rich suite of configuration variables, minimally including lambda and possibly weighting factors. We might have two exponentially smoothed tracking variables of the progress rate, to diagnose attempts that are bogged down and getting worse vs. bogged down but improving.
> Perhaps we should be especially eager to speculate a second attempt. If a task is deterministically failing after bogging down [think "rare infinite loop bug"] we would rather run a couple of our attempts in parallel to discover the problem sooner.
> As part of this patch we would like to add benchmarks that simulate rare tasks that behave poorly, so we can discover whether this change in the code is a good idea and what the proper configuration is. Early versions of this will be driven by our assumptions. Later versions will be driven by the fruits of MAPREDUCE:2037
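The exponentially smoothed progress rate proposed above can be sketched in a few lines. This is a minimal illustration only: the class and field names are invented here, and the eventual patch's update rule may differ (for instance, the weight of a new observation below grows with the elapsed interval, which is one common way to handle irregular heartbeat spacing).

```java
/**
 * Sketch: exponentially smoothed progress rate for one task attempt.
 * With lambda of roughly 1/minute, progress made in the last minute or
 * so dominates the estimate, so an attempt that bogs down near the end
 * starts to look slow quickly, even if its early progress was normal.
 */
public class SmoothedProgressRate {
    private final double lambdaPerMs;   // decay constant, e.g. 1/60000 per ms
    private double smoothedRate;        // progress units per ms
    private double lastProgress;        // in [0, 1]
    private long lastUpdateMs;
    private boolean initialized;

    public SmoothedProgressRate(double lambdaPerMs) {
        this.lambdaPerMs = lambdaPerMs;
    }

    /** Fold one heartbeat's (time, progress) observation into the estimate. */
    public void update(long nowMs, double progress) {
        if (!initialized) {
            lastUpdateMs = nowMs;
            lastProgress = progress;
            initialized = true;
            return;
        }
        long dt = nowMs - lastUpdateMs;
        if (dt <= 0) return;
        double instantRate = (progress - lastProgress) / dt;
        // New observations covering longer intervals get more weight.
        double alpha = 1.0 - Math.exp(-lambdaPerMs * dt);
        smoothedRate = alpha * instantRate + (1.0 - alpha) * smoothedRate;
        lastUpdateMs = nowMs;
        lastProgress = progress;
    }

    /** Estimated remaining time in ms, or -1 if the rate is not yet positive. */
    public long estimateRemainingMs() {
        if (smoothedRate <= 0) return -1;
        return (long) ((1.0 - lastProgress) / smoothedRate);
    }
}
```

This also shows why the space cost is as small as the comment says: one smoothed value, the last progress reading, and the last update time per attempt.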
[jira] Created: (MAPREDUCE-2039) Improve speculative execution
Improve speculative execution
-----------------------------

                 Key: MAPREDUCE-2039
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2039
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Dick King
            Assignee: Dick King


In speculation, the framework issues a second task attempt on a task where one attempt is already running. This is useful if the running attempt is bogged down for reasons outside of the task's code, so a second attempt finishes ahead of the existing attempt, even though the first attempt has a head start.

Early versions of speculation had the weakness that an attempt that starts out well but breaks down near the end would never get speculated. That got fixed in HADOOP:2141, but in the fix the speculation wouldn't engage until the performance of the old attempt, _even counting the early portion where it progressed normally_, was significantly worse than average.

I want to fix that by overweighting the more recent progress increments. In particular, I would like to use exponential smoothing with a lambda of approximately 1/minute [which is the time scale of speculative execution] to measure progress per unit time. This affects the speculation code in two places:

* It affects the set of task attempts we consider to be underperforming
* It affects our estimates of when we expect tasks to finish. This could be hugely important; speculation's main benefit is that it gets a single outlier task finished earlier than otherwise possible, and we need to know which task is the outlier as accurately as possible.

I would like a rich suite of configuration variables, minimally including lambda and possibly weighting factors. We might have two exponentially smoothed tracking variables of the progress rate, to diagnose attempts that are bogged down and getting worse vs. bogged down but improving.

Perhaps we should be especially eager to speculate a second attempt. If a task is deterministically failing after bogging down [think "rare infinite loop bug"] we would rather run a couple of our attempts in parallel to discover the problem sooner.

As part of this patch we would like to add benchmarks that simulate rare tasks that behave poorly, so we can discover whether this change in the code is a good idea and what the proper configuration is. Early versions of this will be driven by our assumptions. Later versions will be driven by the fruits of MAPREDUCE:2037
[jira] Resolved: (MAPREDUCE-115) Map tasks are receiving FileNotFound Exceptions for spill files on a regular basis and are getting killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma resolved MAPREDUCE-115.
-----------------------------------------

    Resolution: Cannot Reproduce

my bad. i didn't realize that we have some new changes that are not reflected in trunk. this seems most likely to be the result of those changes. closing again.

> Map tasks are receiving FileNotFound Exceptions for spill files on a regular basis and are getting killed
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-115
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-115
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jothi Padmanabhan
>
> The following is the log -- Map tasks are unable to locate the spill files when they are doing the final merge (mergeParts).
> java.io.FileNotFoundException: File /xxx/mapred-tt/mapred-local/taskTracker/jobcache/job_200808190959_0001/attempt_200808190959_0001_m_00_0/output/spill23.out does not exist.
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
>     at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:682)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.getFileLength(ChecksumFileSystem.java:218)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.seek(ChecksumFileSystem.java:259)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1102)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:769)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:255)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2208)
[jira] Updated: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-323:
--------------------------------

    Attachment: MR323--2010-08-27--1613.patch

There was a testing problem. {{TestJobCleanup}} was leaving behind files that were causing {{TestJobOutputCommitter}} to fail. I fixed that.

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
>         Attachments: MR323--2010-08-20--1533.patch, MR323--2010-08-25--1632.patch, MR323--2010-08-27--1359.patch, MR323--2010-08-27--1613.patch
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This can cause problems when there is a need to search the history folder (job-recovery etc). It would be nice if we grouped all the jobs under a _user_ folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid, date, jobname_ etc but using _username_ will make the search much more efficient and also will not result in namespace explosion.
[jira] Reopened: (MAPREDUCE-115) Map tasks are receiving FileNotFound Exceptions for spill files on a regular basis and are getting killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma reopened MAPREDUCE-115:
-----------------------------------------

re-opening. we are seeing this a lot on hadoop-20 (yahoo distribution):

1. reducers not able to fetch map outputs because map side tasktracker cannot locate map output
2. mappers not able to locate previously spilled data

Scott has added logging that is telling us that:

- for #1, the map output file was actually present/created at the time the map was first reported to be done
- we have not removed the mapoutput file (from the TT code path deleting the files) before the reducer fetch request came in

so something very fishy - seems like either the files disappear in the interim - or the localdirallocator is not able to find things that are actually present.

> Map tasks are receiving FileNotFound Exceptions for spill files on a regular basis and are getting killed
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-115
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-115
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jothi Padmanabhan
>
> The following is the log -- Map tasks are unable to locate the spill files when they are doing the final merge (mergeParts).
> java.io.FileNotFoundException: File /xxx/mapred-tt/mapred-local/taskTracker/jobcache/job_200808190959_0001/attempt_200808190959_0001_m_00_0/output/spill23.out does not exist.
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
>     at org.apache.hadoop.fs.FileSystem.getContentSummary(FileSystem.java:682)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.getFileLength(ChecksumFileSystem.java:218)
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.seek(ChecksumFileSystem.java:259)
>     at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1102)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:769)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:255)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2208)
[jira] Commented: (MAPREDUCE-2038) Making reduce tasks locality-aware
[ https://issues.apache.org/jira/browse/MAPREDUCE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903682#action_12903682 ]

Hong Tang commented on MAPREDUCE-2038:
--------------------------------------

bq. Yep, that's the basic idea. Implementing rack-combiners as a first class concept would be neat, but the point above is that we can "fake" it if we have locality for reducers, with a lot less work. I don't know if it would have a huge performance improvement, but we could experiment with it easily given this feature.

Makes sense to me.

> Making reduce tasks locality-aware
> ----------------------------------
>
>                 Key: MAPREDUCE-2038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> Currently the Hadoop MapReduce framework does not take data locality into consideration when it decides to launch reduce tasks. There are several cases where it could become sub-optimal.
> - The map output data for a particular reduce task are not distributed evenly across different racks. This could happen when the job does not have many maps, or when there is heavy skew in map output data.
> - A reduce task may need to access some side file (e.g. Pig fragmented join, or incremental merge of an unsorted smaller dataset with an already sorted large dataset). It'd be useful to place reduce tasks based on the location of the side files they need to access.
> This jira is created to solicit ideas on how we can make it better.
[jira] Commented: (MAPREDUCE-2038) Making reduce tasks locality-aware
[ https://issues.apache.org/jira/browse/MAPREDUCE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903676#action_12903676 ]

Todd Lipcon commented on MAPREDUCE-2038:
----------------------------------------

bq. Do you mean that for aggregation operations that would reduce data-volume along the way, so you want to do a hierarchical approach

Yep, that's the basic idea. Implementing rack-combiners as a first class concept would be neat, but the point above is that we can "fake" it if we have locality for reducers, with a lot less work. I don't know if it would have a huge performance improvement, but we could experiment with it easily given this feature.

> Making reduce tasks locality-aware
> ----------------------------------
>
>                 Key: MAPREDUCE-2038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> Currently the Hadoop MapReduce framework does not take data locality into consideration when it decides to launch reduce tasks. There are several cases where it could become sub-optimal.
> - The map output data for a particular reduce task are not distributed evenly across different racks. This could happen when the job does not have many maps, or when there is heavy skew in map output data.
> - A reduce task may need to access some side file (e.g. Pig fragmented join, or incremental merge of an unsorted smaller dataset with an already sorted large dataset). It'd be useful to place reduce tasks based on the location of the side files they need to access.
> This jira is created to solicit ideas on how we can make it better.
[jira] Updated: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-323:
--------------------------------

    Attachment: MR323--2010-08-27--1359.patch

Problems fixed. {{ant test}} in progress.

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
>         Attachments: MR323--2010-08-20--1533.patch, MR323--2010-08-25--1632.patch, MR323--2010-08-27--1359.patch
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This can cause problems when there is a need to search the history folder (job-recovery etc). It would be nice if we grouped all the jobs under a _user_ folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid, date, jobname_ etc but using _username_ will make the search much more efficient and also will not result in namespace explosion.
[jira] Commented: (MAPREDUCE-2038) Making reduce tasks locality-aware
[ https://issues.apache.org/jira/browse/MAPREDUCE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903666#action_12903666 ]

Hong Tang commented on MAPREDUCE-2038:
--------------------------------------

bq. This is a very interesting direction. We have another use case for HBase bulk loads, where we know that a given reducer partition is going to end up on a particular region server (often colocated with a TT). Scheduling the reducer to run on the same node or rack will ensure a local replica of the HFile when it comes time to serve it.

I believe this is similar to the second usage case I described.

bq. Another interesting use case is for aggregation queries where we can make use of something like a "rack combiner". We can simply implement a Partitioner that returns the rack index of the mapper, and then schedule that reduce task on the same rack. Thus we end up with a result set per rack, and can do a second small job to recombine those. This is not unlike the multilevel query execution trees used in Dremel - I imagine Hive and Pig's query planners could make use of plenty of techniques like this.

Do you mean that for aggregation operations that would reduce data volume along the way, you want to do a hierarchical approach? (So the real hierarchy would be: inside-the-map combining, same-host multiple-map combining, inside-the-rack reduction, and cross-rack reduction.)

> Making reduce tasks locality-aware
> ----------------------------------
>
>                 Key: MAPREDUCE-2038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> Currently the Hadoop MapReduce framework does not take data locality into consideration when it decides to launch reduce tasks. There are several cases where it could become sub-optimal.
> - The map output data for a particular reduce task are not distributed evenly across different racks. This could happen when the job does not have many maps, or when there is heavy skew in map output data.
> - A reduce task may need to access some side file (e.g. Pig fragmented join, or incremental merge of an unsorted smaller dataset with an already sorted large dataset). It'd be useful to place reduce tasks based on the location of the side files they need to access.
> This jira is created to solicit ideas on how we can make it better.
[jira] Commented: (MAPREDUCE-2038) Making reduce tasks locality-aware
[ https://issues.apache.org/jira/browse/MAPREDUCE-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903656#action_12903656 ]

Todd Lipcon commented on MAPREDUCE-2038:
----------------------------------------

This is a very interesting direction. We have another use case for HBase bulk loads, where we know that a given reducer partition is going to end up on a particular region server (often colocated with a TT). Scheduling the reducer to run on the same node or rack will ensure a local replica of the HFile when it comes time to serve it.

Another interesting use case is for aggregation queries where we can make use of something like a "rack combiner". We can simply implement a Partitioner that returns the rack index of the mapper, and then schedule that reduce task on the same rack. Thus we end up with a result set per rack, and can do a second small job to recombine those. This is not unlike the multilevel query execution trees used in Dremel - I imagine Hive and Pig's query planners could make use of plenty of techniques like this.

> Making reduce tasks locality-aware
> ----------------------------------
>
>                 Key: MAPREDUCE-2038
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> Currently the Hadoop MapReduce framework does not take data locality into consideration when it decides to launch reduce tasks. There are several cases where it could become sub-optimal.
> - The map output data for a particular reduce task are not distributed evenly across different racks. This could happen when the job does not have many maps, or when there is heavy skew in map output data.
> - A reduce task may need to access some side file (e.g. Pig fragmented join, or incremental merge of an unsorted smaller dataset with an already sorted large dataset). It'd be useful to place reduce tasks based on the location of the side files they need to access.
> This jira is created to solicit ideas on how we can make it better.
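The "rack combiner" trick described in this thread can be sketched without the Hadoop API. The class below is a hypothetical stand-alone form (a real version would extend org.apache.hadoop.mapreduce.Partitioner and derive the rack from the cluster's topology mapping): it routes every record from a given rack to one reduce partition, so a locality-aware scheduler could place that reduce on the same rack and keep the shuffle rack-local, with a small second job recombining the per-rack results.

```java
/**
 * Sketch of a rack-index partitioner: one reduce partition per rack.
 * Hypothetical names; not Hadoop's Partitioner API.
 */
public class RackPartitioner {
    private final String[] racks;   // known rack ids, e.g. from the topology mapping

    public RackPartitioner(String[] racks) {
        this.racks = racks;
    }

    /** Return the reduce partition for records produced by a mapper on mapperRack. */
    public int getPartition(String mapperRack, int numReduceTasks) {
        for (int i = 0; i < racks.length; i++) {
            if (racks[i].equals(mapperRack)) {
                // Works best when numReduceTasks >= number of racks.
                return i % numReduceTasks;
            }
        }
        return 0;  // unknown rack: fall back to the first partition
    }
}
```

The design choice is that the partition key encodes placement intent rather than data content; the aggregation itself is unchanged, which is why this "fakes" a rack combiner once reduce-side locality exists.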
[jira] Created: (MAPREDUCE-2038) Making reduce tasks locality-aware
Making reduce tasks locality-aware
----------------------------------

                 Key: MAPREDUCE-2038
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2038
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
            Reporter: Hong Tang


Currently the Hadoop MapReduce framework does not take data locality into consideration when it decides to launch reduce tasks. There are several cases where it could become sub-optimal.

- The map output data for a particular reduce task are not distributed evenly across different racks. This could happen when the job does not have many maps, or when there is heavy skew in map output data.
- A reduce task may need to access some side file (e.g. Pig fragmented join, or incremental merge of an unsorted smaller dataset with an already sorted large dataset). It'd be useful to place reduce tasks based on the location of the side files they need to access.

This jira is created to solicit ideas on how we can make it better.
[jira] Created: (MAPREDUCE-2037) Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds
Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds
-----------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2037
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2037
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
            Reporter: Dick King
            Assignee: Dick King
             Fix For: 0.22.0


We would like to capture the following information at certain progress thresholds as a task runs:

* Time taken so far
* CPU load [either at the time the data are taken, or exponentially smoothed]
* Memory load [also either at the time the data are taken, or exponentially smoothed]

This would be taken at intervals that depend on the task progress plateaus. For example, reducers have three progress ranges -- [0-1/3], (1/3-2/3], and (2/3-3/3] -- where fundamentally different activities happen. Mappers have different boundaries, I understand, that are not symmetrically placed. Data capture boundaries should coincide with activity boundaries. For the state information capture [CPU and memory] we should average over the covered interval.

This data would flow in with the heartbeats. It would be placed in the job history as part of the task attempt completion event, so it could be processed by rumen or some similar tool and could drive a benchmark engine.
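The interval-averaged capture described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name, the single-metric simplification, and the reducer-style thresholds are all placeholders, not the eventual design, and a real version would feed the records into heartbeats and job history.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: record elapsed time and the average of a sampled metric
 * (e.g. CPU or memory load) over each progress interval, emitting one
 * record when progress crosses a threshold boundary.
 */
public class ProgressSnapshots {
    private final double[] thresholds;   // e.g. {1/3, 2/3, 1.0} for reducers
    private int nextThreshold = 0;
    private double metricSum = 0;
    private int samples = 0;
    private final List<double[]> records = new ArrayList<>();  // {elapsedMs, avgMetric}

    public ProgressSnapshots(double[] thresholds) {
        this.thresholds = thresholds;
    }

    /** Called per heartbeat with elapsed time, current progress, and one metric sample. */
    public void sample(long elapsedMs, double progress, double metric) {
        metricSum += metric;
        samples++;
        while (nextThreshold < thresholds.length && progress >= thresholds[nextThreshold]) {
            // Average over the covered interval, per the description above.
            double avg = samples > 0 ? metricSum / samples : 0.0;
            records.add(new double[] { elapsedMs, avg });
            metricSum = 0;
            samples = 0;
            nextThreshold++;
        }
    }

    public List<double[]> records() {
        return records;
    }
}
```

Averaging over the covered interval, rather than sampling at the instant a boundary is crossed, is what keeps the captured numbers representative of the whole activity phase.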
[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wittawat Tantisiriroj updated MAPREDUCE-2036:
---------------------------------------------

    Attachment: hdfs-raid.tar.gz
                MAPREDUCE-2036.patch

Prototype is uploaded.

> Enable Erasure Code in Tool similar to Hadoop Archive
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2036
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2036
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Wittawat Tantisiriroj
>            Priority: Minor
>         Attachments: hdfs-raid.tar.gz, MAPREDUCE-2036.patch, RaidTool.pdf
>
> Features:
> 1) HAR-like tool
> 2) RAID5/RAID6 and a pluggable interface for implementing additional codings
> 3) Ability to group blocks across files
> 4) Portable across clusters, since all necessary metadata is embedded
>
> While it was developed separately from HAR or RAID due to time constraints, it would make sense to integrate it with either of them.
[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wittawat Tantisiriroj updated MAPREDUCE-2036:
---------------------------------------------

    Attachment: (was: RaidTool.docx)

> Enable Erasure Code in Tool similar to Hadoop Archive
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2036
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2036
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Wittawat Tantisiriroj
>            Priority: Minor
>         Attachments: RaidTool.pdf
>
> Features:
> 1) HAR-like tool
> 2) RAID5/RAID6 and a pluggable interface for implementing additional codings
> 3) Ability to group blocks across files
> 4) Portable across clusters, since all necessary metadata is embedded
>
> While it was developed separately from HAR or RAID due to time constraints, it would make sense to integrate it with either of them.
[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wittawat Tantisiriroj updated MAPREDUCE-2036:
---------------------------------------------

    Attachment: RaidTool.pdf

PDF version of the design document is uploaded.

> Enable Erasure Code in Tool similar to Hadoop Archive
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2036
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2036
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Wittawat Tantisiriroj
>            Priority: Minor
>         Attachments: RaidTool.pdf
>
> Features:
> 1) HAR-like tool
> 2) RAID5/RAID6 and a pluggable interface for implementing additional codings
> 3) Ability to group blocks across files
> 4) Portable across clusters, since all necessary metadata is embedded
>
> While it was developed separately from HAR or RAID due to time constraints, it would make sense to integrate it with either of them.
[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wittawat Tantisiriroj updated MAPREDUCE-2036:
---------------------------------------------

    Attachment: RaidTool.docx

Design document.

> Enable Erasure Code in Tool similar to Hadoop Archive
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2036
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2036
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Wittawat Tantisiriroj
>            Priority: Minor
>         Attachments: RaidTool.docx
>
> Features:
> 1) HAR-like tool
> 2) RAID5/RAID6 and a pluggable interface for implementing additional codings
> 3) Ability to group blocks across files
> 4) Portable across clusters, since all necessary metadata is embedded
>
> While it was developed separately from HAR or RAID due to time constraints, it would make sense to integrate it with either of them.
[jira] Created: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive
Enable Erasure Code in Tool similar to Hadoop Archive
-----------------------------------------------------

                 Key: MAPREDUCE-2036
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2036
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/raid
            Reporter: Wittawat Tantisiriroj
            Priority: Minor

Features:
1) HAR-like tool
2) RAID5/RAID6 and a pluggable interface for implementing additional codings
3) Ability to group blocks across files
4) Portable across clusters, since all necessary metadata is embedded

While it was developed separately from HAR or RAID due to time constraints, it would make sense to integrate it with either of them.
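The RAID5-style coding in feature 2 boils down to byte-wise XOR parity over a group of blocks: any single lost block can be rebuilt by XOR-ing the surviving blocks with the parity. The following is a minimal sketch of that idea only, not the attached prototype's implementation.

```java
/**
 * Minimal RAID5-style parity sketch: the parity block is the byte-wise XOR
 * of equally sized data blocks. XOR-ing the parity with all-but-one data
 * block reproduces the missing block.
 */
class XorParity {
  /** Computes the byte-wise XOR of the given equally sized blocks. */
  static byte[] parity(byte[][] blocks) {
    byte[] p = new byte[blocks[0].length];
    for (byte[] b : blocks) {
      for (int i = 0; i < p.length; i++) {
        p[i] ^= b[i];
      }
    }
    return p;
  }
}
```

RAID6 adds a second, independent parity (typically a Reed-Solomon code) so that two lost blocks can be recovered; the pluggable interface mentioned above would slot such codings in behind the same block-group abstraction.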
[jira] Commented: (MAPREDUCE-2035) Enable -Wall and fix warnings in task-controller build
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903488#action_12903488 ]

Todd Lipcon commented on MAPREDUCE-2035:
----------------------------------------

Is -Wall really compiler-specific? Do you have some autoconf foo to share to make this more portable? Also, given that this is the *linux* task-controller, don't you think it's fair to assume GCC for now, until someone takes up the task of making it run on other systems?

> Enable -Wall and fix warnings in task-controller build
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2035
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2035
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task-controller
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: mapreduce-2035-toreview.txt, mapreduce-2035.txt
>
> Enabling -Wall shows a bunch of warnings. We should enable them and then fix them.
[jira] Updated: (MAPREDUCE-2023) TestDFSIO read test may not read specified bytes.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2023:
---------------------------------

    Attachment: mr-2023-20100826.patch

Trivial patch; the for-loop is replaced with a while-loop.

> TestDFSIO read test may not read specified bytes.
> -------------------------------------------------
>
>                 Key: MAPREDUCE-2023
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2023
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: benchmarks
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2023-20100826.patch, TestFsRead.java
>
> TestDFSIO's read test may read fewer bytes than specified when reading large files.
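The for-loop-to-while-loop change addresses the classic short-read pattern: `InputStream.read` may return fewer bytes than requested, so a loop that issues a fixed number of reads can under-count on large files. The usual fix is to keep reading until the requested count (or EOF) is reached. This is a generic sketch of that fix, not the actual TestDFSIO code.

```java
import java.io.IOException;
import java.io.InputStream;

class FullRead {
  /**
   * Reads up to len bytes into buf, looping over short reads. Returns the
   * number of bytes actually read, which is less than len only at EOF.
   */
  static int readFully(InputStream in, byte[] buf, int len)
      throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(buf, total, len - total);
      if (n < 0) {
        break;  // EOF before len bytes were available
      }
      total += n;
    }
    return total;
  }
}
```

Counting `total` rather than trusting one `read` per iteration is what makes the benchmark's byte accounting exact.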
[jira] Updated: (MAPREDUCE-2023) TestDFSIO read test may not read specified bytes.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang updated MAPREDUCE-2023:
---------------------------------

    Status: Patch Available  (was: Open)

> TestDFSIO read test may not read specified bytes.
> -------------------------------------------------
>
>                 Key: MAPREDUCE-2023
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2023
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: benchmarks
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: mr-2023-20100826.patch, TestFsRead.java
>
> TestDFSIO's read test may read fewer bytes than specified when reading large files.
[jira] Assigned: (MAPREDUCE-2023) TestDFSIO read test may not read specified bytes.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Tang reassigned MAPREDUCE-2023:
------------------------------------

    Assignee: Hong Tang

> TestDFSIO read test may not read specified bytes.
> -------------------------------------------------
>
>                 Key: MAPREDUCE-2023
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2023
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: benchmarks
>            Reporter: Hong Tang
>            Assignee: Hong Tang
>         Attachments: TestFsRead.java
>
> TestDFSIO's read test may read fewer bytes than specified when reading large files.
[jira] Updated: (MAPREDUCE-323) Improve the way job history files are managed
     [ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dick King updated MAPREDUCE-323:
--------------------------------

    Status: Open  (was: Patch Available)

Cancelling, pending investigation of some unit test failures:

TestRumenJobTraces -- test case should use new API
TestJobOutputCommitter
TestTaskLauncher

Expect to fix in a few hours.

> Improve the way job history files are managed
> ---------------------------------------------
>
>                 Key: MAPREDUCE-323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Dick King
>            Priority: Critical
>         Attachments: MR323--2010-08-20--1533.patch, MR323--2010-08-25--1632.patch
>
> Today all the job history files are dumped in one _job-history_ folder. This can cause problems when there is a need to search the history folder (job recovery etc.). It would be nice if we grouped all the jobs under a _user_ folder, so all the jobs for user _amar_ would go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid, date, jobname_ etc., but using _username_ will make the search much more efficient and will not result in a namespace explosion.
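The per-user layout the issue proposes amounts to inserting the username as a directory level between the history root and the job's file; a minimal sketch, with all names assumed for illustration:

```java
class HistoryPath {
  /**
   * Hypothetical per-user history layout from the issue: each job's history
   * file lives under its submitting user, e.g. history-folder/amar/job_x,
   * so a search for one user's jobs scans only that user's subdirectory.
   */
  static String historyFile(String historyRoot, String user, String jobId) {
    return historyRoot + "/" + user + "/" + jobId;
  }
}
```

Keying the first level on username keeps the directory fan-out bounded by the number of users, which is the "no namespace explosion" property the description cites.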
[jira] Commented: (MAPREDUCE-2035) Enable -Wall and fix warnings in task-controller build
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903398#action_12903398 ]

Allen Wittenauer commented on MAPREDUCE-2035:
---------------------------------------------

-1

Compiler-specific flags should get added after compiler detection.

> Enable -Wall and fix warnings in task-controller build
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-2035
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2035
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task-controller
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: mapreduce-2035-toreview.txt, mapreduce-2035.txt
>
> Enabling -Wall shows a bunch of warnings. We should enable them and then fix them.