[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632626#comment-13632626 ] Karthik Kambatla commented on MAPREDUCE-5110: - Thanks for chiming in, Vinod. My intention was precisely to add an aggressive timeout for task attempt launches and keeping it job-configurable should be good. We can implement it either on JT or TT. Do you think it is okay to implement in on TT? Please suggest - I ll upload a patch accordingly. If interested, the user should be able to configure this timeout to be shorter than the tracker-expiry-interval to ensure a single attempt. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, > mr-5110-tt-only.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632543#comment-13632543 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5110: Trying to understand this, mostly agree with what Arun said. To summarize: - Strictly guaranteeing serial execution of task attempts is not possible in general and is a non-requirement - JT already deals with all kinds of slow-ness with tasks and irrespective of this patch, clients have to deal with the slowness. bq. Where possible (i.e., not transient network partitions), run a single task attempt for a task when speculation is turned off Seems an arbitrary non-requirement, don't see what we gain from this. The JIRA started with the above goal which isn't worth pursing from what I see, but now it seems to have transformed into something more benign. Looked at the patch. It looks like you want quicker failure when tasks are getting launched/localized to meet some kind of SLAs? If that is the case, instead of calling it a 'TT-side implementation', if we call it an aggressive timeout enforced on TTs for tasks, and make it job-configurable, that should do. Right? > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, > mr-5110-tt-only.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632397#comment-13632397 ] Karthik Kambatla commented on MAPREDUCE-5110: - Thanks for your response, Arun. Let me take a step back and explain in detail: AIC the issue this JIRA address is - "Where possible (i.e., not transient network partitions), run a single task attempt for a task when speculation is turned off". A JT solution (a.k.a MAPREDUCE-2217) spawns another task attempt, but doesn't kill the currently running task before doing so. Through a TT-side solution (patch here), one will be able to kill the currently running attempt first before spawning another task attempt. I see your point of avoid-TT-changes-if-possible. I guess the trade-off is between marginal increase in TT code complexity (a timeout check and logging changes) and running multiple attempts of the task. Given the low cost of the fix, I believe we should address this scenario which seems to be far more frequent compared to network partitions. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, > mr-5110-tt-only.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632377#comment-13632377 ] Arun C Murthy commented on MAPREDUCE-5110: -- [~kkambatl] Sorry if I wasn't clear. My point is simple: The JT already has code to deal with slow launches i.e. ExpireLaunchingTasks. What benefit do we get by re-implementing this on TT? Why not reduce the timeout on JT? > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, > mr-5110-tt-only.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623925#comment-13623925 ] Karthik Kambatla commented on MAPREDUCE-5110: - Hey Arun, sorry for the delay. I was trying to figure out the root cause behind these occasional launch delays, we encounter them once in a while on a highly loaded cluster. It looks like a node-specific hardware/OS issue. When this happens, the task in question delays the entire job. I still believe limiting the task launch time is helpful, particularly in the case of node-specific hardware issues - failing disks, slow networks etc. Also, I discussed this offline with Alejandro and Tom, and they suggested we might not want to introduce a new config for this, but may be use half of the mapred.task.timeout. What do you think of that? > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, > mr-5110-tt-only.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623893#comment-13623893 ] Arun C Murthy commented on MAPREDUCE-5110: -- [~kkambatl] I haven't heard back, you comfortable with my suggestion that this is not worth additional complexity? Thanks. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, > mr-5110-tt-only.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618468#comment-13618468 ] Arun C Murthy commented on MAPREDUCE-5110: -- bq. All this said, if you are uncomfortable with the JT changes, I can restrict the changes to TT. [~kkambatl] As I said in my comment, I'm more worried about TT-side than JT... The JT already has code to catch slow launches i.e. ExpireLaunchingTasks thread. Why do we need to re-invent it on TT side? IAC, the problem we are trying to solve here is fairly obscure i.e. a hardware issue causing slow task launches... even that is already tracked by JT, right? > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, > mr-5110-tt-only.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617577#comment-13617577 ] Hadoop QA commented on MAPREDUCE-5110: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576121/mr-5110-tt-only.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3482//console This message is automatically generated. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, > mr-5110-tt-only.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617471#comment-13617471 ] Karthik Kambatla commented on MAPREDUCE-5110: - Thanks Arun. Agree that we can't guarantee a single task attempt in the face of a transient network partition. That said, I think there is merit to solving something we can. For instance, the users could have their own SLAs (time or percentile or plain hardware-based) to guard against inconsistencies due to network partitions. bq. I think MAPREDUCE-2217 made an important improvement and we should keep it. However, I'm very scared of trying to implement MAPREDUCE-2217 via TT-side changes, particularly, when we are adding complexity to already squiggly code on the TT. Agree MAPREDUCE-2217 addresses the hung TT case, but only for UNASSIGNED tasks. The RUNNING/COMMIT_PENDING tasks still are addressed by TT. In other words, the rationale of monitoring task progress for RUNNING/COMMIT_PENDING in TT instead of JT applies to this case too. If anything, the proposed patch only makes it consistent. All this said, if you are uncomfortable with the JT changes, I can restrict the changes to TT. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617336#comment-13617336 ] Arun C Murthy commented on MAPREDUCE-5110: -- bq. If we want to strictly guarantee serial execution of task attempts (say, when speculative execution is turned off), we want to kill the task first before re-scheduling on another node. [~kkambatl] the premise that we can strictly guarantee the above is basically impossible. There a bunch of other scenarios where we won't be guarantee this, for e.g. you might schedule a task on TT which then is deemed 'lost' 10 mins later without a single HB after the schedule; but in reality that TT is just having trouble talking to JT. This means that multiple tasks will be running simultaneously since the JT will re-schedule all tasks on that TT. In reality, this is the more common case (lost TT) and there is, pretty much, nothing we can do about it. However, there are enough checks/balances to ensure there is consistency for the job in the system (longer writeup). As a result, I'm inclined to close this as 'wont fix'. I think MAPREDUCE-2217 made an important improvement and we should keep it. However, I'm very scared of trying to implement MAPREDUCE-2217 via TT-side changes, particularly, when we are adding complexity to already squiggly code on the TT. Makes sense? > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617104#comment-13617104 ] Karthik Kambatla commented on MAPREDUCE-5110: - To answer Arun's question in its entirety: Yes, it reverts MAPREDUCE-2217. MAPREDUCE-2217 was to address the case of hung TaskTracker. Till the time of this JIRA, we never observed the current issue of the TT taking too long to start a task (which only seems to manifest due to hardware issues). The current approach handles both scenarios. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616847#comment-13616847 ] Hadoop QA commented on MAPREDUCE-5110: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575879/mr-5110.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3476//console This message is automatically generated. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616779#comment-13616779 ] Karthik Kambatla commented on MAPREDUCE-5110: - If we want to strictly guarantee serial execution of task attempts (say, when speculative execution is turned off), we want to kill the task first before re-scheduling on another node. If we treat these as decoupled events, the JT will schedule another attempt. Meanwhile, before the TT kills it, the task might make some progress violating the above guarantee. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616768#comment-13616768 ] Arun C Murthy commented on MAPREDUCE-5110: -- So, this essentially reverts MAPREDUCE-2217? Why isn't it sufficient to just let the TT kill the task once JT expires it? > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616523#comment-13616523 ] Alejandro Abdelnur commented on MAPREDUCE-5110: --- +1, nice hunting. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
[ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616000#comment-13616000 ] Karthik Kambatla commented on MAPREDUCE-5110: - The issue can be reproduced by applying the above patch on a 4 node cluster (8 map/reduce slots) with mapred.tasktracker.expiry.interval set to 1000 and running terasort on 20 GB data. > Long task launch delays can lead to multiple parallel attempts of the task > -- > > Key: MAPREDUCE-5110 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 1.1.2 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: expose-mr-5110.patch > > > If a task takes too long to launch, the JT expires the task and schedules > another attempt. The earlier attempt can start after the later attempt > leading to two parallel attempts running at the same time. This is > particularly an issue if the user turns off speculation and expects a single > attempt of a task to run at any point in time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira