[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7282?focusedWorklogId=618793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-618793
 ]

ASF GitHub Bot logged work on MAPREDUCE-7282:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Jul/21 20:17
            Start Date: 05/Jul/21 20:17
    Worklog Time Spent: 10m 
      Work Description: steveloughran closed pull request #2349:
URL: https://github.com/apache/hadoop/pull/2349


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 618793)
    Time Spent: 20m  (was: 10m)

> MR v2 commit algorithm should be deprecated and not the default
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-7282
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7282
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.3.0, 3.2.1, 3.1.3, 3.3.1
>            Reporter: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The v2 MR commit algorithm moves files from the task attempt dir into the 
> dest dir on task commit, one by one.
> It is therefore not atomic:
> # If a task commit fails partway through and another task attempt commits, 
> then unless exactly the same filenames are used, output of the first attempt 
> may be included in the final result.
> # If a worker is partitioned partway through task commit, and then continues 
> after another attempt has committed, it may partially overwrite the output, 
> even when the filenames are the same.
> Both MR and Spark assume that task commits are atomic. Either they need to 
> account for this not being the case (we add a way to probe for a committer 
> supporting atomic task commit, and both engines add handling for task 
> commit failures, probably by failing the job), or, better, we remove v2 as 
> the default and maybe also warn when it is being used.
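
The first failure mode described above can be reproduced with a small
simulation. This is only a sketch on a local filesystem, not Hadoop's
FileOutputCommitter code: the function names, the fail_after parameter, and
the file names are all hypothetical, chosen to model a task commit that moves
files one at a time and dies partway through.

```python
import os
import tempfile


def commit_file_by_file(attempt_dir, dest_dir, fail_after=None):
    """Move files one at a time, modelling a v2-style task commit.

    fail_after is a hypothetical knob: raise once that many files have moved,
    to model a task attempt dying partway through its commit.
    """
    for moved, name in enumerate(sorted(os.listdir(attempt_dir))):
        if fail_after is not None and moved >= fail_after:
            raise RuntimeError("task attempt failed partway through commit")
        os.replace(os.path.join(attempt_dir, name), os.path.join(dest_dir, name))


def write_attempt(root, attempt_id, names):
    """Create a task attempt dir holding one small file per name."""
    attempt_dir = os.path.join(root, attempt_id)
    os.makedirs(attempt_dir)
    for name in names:
        with open(os.path.join(attempt_dir, name), "w") as f:
            f.write(attempt_id)
    return attempt_dir


def run_scenario():
    """Attempt 1 fails mid-commit; attempt 2, with different filenames, succeeds."""
    with tempfile.TemporaryDirectory() as root:
        dest = os.path.join(root, "dest")
        os.makedirs(dest)
        first = write_attempt(root, "attempt_1", ["a1-0000", "a1-0001"])
        second = write_attempt(root, "attempt_2", ["a2-0000", "a2-0001"])
        try:
            commit_file_by_file(first, dest, fail_after=1)
        except RuntimeError:
            pass  # the engine would now schedule another attempt
        commit_file_by_file(second, dest)
        return sorted(os.listdir(dest))


if __name__ == "__main__":
    print(run_scenario())
```

In this model the destination ends up holding one file from the failed first
attempt alongside both files from the second attempt, which is the mixed
output the issue warns about.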



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
