[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated MAPREDUCE-6823:
--------------------------------------
    Attachment: MAPREDUCE-6823-004.patch

Patch 004; in sync with the S3A patch being revieed at HADOOP-14971/HADOOP-15003

dealing with the various checkstyle issues. This code has been tested all the 
way through spark now, though it's not completely hidden, particularly in the 
case of Parquet, where Spark has some hard expectations about the type of 
committer there. All  is working (testable, because the _SUCCESS file generated 
by the new committers contains some summary data)

> FileOutputFormat to support configurable PathOutputCommitter factory
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6823
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6823
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0-alpha2
>         Environment: Targeting S3 as the output of work
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> MAPREDUCE-6823-002.patch, MAPREDUCE-6823-002.patch, MAPREDUCE-6823-004.patch
>
>
> In HADOOP-13786 I'm adding a custom subclass for FileOutputFormat, one which 
> can talk direct to the S3A Filesystem for more efficient operations, better 
> failure modes, and, most critically, as part of HADOOP-13345, atomic commit 
> of output. The normal committer relies on directory rename() being atomic for 
> this; for S3 we don't have that luxury.
> To support a custom committer, we need to be able to tell FileOutputFormat 
> (and implicitly, all subclasses which don't have their own custom committer), 
> to use our new {{S3AOutputCommitter}}.
> I propose: 
> # {{FileOutputFormat}} takes a factory to create committers.
> # The factory to take a URI and {{TaskAttemptContext}} and return a committer
> # the default implementation always returns a {{FileOutputCommitter}}
> # A configuration option allows a new factory to be named
> # An {{S3AOutputCommitterFactory}} to return a  {{FileOutputCommitter}} or 
> new {{S3AOutputCommitter}} depending upon the URI of the destination.
> Note that MRv1 already supports configurable committers; this is only the V2 
> API



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

Reply via email to