[ https://issues.apache.org/jira/browse/MAPREDUCE-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated MAPREDUCE-6956:
--------------------------------------
    Status: Patch Available  (was: Open)

> FileOutputCommitter to gain abstract superclass PathOutputCommitter
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6956
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6956
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 3.0.0-beta1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: MAPREDUCE-6956-001.patch
>
>
> This is the initial step of MAPREDUCE-6823, which proposes a factory behind
> {{FileOutputFormat}} to create different committers for different
> filesystems, if so configured.
>
> This patch simply adds the new abstract superclass of
> {{FileOutputCommitter}}, {{PathOutputCommitter extends OutputCommitter}}.
> This abstract class declares {{getWorkPath()}} as an abstract method, with
> {{FileOutputCommitter}} providing the implementation.
> {{FileOutputFormat}} then relaxes its requirement on the committer returned
> by {{getOutputCommitter()}}: instead of requiring a {{FileOutputCommitter}}
> or subclass, it only needs a {{PathOutputCommitter}}, and it calls
> {{PathOutputCommitter.getWorkPath()}} to get the work path.
>
> What does that do?
> It allows people to implement subclasses of {{FileOutputFormat}} which can
> provide their own committers *which don't need to inherit the complexity that
> FileOutputCommitter has acquired over time*.
>
> Currently anyone implementing a new committer (example: the Netflix S3
> committer) needs to subclass {{FileOutputCommitter}}. That class is too
> complex to understand except under a debugger: it has co-recursive routines
> and lots of methods which need to be overridden to guarantee a safe subclass,
> and, because of its critical role and known subclassing, it isn't ever going
> to be cleaned up.
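The class hierarchy described above can be illustrated in plain Java. This is a minimal sketch under stated assumptions: `OutputCommitter`, `PathOutputCommitter` and `FileOutputCommitter` here are simplified stand-ins, not the real `org.apache.hadoop.mapreduce` types (their lifecycle methods take a plain String task-attempt ID instead of a context, and the `_temporary/<attempt>` work-path layout is a simplification). `StagingCommitter`, `NoPathCommitter`, `PathCommitterSketch` and `workPathOf()` are hypothetical names for illustration only. The sketch shows a new committer extending `PathOutputCommitter` directly, and a caller that only asks for a `PathOutputCommitter` to obtain the work path.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Simplified stand-in for Hadoop's OutputCommitter (NOT the real API):
// lifecycle hooks take a plain String task-attempt ID.
abstract class OutputCommitter {
    abstract void setupTask(String taskAttemptId) throws IOException;
    abstract void commitTask(String taskAttemptId) throws IOException;
}

// The proposed lean superclass: it only adds an abstract getWorkPath().
abstract class PathOutputCommitter extends OutputCommitter {
    abstract Path getWorkPath() throws IOException;
}

// Stand-in for FileOutputCommitter; the "_temporary/<attempt>" layout is a
// simplification, not what the real class computes.
class FileOutputCommitter extends PathOutputCommitter {
    private final Path workPath;

    FileOutputCommitter(Path outputPath, String taskAttemptId) {
        this.workPath = outputPath.resolve("_temporary").resolve(taskAttemptId);
    }

    @Override void setupTask(String taskAttemptId) { /* would create workPath */ }
    @Override void commitTask(String taskAttemptId) { /* would rename into the output dir */ }
    @Override Path getWorkPath() { return workPath; }
}

// Hypothetical store-specific committer: it extends PathOutputCommitter
// directly and inherits none of FileOutputCommitter's internals.
class StagingCommitter extends PathOutputCommitter {
    private final Path stagingDir;

    StagingCommitter(Path stagingDir) { this.stagingDir = stagingDir; }

    @Override void setupTask(String taskAttemptId) { /* would prepare local staging */ }
    @Override void commitTask(String taskAttemptId) { /* would upload staged files */ }
    @Override Path getWorkPath() { return stagingDir; }
}

// Hypothetical committer with no notion of a work path.
class NoPathCommitter extends OutputCommitter {
    @Override void setupTask(String taskAttemptId) { }
    @Override void commitTask(String taskAttemptId) { }
}

public class PathCommitterSketch {
    // Mirrors the relaxed check: any PathOutputCommitter is accepted,
    // not only FileOutputCommitter and its subclasses.
    static Path workPathOf(OutputCommitter committer) throws IOException {
        if (committer instanceof PathOutputCommitter) {
            return ((PathOutputCommitter) committer).getWorkPath();
        }
        throw new IllegalArgumentException(
            "Committer is not a PathOutputCommitter: " + committer.getClass().getName());
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("/out");
        System.out.println(workPathOf(new FileOutputCommitter(out, "attempt_0001")));
        System.out.println(workPathOf(new StagingCommitter(Paths.get("/tmp/staging"))));
    }
}
```

The design point is that `workPathOf()` depends only on the abstract `getWorkPath()`, so a committer such as the hypothetical `StagingCommitter` is usable without overriding any of `FileOutputCommitter`'s methods.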
>
> A new, lean parent class which {{FileOutputFormat}} can handle allows people
> to write new committers which don't have to worry about the implementation
> details of {{FileOutputCommitter}}, only about how well they implement the
> semantics of committing work.
>
> The full MAPREDUCE-6823 goes beyond this with a change to
> {{FileOutputFormat}} to add a factory for creating FS-specific
> {{PathOutputCommitter}} instances. This patch doesn't include that, as it
> needs to be reviewed in the context of HADOOP-13786 and, ideally, at least
> one committer for another store, so people can say "this factory model
> works".
>
> All I'm proposing here is to tune the committer class hierarchy in MRv2 so
> that people can more easily implement committers, and so that code can be
> switched over easily once that factory is done. I'd also like this in
> branch-3 from the outset, so that existing code which calls
> {{FileOutputFormat.getOutputCommitter()}} to get a {{FileOutputCommitter}}
> *just to call getWorkPath()* can move to the new interface across all of
> Hadoop 3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)