I think the conclusion is "no change for now", but people do need to understand the risks better. One thing I'd like to understand is: which FileOutputFormat subclasses generate filenames that are unique to each task attempt, i.e. different across attempts? I've heard a mention of Avro here, but have not looked in the code.
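
To make the risk concrete, here is a toy model of why per-attempt filenames matter when a task commit is not atomic. It is only a sketch: it is not FileOutputCommitter code, and the `commit_task` and `simulate` helpers and the `part-NNNNN-<attempt>` naming scheme are hypothetical, made up for illustration.

```python
import os
import tempfile


class TaskAttemptFailed(Exception):
    """Raised when a simulated task attempt dies partway through its commit."""


def commit_task(dest_dir, attempt_id, records, unique_names, fail_after=None):
    # Toy v2-style commit: each file lands directly in dest_dir, one at a
    # time, so a failure leaves the earlier files visible (non-atomic).
    for i, data in enumerate(records):
        if fail_after is not None and i >= fail_after:
            raise TaskAttemptFailed(f"{attempt_id} failed after {i} file(s)")
        name = f"part-{i:05d}"
        if unique_names:
            # Hypothetical naming scheme: the attempt id is part of the name.
            name += f"-{attempt_id}"
        with open(os.path.join(dest_dir, name), "w") as f:
            f.write(data)


def simulate(unique_names):
    # Attempt 1 fails after committing one file; attempt 2 then succeeds.
    records = ["a\n", "b\n"]
    with tempfile.TemporaryDirectory() as dest:
        try:
            commit_task(dest, "attempt_1", records, unique_names, fail_after=1)
        except TaskAttemptFailed:
            pass  # the framework would schedule a second attempt
        commit_task(dest, "attempt_2", records, unique_names)
        return sorted(os.listdir(dest))


if __name__ == "__main__":
    print(simulate(unique_names=False))
    print(simulate(unique_names=True))
```

In this model, when filenames are the same across attempts the retry simply overwrites whatever the failed attempt left behind. When filenames differ per attempt, the file from the failed attempt stays in the destination next to the output of the successful one, so its data appears twice.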

On Thu, 24 Sep 2020 at 17:27, epa...@apache.org <epa...@apache.org> wrote:

> Thanks Steve and Jim for bringing this issue to our attention.
>
> IIUC, serial commit takes minutes with mrv1, whereas with mrv2 it is very
> quick. With this kind of performance difference, is it wise to change the
> default behavior for released versions of Hadoop? Should this be limited to
> trunk?
>
> Thanks,
> -Eric Payne
>
>
> On Wednesday, September 23, 2020, 2:16:14 PM CDT, Jim Brennan
> <james.bren...@verizonmedia.com.invalid> wrote:
>
> I replied in the Jira. The speed up provided by the v2 commit algorithm
> is very important to us at Verizon Media (Yahoo). Please do not remove it.
> I referred to this comment from Jason Lowe on the original Jira:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-4815?focusedCommentId=14271115&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14271115
>
> I think it would be appropriate to better document the limitations of the
> v2 algorithm and possibly make it not be the default, as long as we can
> still use it.
>
> On Wed, Sep 23, 2020 at 2:07 PM Igor Dvorzhak <i...@google.com.invalid>
> wrote:
>
> > What will be the solution for object stores to have fast and correct
> > commit algorithms?
> >
> > On Wed, Sep 23, 2020 at 11:42 AM Steve Loughran
> > <ste...@cloudera.com.invalid> wrote:
> >
> >> I've got a PR up to completely remove the v2 commit algorithm
> >>
> >> https://github.com/apache/hadoop/pull/2320
> >>
> >> That may seem overkill, but while *we* know there's a small window of risk
> >> (task attempt 1 failing partway through a nonatomic commit), that's not
> >> known/appreciated by others.
> >>
> >> The patch removes the v2 codepath from FileOutputCommitter, making it a lot
> >> less complicated, and when v2 is requested, a warning is printed and the
> >> option ignored.
> >>
> >> Overkill? Maybe. But it guarantees correctness