Thanks again Harsh for your thorough response!

On Mon, May 21, 2012 at 9:39 PM, Harsh J <ha...@cloudera.com> wrote:
> Rodney,
>
> I haven't tested append() enough times to know what triggers it, but I
> have often observed, both on the 0.20-append-based clusters I've
> troubleshot and on the cdh-users list, that append() has led to odd
> trailing block sizes (beyond the maximum allowed) and intermittent
> warnings of corrupt/failed blocks (relating only to the appended files,
> not random ones). In a few cases this leads to temporary unavailability
> of data, as the client reports all blocks bad to the NN, which is as
> good as a 'data loss' case (for the moment, anyway). I've not seen
> permanent or spreading corruption, but this behavior was odd enough for
> me to not recommend append() (though not sync()) on that branch and the
> releases that use it. YMMV. I'm unsure of the JIRA here, or whether
> this is an issue with the new 2.x implementation as well; I'll let the
> other HDFS devs answer that.
>
> The HBase case speaks for HBase's own use, which is just sync() at this
> point. This is also why
> https://issues.apache.org/jira/browse/HADOOP-8230 was done: to separate
> the configs that toggle the two calls, append() and sync(), so the docs
> don't appear as confusing as they do now.
>
> On Mon, May 21, 2012 at 2:59 PM, Rodney O'Donnell <r...@rodojojo.com>
> wrote:
> > Thanks again for your response, one more clarification though.
> >
> > Are there any conditions under which I can trust append to work?
> >
> > For example, if I use ZK to lock the HDFS file to ensure there are no
> > concurrent writes, then sync & close the file after each write?
> >
> > Also, I assume this has nothing to do with file formats (I was a
> > little confused by one of the links below) and that append should not
> > be trusted even when using a simple text file.
> > Finally, any thoughts on the comment here
> > http://hbase.apache.org/book/hadoop.html :
> >
> >   Ignore the chicken-little comment you'll find in hdfs-default.xml
> >   in the description for the dfs.support.append configuration; it
> >   says it is not enabled because there are "... bugs in the 'append
> >   code' and is not supported in any production cluster." This comment
> >   is stale, from another era, and while I'm sure there are bugs, the
> >   sync/append code has been running in production at large-scale
> >   deploys and is on by default in the Hadoop offerings of commercial
> >   vendors [7 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e504>]
> >   [8 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e514>]
> >   [9 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e520>].
> >
> > I guess this comment is only 'chicken-little' for the HBase use case
> > (i.e., sync is OK, append is not)?
> >
> > Cheers,
> >
> > Rod.
> >
> >
> > On Fri, May 18, 2012 at 5:58 PM, Harsh J <ha...@cloudera.com> wrote:
> >
> >> Rodney,
> >>
> >> There are two things that comprised the 0.20-append branch's "append"
> >> features. To break it down simply for 1.x:
> >>
> >> append() - Available: Yes. Supported/Recommended: No.
> >> sync()   - Available: Yes. Supported/Recommended: Yes.
> >>
> >> Please also see these links for further info and the conversations on
> >> this topic that have happened several times before:
> >>
> >> https://issues.apache.org/jira/browse/HADOOP-8230
> >> http://search-hadoop.com/m/638TD3bAXB1
> >> http://search-hadoop.com/m/hBPRp1EWELS1
> >>
> >> Let us know if you have further questions.
> >>
> >> On Fri, May 18, 2012 at 12:12 PM, Rodney O'Donnell <r...@rodojojo.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > Is FileSystem.append supported on Hadoop 1.0.x? (1.0.3 in
> >> > particular.)
> >> >
> >> > Reading this list I thought it was back in for 1.0, but it's
> >> > disabled by default so I'm not 100% sure.
> >> > It would be great to get a definitive answer.
> >> >
> >> > Cheers,
> >> >
> >> > Rod.
> >>
> >>
> >> --
> >> Harsh J
>
> --
> Harsh J
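[Archive note] The single-writer discipline Rodney proposes above (hold a ZooKeeper lock, append, sync, then close after each write) can be sketched in miniature. The sketch below is an analogy on a local file, not the HDFS API: a `ReentrantLock` stands in for a ZooKeeper lock, `FileDescriptor.sync()` stands in for HDFS's `sync()`/`hflush()`, and the class, method, and path names are invented for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.locks.ReentrantLock;

public class LockedAppend {
    // In-process lock standing in for an external (e.g. ZooKeeper) lock;
    // it guarantees one writer at a time within this JVM only.
    private static final ReentrantLock WRITER_LOCK = new ReentrantLock();

    // Append `data` under the lock, syncing and closing before release.
    static void lockedAppend(String path, String data) throws IOException {
        WRITER_LOCK.lock();
        try (FileOutputStream out = new FileOutputStream(path, true)) {
            out.write(data.getBytes(StandardCharsets.UTF_8));
            out.getFD().sync(); // analogous to HDFS sync()/hflush()
        } finally {
            // try-with-resources has already closed the file here,
            // so the file is closed before the lock is released.
            WRITER_LOCK.unlock();
        }
    }

    public static void main(String[] args) throws IOException {
        String path = "/tmp/append-demo.txt"; // illustrative path
        new java.io.File(path).delete();      // start from a clean file
        lockedAppend(path, "record-1\n");
        lockedAppend(path, "record-2\n");
    }
}
```

Per the thread, this discipline is the part that is sound on 1.x via sync(); the caveats about append() itself still apply regardless of locking.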
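[Archive note] The "disabled by default" setting the original question refers to is a single hdfs-site.xml property. A sketch of the override, assuming a plain 1.0.x release; per Harsh's caveats above, flipping it exposes the append() path, which is available but not recommended on this branch:

```xml
<!-- hdfs-site.xml (1.0.x): overrides the hdfs-default.xml setting
     discussed in this thread. -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```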