Thanks again Harsh for your thorough response!

On Mon, May 21, 2012 at 9:39 PM, Harsh J <ha...@cloudera.com> wrote:
> Rodney,
>
> I haven't tested append() enough times to know what triggers it, but I
> have often observed, both on the 0.20-append-based clusters I've
> troubleshot and on the cdh-users list, that append() has led to odd
> trailing block sizes (beyond the maximum allowed) and intermittent
> warnings of corrupt/failed blocks (relating only to the appended files,
> not random ones). In a few cases this leads to temporary unavailability
> of data, as the client reports all blocks bad to the NN, which is as
> good as a 'data loss' case (for the moment, anyway). I've not seen
> permanent or spreading corruption, but this behavior was odd enough for
> me to not recommend append() (though not sync()) on that branch and the
> releases that use it. YMMV. I'm unsure of the JIRA here, or whether
> this is an issue with the new 2.x implementation as well; I'll let the
> other HDFS devs answer that.
>
> The HBase case speaks for HBase's own use, which is just sync() at this
> point. This is also why
> https://issues.apache.org/jira/browse/HADOOP-8230 was done: to separate
> the configs that toggle the two calls, append() and sync(), so the docs
> don't appear as confusing as they do now.
>
> On Mon, May 21, 2012 at 2:59 PM, Rodney O'Donnell <r...@rodojojo.com>
> wrote:
> > Thanks again for your response, one more clarification though.
> >
> > Are there any conditions under which I can trust append to work?
> >
> > For example, if I use ZK to lock the HDFS file to ensure there are no
> > concurrent writes, then sync & close the file after each write?
> >
> > Also, I assume this has nothing to do with file formats (I was a
> > little confused by one of the links below) and that append should not
> > be trusted even when using a simple text file.
> > Finally, any thoughts on the comment here
> > http://hbase.apache.org/book/hadoop.html :
> >
> >   Ignore the chicken-little comment you'll find in hdfs-default.xml
> >   in the description for the dfs.support.append configuration; it
> >   says it is not enabled because there are "... bugs in the 'append
> >   code' and is not supported in any production cluster." This comment
> >   is stale, from another era, and while I'm sure there are bugs, the
> >   sync/append code has been running in production at large-scale
> >   deploys and is on by default in the Hadoop offerings of commercial
> >   vendors [7 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e504>]
> >   [8 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e514>]
> >   [9 <http://hbase.apache.org/book/hadoop.html#ftn.d1905e520>].
> >
> > I guess this comment is only 'chicken-little' for the HBase use case
> > (i.e., sync is OK, append is not)?
> >
> > Cheers,
> >
> > Rod.
> >
> >
> > On Fri, May 18, 2012 at 5:58 PM, Harsh J <ha...@cloudera.com> wrote:
> >
> >> Rodney,
> >>
> >> There are two things that comprised the 0.20-append branch's "append"
> >> features. To break it down simply for 1.x:
> >>
> >> append() - Available: Yes. Supported/Recommended: No.
> >> sync()   - Available: Yes. Supported/Recommended: Yes.
> >>
> >> Please also see these links for further info and the conversations on
> >> this topic that have happened several times before:
> >>
> >> https://issues.apache.org/jira/browse/HADOOP-8230
> >> http://search-hadoop.com/m/638TD3bAXB1
> >> http://search-hadoop.com/m/hBPRp1EWELS1
> >>
> >> Let us know if you have further questions.
> >>
> >> On Fri, May 18, 2012 at 12:12 PM, Rodney O'Donnell <r...@rodojojo.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > Is FileSystem.append supported on Hadoop 1.0.x? (1.0.3 in
> >> > particular.)
> >> >
> >> > Reading this list I thought it was back in for 1.0, but it's
> >> > disabled by default so I'm not 100% sure.
> >> > It would be great to get a definitive answer.
> >> >
> >> > Cheers,
> >> >
> >> > Rod.
> >>
> >>
> >> --
> >> Harsh J
>
> --
> Harsh J
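[Archive note] The single-writer discipline Rodney proposes above (hold a ZooKeeper lock, append, sync, then close after each write) can be sketched in miniature. The sketch below is an analogy on a local file, not the HDFS API: a `ReentrantLock` stands in for a ZooKeeper lock, `FileDescriptor.sync()` stands in for HDFS's `sync()`/`hflush()`, and the class, method, and path names are invented for illustration.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.locks.ReentrantLock;

public class LockedAppend {
    // In-process lock standing in for an external (e.g. ZooKeeper) lock;
    // it guarantees one writer at a time within this JVM only.
    private static final ReentrantLock WRITER_LOCK = new ReentrantLock();

    // Append `data` under the lock, syncing and closing before release.
    static void lockedAppend(String path, String data) throws IOException {
        WRITER_LOCK.lock();
        try (FileOutputStream out = new FileOutputStream(path, true)) {
            out.write(data.getBytes(StandardCharsets.UTF_8));
            out.getFD().sync(); // analogous to HDFS sync()/hflush()
        } finally {
            // try-with-resources has already closed the file here,
            // so the file is closed before the lock is released.
            WRITER_LOCK.unlock();
        }
    }

    public static void main(String[] args) throws IOException {
        String path = "/tmp/append-demo.txt"; // illustrative path
        new java.io.File(path).delete();      // start from a clean file
        lockedAppend(path, "record-1\n");
        lockedAppend(path, "record-2\n");
    }
}
```

Per the thread, this discipline is the part that is sound on 1.x via sync(); the caveats about append() itself still apply regardless of locking.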
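[Archive note] The "disabled by default" setting the original question refers to is a single hdfs-site.xml property. A sketch of the override, assuming a plain 1.0.x release; per Harsh's caveats above, flipping it exposes the append() path, which is available but not recommended on this branch:

```xml
<!-- hdfs-site.xml (1.0.x): overrides the hdfs-default.xml setting
     discussed in this thread. -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```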