Re: Append supported in hadoop 1.0.x branch?
Re: Append supported in hadoop 1.0.x branch?

Thanks again for your response; one more clarification, though. Are there any conditions under which I can trust append() to work? For example, if I use ZK to lock the HDFS file to ensure there are no concurrent writes, then sync and close the file after each write? Also, I assume this has nothing to do with file formats (I was a little confused by one of the links below) and that append should not be trusted even when writing a simple text file.

Finally, any thoughts on this comment at http://hbase.apache.org/book/hadoop.html : "Ignore the chicken-little comment you'll find in hdfs-default.xml in the description for the dfs.support.append configuration; it says it is not enabled because there are '... bugs in the append code and is not supported in any production cluster.' This comment is stale, from another era, and while I'm sure there are bugs, the sync/append code has been running in production at large scale deploys and is on by default in the offerings of hadoop by commercial vendors [7][8][9]." I guess this comment is only 'chicken-little' for the HBase use case (i.e., sync is OK, append is not)?

[7] http://hbase.apache.org/book/hadoop.html#ftn.d1905e504
[8] http://hbase.apache.org/book/hadoop.html#ftn.d1905e514
[9] http://hbase.apache.org/book/hadoop.html#ftn.d1905e520

Cheers, Rod.

On Fri, May 18, 2012 at 5:58 PM, Harsh J ha...@cloudera.com wrote:
> Rodney,
>
> There are two things that comprised the 0.20-append branch, which added the append features. To break it down simply for 1.x:
>
> append() - Available: Yes. Supported/Recommended: No.
> sync() - Available: Yes. Supported/Recommended: Yes.
>
> Please also see these links for further info/conversations on this topic, which has come up several times before:
> https://issues.apache.org/jira/browse/HADOOP-8230
> http://search-hadoop.com/m/638TD3bAXB1
> http://search-hadoop.com/m/hBPRp1EWELS1
>
> Let us know if you have further questions.
> On Fri, May 18, 2012 at 12:12 PM, Rodney O'Donnell r...@rodojojo.com wrote:
>> Hi, Is FileSystem.append supported on hadoop 1.0.x? (1.0.3 in particular.) [...]
>
> --
> Harsh J
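For concreteness, the single-writer pattern described above (take a lock, append, sync, close) would look roughly like the sketch below against the Hadoop 1.x FileSystem API. This is an untested illustration, not a recommendation: the ZooKeeper lock is elided (assume the caller already holds it), the path is a placeholder, and it requires a running cluster with dfs.support.append enabled.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
    // Append one record to an existing HDFS file, then flush and close.
    // Caller is assumed to hold an external (e.g. ZooKeeper) lock so that
    // only one writer ever has the file open for append at a time.
    static void appendRecord(FileSystem fs, Path file, byte[] record)
            throws IOException {
        FSDataOutputStream out = fs.append(file); // reopen file for append
        try {
            out.write(record);
            out.sync(); // flush to the datanodes (hflush() in 2.x)
        } finally {
            out.close(); // release the lease after each write
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core/hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        appendRecord(fs, new Path("/logs/events.txt"), // hypothetical path
                     "one line\n".getBytes("UTF-8"));
        fs.close();
    }
}
```

Even with the external lock, this only serializes writers; per the rest of the thread, it does not address the block-level corruption issues observed with append() itself on the 1.x branch.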
Re: Append supported in hadoop 1.0.x branch?
Rodney,

I haven't tested append() enough times to know what triggers it, but I have often observed, both on the 0.20-append-based clusters I've troubleshot and on the cdh-users list, that append() has led to odd tailing block sizes (beyond the maximum allowed) and intermittent warnings of corrupt/failed blocks (relating only to the appended files, not random ones). In a few cases this leads to temporary unavailability of data, as the client reports all blocks bad to the NN, which is as good as a 'data loss' case (for the moment, anyway). I've not seen permanent or spreading corruption, but this was odd enough for me not to recommend append() (as opposed to sync()) on the branch/releases that carry it. YMMV.

I'm unsure of the JIRA here, or whether this is an issue with the new 2.x implementation as well; I'll let other HDFS devs answer that. The HBase case speaks only for HBase's own use, which is just sync() at this point. This is also why https://issues.apache.org/jira/browse/HADOOP-8230 was done: to separate the configs toggling the two calls, append() and sync(), so the docs don't appear as confusing as they do now.

On Mon, May 21, 2012 at 2:59 PM, Rodney O'Donnell r...@rodojojo.com wrote:
> Thanks again for your response; one more clarification, though. Are there any conditions under which I can trust append to work? [...]

--
Harsh J
Re: Append supported in hadoop 1.0.x branch?
Thanks again Harsh for your thorough response!

On Mon, May 21, 2012 at 9:39 PM, Harsh J ha...@cloudera.com wrote:
> Rodney, I haven't tested append() enough times to know what triggers it, but I have often observed [...] that append() has led to odd tailing block sizes [...]
Append supported in hadoop 1.0.x branch?
Hi, Is FileSystem.append supported on hadoop 1.0.x? (1.0.3 in particular). Reading this list I thought it was back in for 1.0, but it's disabled by default so I'm not 100% sure. It would be great to get a definitive answer. Cheers, Rod.
Re: Append supported in hadoop 1.0.x branch?
Rodney,

There are two things that comprised the 0.20-append branch, which added the append features. To break it down simply for 1.x:

append() - Available: Yes. Supported/Recommended: No.
sync() - Available: Yes. Supported/Recommended: Yes.

Please also see these links for further info/conversations on this topic, which has come up several times before:
https://issues.apache.org/jira/browse/HADOOP-8230
http://search-hadoop.com/m/638TD3bAXB1
http://search-hadoop.com/m/hBPRp1EWELS1

Let us know if you have further questions.

On Fri, May 18, 2012 at 12:12 PM, Rodney O'Donnell r...@rodojojo.com wrote:
> Hi, Is FileSystem.append supported on hadoop 1.0.x? (1.0.3 in particular.) Reading this list I thought it was back in for 1.0, but it's disabled by default so I'm not 100% sure. It would be great to get a definitive answer. Cheers, Rod.

--
Harsh J
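For reference, the "available but disabled by default" state discussed above is controlled by the dfs.support.append flag mentioned earlier in the thread. A minimal hdfs-site.xml fragment to turn it on would look like this (with the caveat, per this whole thread, that enabling append() on 1.x is not recommended):

```xml
<!-- hdfs-site.xml: gates the append() call on the 1.x branch. -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```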