Re: Append supported in hadoop 1.0.x branch?

2012-05-21 Thread Rodney O'Donnell
Thanks again for your response; one more clarification, though.

Are there any conditions under which I can trust append to work?

For example, what if I use ZK to lock the HDFS file to ensure there are no
concurrent writes, and then sync and close the file after each write?
Also, I assume this has nothing to do with file formats (I was a little
confused by one of the links below) and that append should not be trusted
even when using a simple text file.

Finally, any thoughts on the comment here
http://hbase.apache.org/book/hadoop.html :

    Ignore the chicken-little comment you'll find in the hdfs-default.xml
    in the description for the dfs.support.append configuration; it says
    it is not enabled because there are “... bugs in the 'append code' and
    is not supported in any production cluster.”. This comment is stale,
    from another era, and while I'm sure there are bugs, the sync/append
    code has been running in production at large scale deploys and is on
    by default in the offerings of hadoop by commercial vendors [7] [8] [9].

    [7] http://hbase.apache.org/book/hadoop.html#ftn.d1905e504
    [8] http://hbase.apache.org/book/hadoop.html#ftn.d1905e514
    [9] http://hbase.apache.org/book/hadoop.html#ftn.d1905e520

I guess this comment is only 'chicken-little' for the HBase use case (i.e.,
sync is OK, append is not)?

Cheers,

Rod.


Re: Append supported in hadoop 1.0.x branch?

2012-05-21 Thread Harsh J
Rodney,

I haven't tested append() enough times to know what triggers it, but I
have often observed, both on the 0.20-append-based clusters I've
troubleshot and on the cdh-users list, that append() has led to odd
trailing block sizes (beyond the maximum allowed) and intermittent warnings
of corrupt/failed blocks (affecting only the appended files, though, not
random ones). In a few cases this leads to temporary unavailability of
data, as the client reports all blocks bad to the NN, which is as much a
'data loss' case as any (for the moment, anyway). I've not seen permanent
or spreading corruption, but this case was odd enough for me to recommend
against append() (sync() is fine) on that branch and the releases based
on it. YMMV. I'm unsure of the relevant JIRA, or whether the new 2.x
implementation has this issue as well; I'll let other HDFS devs answer that.
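One of the symptoms described above, a block longer than the file's block size, can be checked mechanically. The sketch below is a hypothetical helper (the class and method names are illustrative only, not part of Hadoop). It assumes the block lengths have already been collected, for example from the `len=` values in `hadoop fsck <path> -files -blocks` output, or from `BlockLocation.getLength()` on the results of `FileSystem.getFileBlockLocations`. It only flags suspect blocks; it does not diagnose the cause.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: flags blocks whose recorded length exceeds the
 * configured block size, one of the append() symptoms reported in this thread.
 */
public class OversizedBlockCheck {

    /** Returns the indices of blocks longer than blockSize bytes. */
    public static List<Integer> oversizedBlocks(long[] blockLengths, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Integer> suspects = new ArrayList<>();
        for (int i = 0; i < blockLengths.length; i++) {
            // A healthy file never has a block larger than its block size;
            // only the last (tail) block is normally allowed to be shorter.
            if (blockLengths[i] > blockSize) {
                suspects.add(i);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // 64 MB, the 1.x default dfs.block.size
        long[] lengths = {blockSize, blockSize, blockSize + 4096};
        System.out.println(oversizedBlocks(lengths, blockSize)); // prints [2]
    }
}
```

Running `main` prints `[2]`, since only the third block exceeds the 64 MB block size. A non-empty result is a reason to look more closely at a file, not proof of corruption.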

The HBase book's comment speaks only for HBase's own use, which is just
sync() at this point. This is also why
https://issues.apache.org/jira/browse/HADOOP-8230 was done: to separate
the configs that toggle append() and sync(), so the docs don't appear as
confusing as they do now.


-- 
Harsh J


Re: Append supported in hadoop 1.0.x branch?

2012-05-21 Thread Rodney O'Donnell
Thanks again, Harsh, for your thorough response!


Append supported in hadoop 1.0.x branch?

2012-05-18 Thread Rodney O'Donnell
Hi,

Is FileSystem.append supported on hadoop 1.0.x?  (1.0.3 in particular).

Reading this list I thought it was back in for 1.0, but it's disabled by
default so I'm not 100% sure.
It would be great to get a definitive answer.

Cheers,

Rod.


Re: Append supported in hadoop 1.0.x branch?

2012-05-18 Thread Harsh J
Rodney,

The 0.20-append branch, which added the append features, comprised two
things. Broken down simply for 1.x, they are:

append() - Available: Yes. Supported/Recommended: No.
sync() - Available: Yes. Supported/Recommended: Yes.

Please also see these links for further info and earlier conversations on
this topic, which has come up several times before:

https://issues.apache.org/jira/browse/HADOOP-8230
http://search-hadoop.com/m/638TD3bAXB1
http://search-hadoop.com/m/hBPRp1EWELS1

Let us know if you have further questions.


-- 
Harsh J