Re: HDFS Backup nodes

2011-12-14 Thread Scott Carey


On 12/13/11 11:00 PM, M. C. Srivas mcsri...@gmail.com wrote:

Suresh,

As of today, there is no option except to use NFS.  And as you yourself
mention, the first HA prototype when it comes out will require NFS.

How will it 'require' NFS?  Won't any 'remote, high availability storage'
work?  NFS is unreliable, in my experience, unless:
* It's a Netapp
* It's based on Solaris
(caveat: I have only used 5 NFS solution types over the last decade, and
the issues are not data integrity but rather availability from a client
perspective)


A solution with a brief 'stall' in service while a SAN mount switches over,
or something similar with DRBD, should be possible and data-safe. If this is
being built to truly 'require' NFS, that is no better for me than the current
situation, which we manage using OS-level tools for failover that temporarily
break clients but restore availability quickly thereafter. Where I would like
the most help from Hadoop is in making the failover transparent to clients,
not in solving the reliable-storage problem or the failover scenarios that
storage and OS vendors already handle.


(a) I wasn't aware that BookKeeper had progressed that far. I wonder
whether it would be able to keep up with the data rates that are required
in order to hold the NN log without falling behind.

(b) I do know Karthik Ranga at FB just started a design to put the NN data
in HDFS itself, but that is in very preliminary design stages with no real
code there.

The problem is that the HA code written with NFS in mind is very different
from the HA code written with HDFS in mind, which are both quite different
from the code that is written with BookKeeper in mind. Essentially the
three options will form three different implementations, since the failure
modes of each of the back-ends are different. Am I totally off base?

thanks,
Srivas.





Re: HDFS Backup nodes

2011-12-14 Thread Konstantin Boudnik
On Wed, Dec 14, 2011 at 10:09AM, Scott Carey wrote:
 
 On 12/13/11 11:28 PM, Konstantin Boudnik c...@apache.org wrote:
 
 On Tue, Dec 13, 2011 at 11:00PM, M. C. Srivas wrote:
  Suresh,
  
  As of today, there is no option except to use NFS.  And as you yourself
  mention, the first HA prototype when it comes out will require NFS.
 
 
 NFS just happens to be readily available in any data center and doesn't
 require much extra investment on top of what exists.
 
 That is a false assumption.  I'm not buying a Netapp filer just for this.
  We have no NFS, nor do we want any.  If we ever use it, it won't be in the
 data center with Hadoop!

It isn't a false assumption; it is a reasonable one based on experience.
You don't need a Netapp for NFS; you can have a Thumper or whatever. I am not
saying NFS is the only or the best option - all I said is that it is pretty
common ;) I would opt for a BK or JINI Spaces-like solution any day, though.

Cos



Re: HDFS Backup nodes

2011-12-14 Thread Todd Lipcon
On Wed, Dec 14, 2011 at 10:00 AM, Scott Carey sc...@richrelevance.com wrote:
As of today, there is no option except to use NFS.  And as you yourself
mention, the first HA prototype when it comes out will require NFS.

 How will it 'require' NFS?  Won't any 'remote, high availability storage'
 work?  NFS is unreliable, in my experience, unless:
...
 A solution with a brief 'stall' in service while a SAN mount switches over,
 or something similar with DRBD, should be possible and data-safe. If this is
 being built to truly 'require' NFS, that is no better for me than the current
 situation, which we manage using OS-level tools for failover that temporarily
 break clients but restore availability quickly thereafter. Where I would like
 the most help from Hadoop is in making the failover transparent to clients,
 not in solving the reliable-storage problem or the failover scenarios that
 storage and OS vendors already handle.

Currently our requirement is that we can have two client machines
mount the storage, though only one needs to have it mounted rw at a
time. This is certainly doable with DRBD in conjunction with a
clustered filesystem like GFS2. I believe Dhruba was doing some
experimentation with an approach like this.
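For reference, a minimal sketch of what such a DRBD resource could look like
(hostnames, devices, and addresses below are invented; protocol C is DRBD's
fully synchronous mode):

  resource nn-meta {
    protocol C;                   # synchronous: writes ack on both nodes
    on nn1.example.com {
      device    /dev/drbd0;
      disk      /dev/sdb1;        # backing volume for the NN storage dir
      address   10.0.0.1:7789;
      meta-disk internal;
    }
    on nn2.example.com {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7789;
      meta-disk internal;
    }
  }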

It's not currently provided for, but it wouldn't be very difficult to
extend the design so that the standby didn't even need read access
until the failover event. It would just cause a longer failover period
since the standby would have more edits to catch up with, etc. I
don't think anyone's currently working on this, but if you wanted to
contribute I can point you in the right direction. If you happen to be
at the SF HUG tonight, grab me and I'll give you the rundown on what
would be needed.

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HDFS Backup nodes

2011-12-13 Thread Suresh Srinivas
Srivas,

As you may know already, NFS is just being used in the first prototype for
HA.

Two options for the editlog store are:
1. Using BookKeeper. Work towards this has already been completed on trunk.
This will replace the need for NFS to store the editlogs and is highly
available. This solution will also be used for HA.
2. We also have a short-term goal of enabling editlogs to be stored in HDFS
itself. That work is in progress.

Regards,
Suresh





Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Sun, Dec 11, 2011 at 10:47 PM, M. C. Srivas mcsri...@gmail.com wrote:
 But if you use a Netapp, then the likelihood of the Netapp crashing is
 lower than the likelihood of a garbage-collection-of-death happening in the
 NN.

This is pure FUD.

I've never seen a garbage collection of death in any NN with a heap smaller
than 40GB, and only a small handful of times on larger heaps. So, unless
you're running a 4000-node cluster, you shouldn't be concerned with this. And
the existence of many 4000-node clusters running fine on HDFS indicates that
a properly tuned NN does just fine.
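For context, "properly tuned" at the time usually meant running the NN with
CMS and an early occupancy trigger via hadoop-env.sh; the heap size below is
only an illustrative placeholder, size it to your image:

  export HADOOP_NAMENODE_OPTS="-Xmx8g -XX:+UseConcMarkSweepGC \
      -XX:CMSInitiatingOccupancyFraction=70 $HADOOP_NAMENODE_OPTS"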

[Disclaimer: I don't spread FUD regardless of vendor affiliation.]

-Todd





-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HDFS Backup nodes

2011-12-13 Thread M. C. Srivas
Suresh,

As of today, there is no option except to use NFS.  And as you yourself
mention, the first HA prototype when it comes out will require NFS.

(a) I wasn't aware that BookKeeper had progressed that far. I wonder
whether it would be able to keep up with the data rates that are required in
order to hold the NN log without falling behind.

(b) I do know Karthik Ranga at FB just started a design to put the NN data
in HDFS itself, but that is in very preliminary design stages with no real
code there.

The problem is that the HA code written with NFS in mind is very different
from the HA code written with HDFS in mind, which are both quite different
from the code that is written with BookKeeper in mind. Essentially the
three options will form three different implementations, since the failure
modes of each of the back-ends are different. Am I totally off base?

thanks,
Srivas.







Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Tue, Dec 13, 2011 at 10:42 PM, M. C. Srivas mcsri...@gmail.com wrote:
 Any simple file meta-data test will cause the NN to spiral to death with
 infinite GC.  For example, try creating many, many files. Or even simply
 stat a bunch of files continuously.

Sure. If I run dd if=/dev/zero of=foo my laptop will spiral to
death also. I think this is what you're referring to -- continuously
write files until it is out of RAM.

This is a well understood design choice of HDFS. It is not designed as
general purpose storage for small files, and if you run tests against
it assuming it is, you'll get bad results. I agree there.
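As a crude illustration of the kind of metadata stress being described
(paths and counts are invented), every zero-length file below still costs
the NN an in-memory inode:

  # hammer the NameNode's heap by creating millions of tiny files
  hadoop fs -mkdir /stress
  for i in $(seq 1 1000000); do
    hadoop fs -touchz /stress/file-$i
  done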


 The real FUD going on is refusing to acknowledge that there is indeed a
 real problem.

Yes, if you use HDFS for workloads for which it was never designed,
you'll have a problem. If you stick to commonly accepted best
practices I think you'll find the same thing that hundreds of other
companies have found: HDFS is stable and reliable and has no such GC
of death problems when used as intended.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera


Re: HDFS Backup nodes

2011-12-13 Thread Konstantin Boudnik
On Tue, Dec 13, 2011 at 11:00PM, M. C. Srivas wrote:
 Suresh,
 
 As of today, there is no option except to use NFS.  And as you yourself
 mention, the first HA prototype when it comes out will require NFS.

Well, in the interest of full disclosure, NFS is just one of the options and
not the only one. Any auxiliary storage will do nicely. Distributed in-memory
redundant storage for sub-second fail-over? Sure, GigaSpaces has done this for
years using very mature JINI.

NFS just happens to be readily available in any data center and doesn't
require much extra investment on top of what exists. NFS comes with its
own set of problems, of course. First and foremost is No-File-Security, which
requires the use of something like Kerberos for third-party user management. And
when paired with something like LinuxTaskController it can produce some very
interesting effects.

Cos



Re: HDFS Backup nodes

2011-12-13 Thread Todd Lipcon
On Tue, Dec 13, 2011 at 11:00 PM, M. C. Srivas mcsri...@gmail.com wrote:
 (a) I wasn't aware that BookKeeper had progressed that far. I wonder
 whether it would be able to keep up with the data rates that are required in
 order to hold the NN log without falling behind.

It's a good question - but one for which data is readily available.
Reading from Flavio Junqueira's slides from the Hadoop In China
conference a few weeks ago, he can maintain ~50k TPS with 20ms
latency, with 128-byte transactions. Given that HDFS does batch
multiple transactions per commit (standard group-commit techniques), we
might imagine 4KB transactions, where it looks like about 5K TPS,
equating to around 20MB/sec throughput. These transaction rates should
be plenty for the edit-logging use case, in my experience.
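For anyone checking the arithmetic (assuming throughput scales linearly with
transaction size):

  5,000 txns/sec * 4,096 bytes/txn = 20,480,000 bytes/sec ~= 20 MB/sec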


 (b) I do know Karthik Ranga at FB just started a design to put the NN data
 in HDFS itself, but that is in very preliminary design stages with no real
 code there.

Agreed. But it's not particularly complex either.. things can move
from preliminary design to working code in short timelines.


 The problem is that the HA code written with NFS in mind is very different
 from the HA code written with HDFS in mind, which are both quite different
 from the code that is written with BookKeeper in mind. Essentially the
 three options will form three different implementations, since the failure
 modes of each of the back-ends are different. Am I totally off base?

Actually since the beginning of the HA project we have been keeping in
mind that NFS is only a step along the way. The shared edits storage
only has to support the following very basic operations:
- write and append to files (log segments)
- read from closed files
- fence another writer (which can also be implemented with STONITH)

As I understand it, BK supports all of the above and in fact the BK
team has a working prototype of journal storage in BK. The interface
is already made pluggable as of last month. So this is not far-off
brainstorming but rather a very real implementation that's coming very
soon to stable releases.
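To illustrate the fencing point (host and credentials below are invented), a
STONITH hook can be as blunt as powering off the old active NN through its
management interface before the standby takes over:

  ipmitool -I lanplus -H nn1-mgmt.example.com -U admin -P secret chassis power off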

-Todd


Re: HDFS Backup nodes

2011-12-11 Thread M. C. Srivas
You are out of luck if you don't want to use NFS, and yet want redundancy
for the NN.  Even the new NN HA work being done by the community will
require NFS ... and the NFS itself needs to be HA.

But if you use a Netapp, then the likelihood of the Netapp crashing is
lower than the likelihood of a garbage-collection-of-death happening in the
NN.

[ disclaimer:  I don't work for Netapp, I work for MapR ]




RE: HDFS Backup nodes

2011-12-08 Thread Jorn Argelo - Ephorus
Hi Koji,

This was on CDH3u1. For the record, I had the dfs.name.dir.restore option
which Harsh mentioned enabled as well.

Jorn

-Original Message-
From: Koji Noguchi [mailto:knogu...@yahoo-inc.com]
Sent: Wednesday, December 7, 2011 17:59
To: common-user@hadoop.apache.org
Subject: Re: HDFS Backup nodes

Hi Jorn, 

Which hadoop version were you using when you hit that issue?

Koji





RE: HDFS Backup nodes

2011-12-07 Thread Uma Maheswara Rao G
AFAIK, the backup node was introduced from version 0.21 onwards.
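If I remember the 0.21 docs correctly, it is started via the namenode's
-backup startup option (not available on 0.20.205):

  bin/hdfs namenode -backup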



Re: HDFS Backup nodes

2011-12-07 Thread praveenesh kumar
This means we are still relying on the Secondary NameNode ideology for the
Namenode's backup.
Would OS-mirroring of the Namenode be a good alternative to keep it alive all
the time?

Thanks,
Praveenesh




RE: HDFS Backup nodes

2011-12-07 Thread Sagar Shukla
Yes ... if you are looking for high uptime, then keeping the Namenode OS-mirror
always running would be the best way to go.

We might need to explore further the capabilities of the HDFS backup node to see
how it can be utilized.

Thanks,
Sagar





Re: HDFS Backup nodes

2011-12-07 Thread Joey Echeverria
You should also configure the Namenode to use an NFS mount for one of
its storage directories. That will give you the most up-to-date backup of
the metadata in case of total node failure.
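Concretely, that means listing the NFS mount as an additional name directory
in hdfs-site.xml; the paths here are invented examples:

  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
  </property>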

-Joey





-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


RE: HDFS Backup nodes

2011-12-07 Thread Jorn Argelo - Ephorus
Just to add to that note - we've run into an issue where the NFS share
was out of sync (the namenode storage failed even though the NFS share
was working), but the other local metadata was fine. At the restart of
the namenode it picked the NFS share's fsimage even though it was out of
sync. This had the effect that loads of blocks were marked as invalid
and deleted by the datanodes, and the namenode never came out of safe
mode because it was missing blocks. The Hadoop documentation says it
always picks the most recent version of the fsimage, but in my case this
doesn't seem to have happened. Maybe a bug? With that said, I've had
issues with NFS before (the NFS namenode storage always failed
every hour even if the cluster was idle).

Now, since this was just test data it wasn't all that important ... but
if that happened to your production cluster you would have a
problem. I've moved away from NFS and I'm using DRBD instead. Not having
any problems anymore whatsoever.

YMMV.

Jorn



Re: HDFS Backup nodes

2011-12-07 Thread randysch
What happens then if the NFS server fails or isn't reachable? Does HDFS lock
up? Does it gracefully ignore the NFS copy?

Thanks,
randy



Re: HDFS Backup nodes

2011-12-07 Thread Joey Echeverria
Hey Rand,

It will mark that storage directory as failed and ignore it from then
on. In order to do this correctly, you need a couple of options
enabled on the NFS mount to make sure that it doesn't retry
infinitely. I usually run with the tcp,soft,intr,timeo=10,retrans=10
options set.
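As an /etc/fstab entry that might look like this (server and paths are
invented):

  nnfiler:/export/nn  /mnt/nfs/dfs/nn  nfs  tcp,soft,intr,timeo=10,retrans=10  0 0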

-Joey




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Re: HDFS Backup nodes

2011-12-07 Thread randy
Thanks Joey. We've had enough problems with NFS (mainly under very high
load) that we thought it might be riskier to use it for the NN.


randy









Re: HDFS Backup nodes

2011-12-07 Thread Harsh J
Randy,

On recent releases (CDH3u2 here, for example), you also have
dfs.name.dir.restore, a boolean flag that will automatically try to
re-enable previously failed name directories upon every checkpoint if
possible. Hence, if you have a SNN running and your NFS failed at some
point and got marked as FAILED on your NN web UI, and the NFS is back
up again before the next checkpoint interval, it will be auto-restored
once the NN deems it's in a writable state again.
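In hdfs-site.xml terms, that flag is simply:

  <property>
    <name>dfs.name.dir.restore</name>
    <value>true</value>
  </property>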









-- 
Harsh J


HDFS Backup nodes

2011-12-06 Thread praveenesh kumar
Does hadoop 0.20.205 support configuring HDFS backup nodes?

Thanks,
Praveenesh