Re: NameNode failure and recovery!

2013-04-03 Thread Rahul Bhattacharjee
Or are both options used together: NFS + SNN?



On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee rahul.rec@gmail.com
 wrote:

 Hi all,

 I was reading about Hadoop and learned that there are two ways to
 protect against name node failures.

 1) Write to an NFS mount in addition to the usual local disk.
  -or-
 2) Use a secondary name node; in case of NN failure, the SNN can take
 charge.

 My questions:

 1) The SNN always lags behind the NN, so when the SNN becomes primary after
 an NN failure, the edits that have not yet been merged into the image file
 would be lost; the SNN's state would not be consistent with the NN's state
 before the failure.

 2) I have also read that another purpose of the SNN is to periodically merge
 the edit logs with the image file. If a setup goes with option #1 (writing
 to NFS, no SNN), who does this merging?

 Thanks,
 Rahul





RE: NameNode failure and recovery!

2013-04-03 Thread Vijay Thakorlal
Hi Rahul,

The SNN does not act as a backup / standby NameNode in the event of failure.

The sole purpose of the Secondary NameNode (or, as it is more correctly
known, the Checkpoint Node) is to checkpoint the current state of HDFS:

1) The SNN retrieves the fsimage and edits files from the NN.
2) The NN rolls the edits file.
3) The SNN loads the fsimage into memory.
4) The SNN then replays the edits file to merge the two.
5) The SNN transfers the merged checkpoint back to the NN.
6) The NN uses the checkpoint as the new fsimage file.

It's true that technically you could use the fsimage from the SNN if you
completely lost the NN, and yes, as you said, you would "lose" any changes
to HDFS that occurred between the last checkpoint and the NN dying. But as
mentioned, the SNN is not a backup for the NN.
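How often that checkpoint cycle runs is configurable. A minimal sketch, assuming the Hadoop 1.x-era property names (fs.checkpoint.period and fs.checkpoint.size; verify against your version's default config), written out and sanity-checked so the fragment is concrete:

```shell
# Sketch: the SNN checkpoint cadence on Hadoop 1.x is driven by two
# core-site.xml properties -- a time trigger and an edits-size trigger.
# Values shown are the commonly cited defaults (1 hour / 64 MB).
cat > core-site-snippet.xml <<'EOF'
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>
EOF
# count the property entries we just wrote
grep -c '<property>' core-site-snippet.xml
```

Shortening fs.checkpoint.period directly bounds how much edit history a checkpoint can lag behind.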

Regards,

Vijay

From: Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] 
Sent: 03 April 2013 15:40
To: user@hadoop.apache.org
Subject: NameNode failure and recovery!

 




Re: NameNode failure and recovery!

2013-04-03 Thread Mohammad Tariq
Hello Rahul,

  It's always better to have both 1 and 2 together. One common
misconception is that the SNN is a backup of the NN, which is wrong. The
SNN is a helper node to the NN; in case of a failure, the SNN is not going
to take over the NN's role.

Yes, we can't guarantee that the SNN's fsimage replica will always be up to
date. And when you write the metadata to a filer or NFS mount, you are just
creating an additional copy of the metadata. Don't confuse it with the SNN.
When you specify the value of your dfs.name.dir property as a comma-separated
list of localFS + NFS directories, you are just making sure that even if
something goes wrong with the local FS, your metadata is still intact on the
NFS mount.

It is still better to run the SNN on a separate machine. But you can never
rely 100% on the SNN, for the reason you have already mentioned: it will not
be in 100% sync.
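The comma-separated dfs.name.dir setup might look like the following. This is illustrative only: both directory paths are made-up examples, and the property name is the Hadoop 1.x one (it became dfs.namenode.name.dir in later versions).

```shell
# Illustrative hdfs-site.xml fragment: one local directory plus one NFS
# mount. The NN writes fsimage/edits to every directory in the list, so
# losing the local disk still leaves a current copy on the filer.
cat > hdfs-site-snippet.xml <<'EOF'
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
EOF
# confirm the NFS path made it into the value
grep -o '/mnt/nfs/dfs/nn' hdfs-site-snippet.xml
```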



Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com








Re: NameNode failure and recovery!

2013-04-03 Thread Mohammad Tariq
@Vijay : We seem to be in 100% sync though :)

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com









Re: NameNode failure and recovery!

2013-04-03 Thread Rahul Bhattacharjee
Thanks to all of you for the precise and complete responses.

So in case of failure we have to bring another backup system up with the
fsimage and edit logs from the NFS filer, and the SNN stays as is for the
new NN.

Thanks,
Rahul
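That manual recovery might be sketched roughly as below. This is a hedged sketch, not a definitive procedure: temp directories stand in for the real paths (e.g. the NFS copy of dfs.name.dir), and the fsimage/edits files are empty placeholders so the sketch runs anywhere.

```shell
# Sketch: seed a replacement NN's dfs.name.dir with the metadata copy
# that survived on the NFS filer, then start the daemon there.
NFS_META="$(mktemp -d)"        # stand-in for e.g. /mnt/nfs/dfs/nn
NEW_NN_DIR="$(mktemp -d)/nn"   # stand-in for dfs.name.dir on the new host
touch "$NFS_META/fsimage" "$NFS_META/edits"   # placeholder metadata

mkdir -p "$NEW_NN_DIR"
cp -a "$NFS_META/." "$NEW_NN_DIR/"   # copy surviving fsimage + edits over

# then point DataNodes/clients at the replacement host and start it, e.g.:
#   hadoop-daemon.sh start namenode
ls "$NEW_NN_DIR"
```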


On Wed, Apr 3, 2013 at 8:38 PM, Azuryy Yu azury...@gmail.com wrote:

 For Hadoop v2 there is HA, so the SNN is not necessary.





Re: NameNode failure and recovery!

2013-04-03 Thread Harsh J
There is a 3rd, most excellent way: Use HDFS's own HA, see
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
:)
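A sketch of the core hdfs-site.xml properties QJM-based HA introduces, per the doc linked above. The nameservice and host names here are made-up examples, and a real setup needs more properties (RPC/HTTP addresses per NameNode, fencing, and ZKFC for automatic failover).

```shell
# Minimal QJM HA fragment: a logical nameservice, its two NameNodes, and
# the shared edits directory hosted by a JournalNode quorum on port 8485.
cat > ha-snippet.xml <<'EOF'
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
EOF
# count the property names we just declared
grep -c '<name>' ha-snippet.xml
```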






-- 
Harsh J


Re: NameNode failure and recovery!

2013-04-03 Thread shashwat shriparv
If you are not in a position to go for HA, just keep your checkpoint period
shorter so that recent data is recoverable from the SNN.

And you always have the option of:
hadoop namenode -recover
Try it on a test cluster first and get versed in it.

And take a backup of the image on some solid-state storage.



∞
Shashwat Shriparv






Re: NameNode failure and recovery!

2013-04-03 Thread Rahul Bhattacharjee
That's also doable. Reducing the checkpoint period would still leave some
amount of edit log loss, and how short the checkpoint interval should be
has to be evaluated. I think the way to go, in case HA is not doable, is
SNN plus secondary NFS storage.

Thanks,
Rahul

