Job Tracker/Name Node redundancy

2009-01-09 Thread Ryan LeCompte
Are there any plans to build redundancy/failover support for the Job
Tracker and Name Node components in Hadoop? Consider the following
scenario:

1) A data/CPU-intensive job is submitted to a Hadoop cluster of 10 machines.
2) Halfway through the job execution, the Job Tracker or Name Node fails.
3) We bring up a new Job Tracker or Name Node manually.

-- Will the individual task trackers / data nodes reconnect to the
new masters? Or will the job have to be resubmitted? If we had
failover support, we could set up, say, 3 Job Tracker masters and
3 NameNode masters, so that if one dies another would gracefully take
over and start handling results from the child nodes.

Thanks!

Ryan


Re: Job Tracker/Name Node redundancy

2009-01-09 Thread Otis Gospodnetic
Yes, there is a JIRA issue for a redundant JobTracker already.
The NN redundancy scenario is mentioned on the Wiki (look for 
SecondaryNameNode).

Otis

 --
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: Job Tracker/Name Node redundancy

2009-01-09 Thread Jeff Hammerbacher
Hey Ryan,

Some specific JIRA tickets that will help narrow your search:

JT: https://issues.apache.org/jira/browse/HADOOP-4586
NN: https://issues.apache.org/jira/browse/HADOOP-4539

Would love to hear your thoughts there!

Regards,
Jeff





RE: Job Tracker/Name Node redundancy

2009-01-09 Thread Amar Kamat
Ryan,
On the MR (JobTracker) side, we have failover support.
If a large job is submitted and the JobTracker fails midway, you can restart
the JobTracker on the same host and resume the job. See
https://issues.apache.org/jira/browse/HADOOP-3245 for more details.
Hope that helps.

Amar
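
A minimal sketch of how the restart-and-resume behaviour described above might be switched on, assuming a release that includes the HADOOP-3245 work. The property name below is an assumption taken from the stock mapred-default.xml of that era and is not stated in this thread; check the defaults file shipped with your version before relying on it.

```xml
<!-- hadoop-site.xml (mapred-site.xml on later releases) -->
<!-- Assumption: this property controls job recovery on JobTracker restart. -->
<property>
  <name>mapred.jobtracker.restart.recover</name>
  <value>true</value>
</property>
```

With this set, a JobTracker restarted on the same host would be expected to pick up the jobs that were running when it went down, rather than starting afresh.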

