Job Tracker/Name Node redundancy
Are there any plans to build redundancy/failover support for the Job Tracker and Name Node components in Hadoop? Let's take the current scenario:

1) A data/CPU-intensive job is submitted to a Hadoop cluster of 10 machines.
2) Halfway through the job execution, the Job Tracker or Name Node fails.
3) We bring up a new Job Tracker or Name Node manually.

Will the individual task trackers / data nodes reconnect to the new masters? Or will the job have to be resubmitted?

If we had failover support, we could set up essentially 3 Job Tracker masters and 3 NameNode masters, so that if one dies another would gracefully take over and start handling results from the child nodes.

Thanks!
Ryan
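To make the proposed setup concrete, here is a minimal sketch of the "3 masters, one takes over" idea. It is a toy model only, not Hadoop code: the `Master` class, the `pick_active` helper, and the names `jt-1` to `jt-3` are all hypothetical, and it assumes workers simply talk to the first live master in a fixed priority order.

```python
from dataclasses import dataclass


@dataclass
class Master:
    """Hypothetical stand-in for a Job Tracker or Name Node instance."""
    name: str
    alive: bool = True


def pick_active(masters):
    """Return the first live master in priority order, or None if all are down."""
    for master in masters:
        if master.alive:
            return master
    return None


# Three redundant masters, as in the scenario above (names are made up).
masters = [Master("jt-1"), Master("jt-2"), Master("jt-3")]
first = pick_active(masters).name

# The primary dies halfway through the job; workers fall over to the next one.
masters[0].alive = False
second = pick_active(masters).name
```

A real failover scheme would also need the standby to share the failed master's state (job progress, namespace metadata), which this sketch leaves out entirely.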
Re: Job Tracker/Name Node redundancy
Yes, there is already a JIRA issue for a redundant JobTracker. The NameNode (NN) redundancy scenario is mentioned on the Wiki (look for SecondaryNameNode).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Ryan LeCompte lecom...@gmail.com
To: core-user@hadoop.apache.org
Sent: Friday, January 9, 2009 3:09:13 PM
Subject: Job Tracker/Name Node redundancy
Re: Job Tracker/Name Node redundancy
Hey Ryan,

Some specific JIRA tickets that will help narrow your search:

JT: https://issues.apache.org/jira/browse/HADOOP-4586
NN: https://issues.apache.org/jira/browse/HADOOP-4539

Would love to hear your thoughts there!

Regards,
Jeff
RE: Job Tracker/Name Node redundancy
Ryan,

On the MR (JobTracker) side we do have failover support. If a large job is submitted and the JobTracker fails midway, you can restart the JobTracker on the same host and resume the job. See https://issues.apache.org/jira/browse/HADOOP-3245 for more details.

Hope that helps.
Amar
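The restart-and-resume behaviour described above can be illustrated with a small sketch. This is not the HADOOP-3245 implementation; it is a toy model that assumes completed tasks are appended to a history file and skipped after a restart. The `run_job` and `demo` functions, the `fail_after` parameter, and the JSON-lines file layout are all hypothetical.

```python
import json
import os
import tempfile


def run_job(tasks, history_path, fail_after=None):
    """Run tasks in order, appending each finished task id to a history file.

    Tasks already recorded in the history file are skipped, which is what
    lets a restarted tracker resume the job instead of starting over.
    """
    done = set()
    if os.path.exists(history_path):
        with open(history_path) as history:
            done = {json.loads(line)["task"] for line in history if line.strip()}

    executed = []
    with open(history_path, "a") as log:
        for task in tasks:
            if task in done:
                continue  # finished before the crash, do not rerun
            if fail_after is not None and len(executed) >= fail_after:
                raise RuntimeError("simulated tracker crash")
            executed.append(task)
            log.write(json.dumps({"task": task}) + "\n")
            log.flush()
    return executed


def demo():
    """Crash after two tasks, then restart and return what the second run executed."""
    tasks = ["map-0", "map-1", "map-2", "reduce-0"]
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "history.jsonl")
        try:
            run_job(tasks, path, fail_after=2)
        except RuntimeError:
            pass  # the "tracker" died midway
        return run_job(tasks, path)  # restart against the same history file
```

Calling `demo()` runs only the tasks that had not finished before the simulated crash, so the job is resumed rather than resubmitted.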