Are there any plans to build redundancy/failover support for the Job Tracker and Name Node components in Hadoop? Let's take the current scenario:
1) A data/CPU-intensive job is submitted to a Hadoop cluster of 10 machines.
2) Halfway through the job execution, the Job Tracker or Name Node fails.
3) We bring up a new Job Tracker or Name Node manually.

Will the individual task trackers / data nodes "reconnect" to the new masters? Or will the job have to be resubmitted?

If we had failover support, we could set up essentially 3 Job Tracker masters and 3 Name Node masters, so that if one dies another would gracefully take over and start handling results from the child nodes.

Thanks!
Ryan