On Thu, Sep 22, 2011 at 11:44 AM, praveenesh kumar <praveen...@gmail.com> wrote:
> But apart from storing metadata info, Is there anything more NN/JT machines
> are doing ?? .
> So I can say I can survive with poor NN if I am not dealing with lots of
> files in HDFS ?
<snip>

The JT and NN are the central throughput machines of your cluster.
Clients contact them first and communicate primarily with them. The
JT and NN also need CPU to manage the slaves that have joined them
and to maintain the state of each one.

I'd not place them on poor machines and risk a general slowdown of
the whole cluster. I'd also make sure the machine I use for my master
services is built from noticeably more reliable hardware than the
slaves (whose loss is easier to bear).

>> > > > Can we replace our namenode machine later with some other
>> > machine. ?
>> > > > Actually I got a new  server machine in my cluster and now I want
>> > > > to make
>> > > > this machine as my new namenode and jobtracker node ?

So long as your hostname does not change, you should not have an
issue. The change will be as transparent to your cluster as a restart
would be.

If you are introducing a hostname change, all your client configs
will need updating, and certain ecosystem components such as Hive may
need minor repairs to their state.

(This is another reason why you should use hostnames for HDFS, and not
IP addresses.)
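
To illustrate the hostname point, here is a minimal Python sketch that
flags a filesystem URI (for example, the value you have set for
fs.default.name) when its host part is a raw IP address. The function
name, the example hostname and the port are my own illustrative
choices, not anything Hadoop ships.

```python
import ipaddress
from urllib.parse import urlparse


def uses_ip_literal(fs_uri):
    """Return True if the host part of fs_uri is a raw IPv4/IPv6 address."""
    host = urlparse(fs_uri).hostname
    if host is None:
        # No host component at all (e.g. "file:///"), so nothing to flag.
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an IP address, so treat it as a hostname.
        return False
    return True


# Hypothetical values, for illustration only.
print(uses_ip_literal("hdfs://namenode.example.com:8020"))
print(uses_ip_literal("hdfs://10.0.0.5:8020"))
```

A URI that is flagged here would have to be edited everywhere it
appears if the NN ever moved to a machine with a different IP.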

>> > > > How can I achieve this target with least overhead ?

If there's no hostname change happening here, then it should be as
simple as:

1. Turn off HDFS.
2. Switch the host pointers (if the new machine has a different IP).
3. Move the metadata to the new machine.
4. Ensure permissions are set and that everything is a mirror copy of
   what was there before.
5. Run the NN, and ensure it binds fine and comes up with all the
   files intact. (You can browse files on a DN-less HDFS cluster just
   fine; that is a non-issue.)
6. Start the DNs back up.
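
For the "mirror copy" check, one option is to compare checksums of the
old and new metadata directories. Below is a minimal Python sketch,
assuming both directories are reachable from one machine (for example
via a mount); otherwise you could run tree_digests on each host and
compare the results. The function names are my own, and the sketch
compares file contents only, so ownership and permissions still need
to be checked separately.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sha = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large image files are not
                # loaded into memory all at once.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    sha.update(chunk)
            digests[os.path.relpath(full, root)] = sha.hexdigest()
    return digests


def is_mirror(old_dir, new_dir):
    """True when both trees hold the same files with identical contents."""
    return tree_digests(old_dir) == tree_digests(new_dir)
```

Calling is_mirror(old_path, new_path) with your old and new
dfs.name.dir locations before starting the NN should return True if
the copy went through cleanly.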

As always, keeping extra backups of your dfs.name.dir contents is
recommended.
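
One simple way to take such a backup is a timestamped archive of the
directory. Here is a minimal Python sketch; the function name and the
archive naming scheme are my own assumptions, and it is probably
safest to run it while the NN is stopped so that the files are not
changing underneath the copy.

```python
import os
import tarfile
import time


def backup_name_dir(name_dir, backup_root):
    """Write a timestamped .tar.gz of name_dir under backup_root.

    Returns the path of the archive that was created.
    """
    os.makedirs(backup_root, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(backup_root, "namedir-" + stamp + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under the directory's own name rather than its
        # absolute path, so the archive unpacks cleanly elsewhere.
        tar.add(name_dir, arcname=os.path.basename(os.path.normpath(name_dir)))
    return archive
```

Keeping the resulting archive on a different machine from the NN
protects you if the NN host itself is lost.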

-- 
Harsh J
