Hi Steve and Amit,

Thanks for your answers. I agree with you that key-based ssh is nothing to worry about. But I'm wondering which grid administration tasks exactly Hadoop performs via ssh. Does it restart crashed data nodes or task trackers on the slaves? Or does it transfer data across the grid over ssh? Where can I find a short description of what exactly Hadoop needs ssh for? The documentation only says that I have to configure it.

Thanks & Regards
Matthias

> -----Original Message-----
> From: Steve Loughran [mailto:ste...@apache.org]
> Sent: Wednesday, 21 January 2009 13:59
> To: core-user@hadoop.apache.org
> Subject: Re: Why does Hadoop need ssh access to master and slaves?
>
> Amit k. Saha wrote:
> > On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer
> > <matthias.sche...@1und1.de> wrote:
> >> Hi all,
> >>
> >> we've made our first steps in evaluating hadoop. The setup of 2 VMs
> >> as a hadoop grid was very easy and works fine.
> >>
> >> Now our operations team wonders why hadoop has to be able to connect
> >> to the master and slaves via password-less ssh?! Can anyone give us
> >> an answer to this question?
> >
> > 1. There has to be a way to connect to the remote hosts- slaves and a
> > secondary master, and SSH is the secure way to do it
> > 2. It has to be password-less to enable automatic logins
> >
>
> SSH is *a* secure way to do it, but not the only way. Other
> management tools can bring up hadoop clusters. Hadoop ships
> with scripted support for SSH as it is standard with Linux
> distros and generally the best way to bring up a remote console.
>
> Matthias,
> Your ops team should not be worrying about the SSH security,
> as long as they keep their keys under control.
>
> (a) Key-based SSH is more secure than passworded SSH, as
> man-in-middle attacks are prevented. passphrase protected SSH
> keys on external USB keys even better.
>
> (b) once the cluster is up, that filesystem is pretty
> vulnerable to anything on the LAN. You do need to lock down
> your datacentre, or set up the firewall/routing of the
> servers so that only trusted hosts can talk to the FS. SSH
> becomes a detail at that point.