Hi Matthias,

It is not necessary to have SSH set up to run Hadoop, but it does make
things easier. SSH is used by the scripts in the bin directory that
start and stop daemons across the cluster (the slave nodes are defined
in the slaves file); see the start-all.sh script as a starting point.
These scripts are a convenient way to control Hadoop, but there are
other possibilities. If you had another system to control daemons on
your cluster then you wouldn't need SSH.
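
As a rough illustration of what those scripts amount to, here is a minimal
sketch in Python, not the actual Hadoop code. It reads host names from the
contents of a slaves file and builds one ssh command per host. The host
names, the /opt/hadoop install path, and the helper names read_slaves and
build_ssh_commands are assumptions made up for this example; nothing is
executed, the commands are only printed.

```python
import shlex


def read_slaves(text):
    """Return host names from slaves-file content, skipping blanks and # comments."""
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def build_ssh_commands(hosts, remote_command):
    """Build one ssh argument list per host; this does not run anything."""
    return [["ssh", host, remote_command] for host in hosts]


# Hypothetical slaves-file content and install path, for illustration only.
slaves_text = "slave1.example.com\n# spare node\nslave2.example.com\n"
remote = "/opt/hadoop/bin/hadoop-daemon.sh start datanode"

for argv in build_ssh_commands(read_slaves(slaves_text), remote):
    # Print the command line a control script would run for this host.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Any other tool that can run a command on each node could take the place of
ssh here, which is why password-less SSH is a convenience rather than a
requirement.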

Tom

On Wed, Jan 21, 2009 at 1:20 PM, Matthias Scherer
<matthias.sche...@1und1.de> wrote:
> Hi Steve and Amit,
>
> Thanks for your answers. I agree with you that key-based ssh is nothing to
> worry about. But I'm wondering what exactly Hadoop does via ssh, that is,
> which grid administration tasks. Does it restart crashed data nodes or
> task trackers on the slaves? Or does it transfer data over the grid with
> ssh access? Where can I find a short description of what exactly Hadoop
> needs ssh for? The documentation says only that I have to configure it.
>
> Thanks & Regards
> Matthias
>
>
>> -----Original Message-----
>> From: Steve Loughran [mailto:ste...@apache.org]
>> Sent: Wednesday, January 21, 2009 13:59
>> To: core-user@hadoop.apache.org
>> Subject: Re: Why does Hadoop need ssh access to master and slaves?
>>
>> Amit k. Saha wrote:
>> > On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer
>> > <matthias.sche...@1und1.de> wrote:
>> >> Hi all,
>> >>
>> >> we've made our first steps in evaluating hadoop. The setup of 2 VMs
>> >> as a hadoop grid was very easy and works fine.
>> >>
>> >> Now our operations team wonders why hadoop has to be able to connect
>> >> to the master and slaves via password-less ssh?! Can anyone give us
>> >> an answer to this question?
>> >
>> > 1. There has to be a way to connect to the remote hosts (slaves and a
>> > secondary master), and SSH is the secure way to do it.
>> > 2. It has to be password-less to enable automatic logins.
>> >
>>
>> SSH is *a* secure way to do it, but not the only way. Other
>> management tools can bring up hadoop clusters. Hadoop ships
>> with scripted support for SSH as it is standard with Linux
>> distros and generally the best way to bring up a remote console.
>>
>> Matthias,
>> Your ops team should not be worrying about the SSH security,
>> as long as they keep their keys under control.
>>
>> (a) Key-based SSH is more secure than passworded SSH, as
>> man-in-the-middle attacks are prevented. Passphrase-protected SSH
>> keys on external USB keys are even better.
>>
>> (b) Once the cluster is up, that filesystem is pretty
>> vulnerable to anything on the LAN. You do need to lock down
>> your datacentre, or set up the firewall/routing of the
>> servers so that only trusted hosts can talk to the FS. SSH
>> becomes a detail at that point.
>>
>>
>>
>
