Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Chris K Wensel

you can't do this with the contrib/ec2 scripts/ami.

but passing the master private dns name to the slaves on boot as 'user-data'
works fine. when a slave starts, it contacts the master and joins the cluster.
there isn't any need for a slave to rsync from the master, which removes the
dependency on the slaves having the private key. and since you aren't using
the start|stop-all scripts, you don't need to maintain the slaves file, and
can thus lazily boot your cluster.


to do this, you will need to create your own AMI that works this way.  
not hard, just time consuming.
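
a rough sketch of the kind of slave boot script this describes -- not the
actual AMI, and the hadoop install path, port numbers and config values are
only placeholders for whatever your own image uses:

#!/bin/sh
# the master's private dns name is passed as EC2 user-data at launch
MASTER_HOST=`wget -q -O - http://169.254.169.254/latest/user-data`

# point this node at the master for HDFS and MapReduce
# (paths and ports below are placeholders, not taken from the thread)
cat > /usr/local/hadoop/conf/hadoop-site.xml <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>$MASTER_HOST:50001</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>$MASTER_HOST:50002</value>
  </property>
</configuration>
EOF

# start the slave daemons directly; no slaves file or start-all.sh needed,
# so new slaves can be added lazily at any time
/usr/local/hadoop/bin/hadoop-daemon.sh start datanode
/usr/local/hadoop/bin/hadoop-daemon.sh start tasktracker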


On Mar 20, 2008, at 11:56 AM, Prasan Ary wrote:

Chris,
 What do you mean when you say boot the slaves with "the master private name"?



 ===

Chris K Wensel <[EMAIL PROTECTED]> wrote:
 I found it much better to start the master first, then boot the slaves with the master private name.

i do not use the start|stop-all scripts, so i do not need to maintain
the slaves file. thus i don't need to push private keys around to
support those scripts.

this lets me start 20 nodes, then add 20 more later. or kill some.

btw, get ganglia installed. life will be better knowing what's going  
on.


also, setting up FoxyProxy in firefox lets you browse your whole
cluster if you set up an ssh tunnel (socks).

On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:

Hi All,
 I have been trying to configure Hadoop on EC2 for a large cluster
(100-plus nodes). It seems that I have to copy the EC2 private key
to all the machines in the cluster so that they can have SSH
connections.
 For now it seems I have to run a script to copy the key file to
each of the EC2 instances. I wanted to know if there is a better way
to accomplish this.

Thanks,
PA




Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/








Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/





Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Prasan Ary
Chris,
  What do you mean when you say boot the slaves with "the master private name"?
   
   
  ===

Chris K Wensel <[EMAIL PROTECTED]> wrote:
  I found it much better to start the master first, then boot the slaves 
with the master private name.

i do not use the start|stop-all scripts, so i do not need to maintain 
the slaves file. thus i don't need to push private keys around to 
support those scripts.

this lets me start 20 nodes, then add 20 more later. or kill some.

btw, get ganglia installed. life will be better knowing what's going on.

also, setting up FoxyProxy in firefox lets you browse your whole 
cluster if you set up an ssh tunnel (socks).

On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:
> Hi All,
> I have been trying to configure Hadoop on EC2 for a large cluster 
> (100-plus nodes). It seems that I have to copy the EC2 private key 
> to all the machines in the cluster so that they can have SSH 
> connections.
> For now it seems I have to run a script to copy the key file to 
> each of the EC2 instances. I wanted to know if there is a better way 
> to accomplish this.
>
> Thanks,
> PA
>
>

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/





   

Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Andreas Kostyrka
Actually, I personally use the following two-part copy technique to copy
files to a cluster of boxes:

tar cf - myfile | dsh -f host-list-file -i -c -M tar xCfv /tmp -

The first tar packages myfile into a tar archive written to stdout.

dsh runs a tar that unpacks the tar (in the above case all boxes listed
in host-list-file would have a /tmp/myfile after the command).

Tar options that are relevant include C (chdir) and v (verbose, can be
given twice) so you see what got copied.

dsh options that are relevant:
-i copy stdin to all ssh processes, requires -c
-c do the ssh calls concurrently.
-M prefix the output from each ssh with the hostname.

While this is not rsync, it has the benefit of running on all boxes
concurrently, and it is quite flexible.
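
Applied to the problem that started this thread, pushing the EC2 private key
to every node could look something like this (run from the directory holding
the key; the host list file and key filename are placeholders):

tar cf - id_rsa | dsh -f host-list-file -i -c -M tar xCfv /root/.ssh -
dsh -f host-list-file -c -M chmod 600 /root/.ssh/id_rsa

tar preserves the file mode when extracting as root, so the chmod is just a
safety net in case the local copy wasn't already mode 600.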

Andreas

Am Donnerstag, den 20.03.2008, 19:57 +0200 schrieb Andrey Pankov:
> Hi,
> 
> Did you see the hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script? It 
> already contains this part:
> 
> echo "Copying private key to slaves"
> for slave in `cat slaves`; do
>scp $SSH_OPTS $PRIVATE_KEY_PATH "[EMAIL PROTECTED]:/root/.ssh/id_rsa"
>ssh $SSH_OPTS "[EMAIL PROTECTED]" "chmod 600 /root/.ssh/id_rsa"
>sleep 1
> done
> 
> Anyway, did you try the hadoop-ec2 script? It works well for the task you 
> described.
> 
> 
> Prasan Ary wrote:
> > Hi All,
> >   I have been trying to configure Hadoop on EC2 for a large cluster 
> > (100-plus nodes). It seems that I have to copy the EC2 private key to all 
> > the machines in the cluster so that they can have SSH connections.
> >   For now it seems I have to run a script to copy the key file to each of 
> > the EC2 instances. I wanted to know if there is a better way to accomplish 
> > this.
> >
> >   Thanks,
> >   PA
> > 
> >
> 
> ---
> Andrey Pankov




Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Tom White
Yes, this isn't ideal for larger clusters. There's a jira to address
this: https://issues.apache.org/jira/browse/HADOOP-2410.

Tom

On 20/03/2008, Prasan Ary <[EMAIL PROTECTED]> wrote:
> Hi All,
>   I have been trying to configure Hadoop on EC2 for a large cluster 
> (100-plus nodes). It seems that I have to copy the EC2 private key to all the 
> machines in the cluster so that they can have SSH connections.
>   For now it seems I have to run a script to copy the key file to each of the 
> EC2 instances. I wanted to know if there is a better way to accomplish this.
>
>   Thanks,
>
>   PA
>
>
>


-- 
Blog: http://www.lexemetech.com/


Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Chris K Wensel
I found it much better to start the master first, then boot the slaves  
with the master private name.


i do not use the start|stop-all scripts, so i do not need to maintain
the slaves file. thus i don't need to push private keys around to
support those scripts.


this lets me start 20 nodes, then add 20 more later. or kill some.

btw, get ganglia installed. life will be better knowing what's going on.

also, setting up FoxyProxy in firefox lets you browse your whole
cluster if you set up an ssh tunnel (socks).
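
the tunnel is just ssh dynamic port forwarding; a minimal example, with the
key path, local port and master hostname as placeholders:

ssh -i /path/to/ec2-keypair -D 6666 root@<master-public-dns-name>

then point FoxyProxy at a SOCKS proxy on localhost:6666 and add URL patterns
for the cluster's internal hostnames; the namenode and jobtracker web UIs are
on ports 50070 and 50030 by default.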


On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:

Hi All,
 I have been trying to configure Hadoop on EC2 for a large cluster
(100-plus nodes). It seems that I have to copy the EC2 private key
to all the machines in the cluster so that they can have SSH
connections.
 For now it seems I have to run a script to copy the key file to
each of the EC2 instances. I wanted to know if there is a better way
to accomplish this.


 Thanks,
 PA




Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/





Re: Hadoop on EC2 for large cluster

2008-03-20 Thread Andrey Pankov

Hi,

Did you see the hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script? It 
already contains this part:


echo "Copying private key to slaves"
for slave in `cat slaves`; do
  scp $SSH_OPTS $PRIVATE_KEY_PATH "[EMAIL PROTECTED]:/root/.ssh/id_rsa"
  ssh $SSH_OPTS "[EMAIL PROTECTED]" "chmod 600 /root/.ssh/id_rsa"
  sleep 1
done

Anyway, did you try the hadoop-ec2 script? It works well for the task you 
described.



Prasan Ary wrote:

Hi All,
  I have been trying to configure Hadoop on EC2 for a large cluster (100-plus 
nodes). It seems that I have to copy the EC2 private key to all the machines in 
the cluster so that they can have SSH connections.
  For now it seems I have to run a script to copy the key file to each of the 
EC2 instances. I wanted to know if there is a better way to accomplish this.
   
  Thanks,

  PA

   


---
Andrey Pankov


Hadoop on EC2 for large cluster

2008-03-20 Thread Prasan Ary
Hi All,
  I have been trying to configure Hadoop on EC2 for a large cluster (100-plus 
nodes). It seems that I have to copy the EC2 private key to all the machines in 
the cluster so that they can have SSH connections.
  For now it seems I have to run a script to copy the key file to each of the 
EC2 instances. I wanted to know if there is a better way to accomplish this.
   
  Thanks,
  PA

   