Re: Hadoop on EC2 for large cluster
You can't do this with the contrib/ec2 scripts/AMI, but passing the master's private DNS name to the slaves on boot as 'user-data' works fine. When a slave starts, it contacts the master and joins the cluster. There is no need for a slave to rsync from the master, which removes the dependency on the slaves having the private key. And since you aren't using the start|stop-all scripts, you don't need to maintain the slaves file, and can thus lazily boot your cluster.

To do this, you will need to create your own AMI that works this way. Not hard, just time consuming.

On Mar 20, 2008, at 11:56 AM, Prasan Ary wrote:

> Chris,
> What do you mean when you say boot the slaves with "the master private name"?
>
> === Chris K Wensel <[EMAIL PROTECTED]> wrote:
>
>> I found it much better to start the master first, then boot the slaves with the master private name.
>>
>> I do not use the start|stop-all scripts, so I do not need to maintain the slaves file, and thus don't need to push private keys around to support those scripts. This lets me start 20 nodes, then add 20 more later, or kill some.
>>
>> Btw, get Ganglia installed; life will be better knowing what's going on. Also, setting up FoxyProxy in Firefox lets you browse your whole cluster if you set up an SSH tunnel (SOCKS).
>>
>> On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:
>>
>>> Hi All,
>>> I have been trying to configure Hadoop on EC2 for a large cluster (100-plus nodes). It seems that I have to copy the EC2 private key to all the machines in the cluster so that they can have SSH connections. For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
>>>
>>> Thanks,
>>> PA
>>
>> Chris K Wensel
>> [EMAIL PROTECTED]
>> http://chris.wensel.net/

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
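For illustration, here is a minimal sketch of the boot-time pattern described above, assuming the standard EC2 API tools and the instance metadata service; the AMI paths, ports, and template file below are hypothetical placeholders, not the contrib/ec2 conventions.

Launching the slaves with the master's private DNS name as user-data:

  # run from your workstation once the master is up
  ec2-run-instances ami-XXXXXXXX -n 20 -k my-keypair -g my-group \
      -d "domU-12-31-38-00-xx-yy.compute-1.internal"

A boot script baked into the slave AMI can then fetch the user-data and point the Hadoop daemons at the master:

  #!/bin/sh
  # hypothetical slave boot hook
  MASTER_HOST=`wget -q -O - http://169.254.169.254/latest/user-data`

  # fill in fs.default.name and mapred.job.tracker from a local template
  sed -e "s|MASTER_HOST|$MASTER_HOST|g" \
      /usr/local/hadoop/conf/hadoop-site.xml.template \
      > /usr/local/hadoop/conf/hadoop-site.xml

  # join the cluster; no ssh keys or slaves file involved
  /usr/local/hadoop/bin/hadoop-daemon.sh start datanode
  /usr/local/hadoop/bin/hadoop-daemon.sh start tasktracker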
Re: Hadoop on EC2 for large cluster
Chris,
What do you mean when you say boot the slaves with "the master private name"?

=== Chris K Wensel <[EMAIL PROTECTED]> wrote:

> I found it much better to start the master first, then boot the slaves with the master private name.
>
> I do not use the start|stop-all scripts, so I do not need to maintain the slaves file, and thus don't need to push private keys around to support those scripts. This lets me start 20 nodes, then add 20 more later, or kill some.
>
> Btw, get Ganglia installed; life will be better knowing what's going on. Also, setting up FoxyProxy in Firefox lets you browse your whole cluster if you set up an SSH tunnel (SOCKS).
>
> On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:
>
>> Hi All,
>> I have been trying to configure Hadoop on EC2 for a large cluster (100-plus nodes). It seems that I have to copy the EC2 private key to all the machines in the cluster so that they can have SSH connections. For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
>>
>> Thanks,
>> PA
>
> Chris K Wensel
> [EMAIL PROTECTED]
> http://chris.wensel.net/
Re: Hadoop on EC2 for large cluster
Actually, I personally use the following two-part copy technique to copy files to a cluster of boxes:

  tar cf - myfile | dsh -f host-list-file -i -c -M tar xCfv /tmp -

The first tar packages myfile into a tar stream. dsh then runs a tar on each host that unpacks the stream (in the above case, every box listed in host-list-file ends up with /tmp/myfile after the command).

Relevant tar options: C (chdir) and v (verbose, can be given twice), so you see what got copied.

Relevant dsh options:
  -i  copy stdin to all ssh processes; requires -c
  -c  run the ssh calls concurrently
  -M  prefix the output from each ssh with the hostname

While this is not rsync, it has the benefit of running concurrently, and it is quite flexible.

Andreas

On Thursday, 2008-03-20 at 19:57 +0200, Andrey Pankov wrote:
> Hi,
>
> Did you see the hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script? It already contains this part:
>
>   echo "Copying private key to slaves"
>   for slave in `cat slaves`; do
>     scp $SSH_OPTS $PRIVATE_KEY_PATH "[EMAIL PROTECTED]:/root/.ssh/id_rsa"
>     ssh $SSH_OPTS "[EMAIL PROTECTED]" "chmod 600 /root/.ssh/id_rsa"
>     sleep 1
>   done
>
> Anyway, have you tried the hadoop-ec2 script? It works well for the task you described.
>
> Prasan Ary wrote:
>> Hi All,
>> I have been trying to configure Hadoop on EC2 for a large cluster (100-plus nodes). It seems that I have to copy the EC2 private key to all the machines in the cluster so that they can have SSH connections. For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
>>
>> Thanks,
>> PA
>
> ---
> Andrey Pankov
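As a concrete variant of the same trick, the key file from the original question could be pushed this way too (the local path below is just an example; the key ends up as /root/.ssh/id_rsa on every host in host-list-file):

  tar cf - -C $HOME/.ec2 id_rsa | dsh -f host-list-file -i -c -M tar xCf /root/.ssh -
  # tar preserves the file mode, but making sure never hurts:
  dsh -f host-list-file -c -M chmod 600 /root/.ssh/id_rsa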
Re: Hadoop on EC2 for large cluster
Yes, this isn't ideal for larger clusters. There's a JIRA to address this: https://issues.apache.org/jira/browse/HADOOP-2410.

Tom

On 20/03/2008, Prasan Ary <[EMAIL PROTECTED]> wrote:
> Hi All,
> I have been trying to configure Hadoop on EC2 for a large cluster (100-plus nodes). It seems that I have to copy the EC2 private key to all the machines in the cluster so that they can have SSH connections. For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
>
> Thanks,
> PA

--
Blog: http://www.lexemetech.com/
Re: Hadoop on EC2 for large cluster
I found it much better to start the master first, then boot the slaves with the master private name.

I do not use the start|stop-all scripts, so I do not need to maintain the slaves file, and thus don't need to push private keys around to support those scripts. This lets me start 20 nodes, then add 20 more later, or kill some.

Btw, get Ganglia installed; life will be better knowing what's going on. Also, setting up FoxyProxy in Firefox lets you browse your whole cluster if you set up an SSH tunnel (SOCKS).

On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:

> Hi All,
> I have been trying to configure Hadoop on EC2 for a large cluster (100-plus nodes). It seems that I have to copy the EC2 private key to all the machines in the cluster so that they can have SSH connections. For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
>
> Thanks,
> PA

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
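For the tunnel mentioned above, a minimal sketch (the port, key path, and hostnames are arbitrary examples):

  # open a SOCKS proxy on local port 6666 through the master's public name
  ssh -i ~/.ec2/id_rsa-my-keypair -D 6666 root@ec2-xx-xx-xx-xx.compute-1.amazonaws.com

Then point FoxyProxy at a SOCKS proxy on localhost:6666, and the cluster's web UIs become reachable through the instances' private names, e.g. the namenode on port 50070 and the jobtracker on port 50030.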
Re: Hadoop on EC2 for large cluster
Hi,

Did you see the hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script? It already contains this part:

  echo "Copying private key to slaves"
  for slave in `cat slaves`; do
    scp $SSH_OPTS $PRIVATE_KEY_PATH "[EMAIL PROTECTED]:/root/.ssh/id_rsa"
    ssh $SSH_OPTS "[EMAIL PROTECTED]" "chmod 600 /root/.ssh/id_rsa"
    sleep 1
  done

Anyway, have you tried the hadoop-ec2 script? It works well for the task you described.

Prasan Ary wrote:
> Hi All,
> I have been trying to configure Hadoop on EC2 for a large cluster (100-plus nodes). It seems that I have to copy the EC2 private key to all the machines in the cluster so that they can have SSH connections. For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
>
> Thanks,
> PA

---
Andrey Pankov
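Note that the loop above relies on $SSH_OPTS and $PRIVATE_KEY_PATH being set elsewhere in the contrib/ec2 scripts. Purely as an illustration of the kind of values involved (not the actual defaults shipped with the scripts):

  PRIVATE_KEY_PATH=$HOME/.ec2/id_rsa-my-keypair
  SSH_OPTS="-i $PRIVATE_KEY_PATH -o StrictHostKeyChecking=no"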
Hadoop on EC2 for large cluster
Hi All,
I have been trying to configure Hadoop on EC2 for a large cluster (100-plus nodes). It seems that I have to copy the EC2 private key to all the machines in the cluster so that they can have SSH connections. For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.

Thanks,
PA