[ https://issues.apache.org/jira/browse/SPARK-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Chammas resolved SPARK-2396.
-------------------------------------
    Resolution: Cannot Reproduce

Resolving this issue as "Cannot Reproduce". Feel free to reopen with clarifying details if the problem persists.

> Spark EC2 scripts fail when trying to log in to EC2 instances
> -------------------------------------------------------------
>
>                 Key: SPARK-2396
>                 URL: https://issues.apache.org/jira/browse/SPARK-2396
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 1.0.0
>         Environment: Windows 8, Cygwin and command prompt, Python 2.7
>            Reporter: Stephen M. Hopper
>              Labels: aws, ec2, ssh
>
> I cannot successfully start a Spark EC2 cluster using the spark-ec2 script.
> I'm using variations on the following command:
>
>     ./spark-ec2 --instance-type=m1.small --region=us-west-1 --spot-price=0.05 \
>         --spark-version=1.0.0 -k my-key-name -i my-key-name.pem -s 1 launch spark-test-cluster
>
> The script always allocates the EC2 instances without much trouble, but it
> never completes the SSH step that installs Spark on the cluster; it always
> complains about my SSH key. If I try to log in with my SSH key directly:
>
>     ssh -i my-key-name.pem root@<insert ip of my instance here>
>
> it fails. However, if I log in to the AWS console, click on my instance, and
> select "Connect", it displays instructions for SSHing into my instance
> (which are no different from the ssh command above). If I then rerun that
> same SSH command, I'm able to log in.
> Next, if I rerun the spark-ec2 command from above (replacing "launch" with
> "start"), the script logs in and starts installing Spark. However, it
> eventually errors out with the following output:
>
>     Cloning into 'spark-ec2'...
>     remote: Counting objects: 1465, done.
>     remote: Compressing objects: 100% (697/697), done.
>     remote: Total 1465 (delta 485), reused 1465 (delta 485)
>     Receiving objects: 100% (1465/1465), 228.51 KiB | 287 KiB/s, done.
>     Resolving deltas: 100% (485/485), done.
>     Connection to ec2-<my-clusters-ip>.us-west-1.compute.amazonaws.com closed.
>     Searching for existing cluster spark-test-cluster...
>     Found 1 master(s), 1 slaves
>     Starting slaves...
>     Starting master...
>     Waiting for instances to start up...
>     Waiting 120 more seconds...
>     Deploying files to master...
>     Traceback (most recent call last):
>       File "./spark_ec2.py", line 823, in <module>
>         main()
>       File "./spark_ec2.py", line 815, in main
>         real_main()
>       File "./spark_ec2.py", line 806, in real_main
>         setup_cluster(conn, master_nodes, slave_nodes, opts, False)
>       File "./spark_ec2.py", line 450, in setup_cluster
>         deploy_files(conn, "deploy.generic", opts, master_nodes, slave_nodes, modules)
>       File "./spark_ec2.py", line 593, in deploy_files
>         subprocess.check_call(command)
>       File "E:\windows_programs\Python27\lib\subprocess.py", line 535, in check_call
>         retcode = call(*popenargs, **kwargs)
>       File "E:\windows_programs\Python27\lib\subprocess.py", line 522, in call
>         return Popen(*popenargs, **kwargs).wait()
>       File "E:\windows_programs\Python27\lib\subprocess.py", line 710, in __init__
>         errread, errwrite)
>       File "E:\windows_programs\Python27\lib\subprocess.py", line 958, in _execute_child
>         startupinfo)
>     WindowsError: [Error 2] The system cannot find the file specified
>
> So, in short, am I missing something, or is this a bug? Any help would be
> appreciated.
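For context on the traceback: subprocess.check_call in deploy_files shells
out to an external program (the spark-ec2 script of that era used rsync for
the deploy step), and WindowsError: [Error 2] is Windows' "file not found"
error, raised when the executable itself is missing from the PATH rather than
anything being wrong with the cluster. Below is a minimal sketch of a
fail-fast guard for that case, assuming Python 2.7 as in the reported
environment; checked_call is a hypothetical helper, not part of spark_ec2.py:

    import subprocess
    from distutils.spawn import find_executable  # stdlib on Python 2.7

    def checked_call(command):
        """Run `command` (a list), but fail with a clear message if the
        binary is not installed, instead of a bare WindowsError."""
        if find_executable(command[0]) is None:
            raise RuntimeError(
                "'%s' was not found on the PATH; on Windows, install it "
                "(e.g. via Cygwin) and run spark-ec2 from a shell that can "
                "see it." % command[0])
        subprocess.check_call(command)

    # Hypothetical usage mirroring the failing deploy step:
    # checked_call(["rsync", "-rv", "-e", "ssh -i my-key-name.pem", src, dest])

In practice, running the script from a Cygwin shell with the rsync and
openssh packages installed avoids the missing-executable case entirely.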
> Other notes:
> - I've tried both the us-west-1 and us-east-1 regions.
> - I've tried several different instance types.
> - I've tried changing the permissions on the SSH key (600, 400, etc.), to no avail.
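The SSH pattern described above, where the first manual attempt fails and an
immediate retry succeeds, usually means the instance's sshd was not yet
accepting connections when spark-ec2 first tried. A minimal sketch of a
readiness probe that retries until the key is accepted; wait_for_ssh is a
hypothetical helper, not part of spark_ec2.py:

    import subprocess
    import time

    def wait_for_ssh(host, key_file, user="root", attempts=6, delay=30):
        """Poll `ssh` until the host accepts the key or attempts run out."""
        cmd = ["ssh", "-i", key_file,
               "-o", "StrictHostKeyChecking=no",
               "-o", "ConnectTimeout=10",
               "%s@%s" % (user, host), "true"]
        for _ in range(attempts):
            if subprocess.call(cmd) == 0:
                return  # sshd is up and the key was accepted
            time.sleep(delay)
        raise RuntimeError("%s did not accept SSH after %d attempts"
                           % (host, attempts))

    # Example: wait_for_ssh("<master-hostname>", "my-key-name.pem")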