Repository: spark
Updated Branches:
  refs/heads/branch-1.3 5782ee29e -> f2aa7b757


[SPARK-5473] [EC2] Expose SSH failures after status checks pass

If there is some fatal problem with launching a cluster, `spark-ec2` just hangs 
without giving the user useful feedback on what the problem is.

This PR exposes the output of the SSH calls to the user if the SSH test fails
for any reason during cluster launch even though the instance status checks are
all green. It also replaces the growing trail of dots shown while waiting with
a fixed three dots.

For example:

```
$ ./ec2/spark-ec2 -k key -i /incorrect/path/identity.pem --instance-type m3.medium --slaves 1 --zone us-east-1c launch "spark-test"
Setting up security groups...
Searching for existing cluster spark-test...
Spark AMI: ami-35b1885c
Launching instances...
Launched 1 slaves in us-east-1c, regid = r-7dadd096
Launched master in us-east-1c, regid = r-fcadd017
Waiting for cluster to enter 'ssh-ready' state...
Warning: SSH connection error. (This could be temporary.)
Host: 127.0.0.1
SSH return code: 255
SSH output: Warning: Identity file /incorrect/path/identity.pem not accessible: No such file or directory.
Warning: Permanently added '127.0.0.1' (RSA) to the list of known hosts.
Permission denied (publickey).
```
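
As a rough illustration of the approach (not the patch itself, which targets Python 2 and invokes `ssh`), the Python 3 sketch below runs a command, merges its stderr into stdout, and prints the captured output only when the command fails. The function name `check_command` and its parameters are hypothetical, and a failing child Python process stands in for a failing SSH call so that the example runs without a remote host.

```python
import subprocess
import sys
import textwrap


def check_command(cmd, host_label, print_output=True):
    """Run cmd; return True if it exits with 0, reporting its output on failure."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout to preserve output order
    )
    # communicate() returns (stdout, stderr); stderr is None here because it was merged.
    output = proc.communicate()[0].decode("utf-8", "replace")

    if proc.returncode != 0 and print_output:
        print(textwrap.dedent("""\
            Warning: connection error. (This could be temporary.)
            Host: {h}
            Return code: {r}
            Output: {o}
        """).format(h=host_label, r=proc.returncode, o=output.strip()))

    return proc.returncode == 0


# Usage: a child process that writes to stderr and exits with code 3
# stands in for an SSH call that fails.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom\\n'); sys.exit(3)"]
print(check_command(failing, "127.0.0.1"))
```

Returning a boolean keeps the function usable inside a polling loop, while the `print_output` flag lets a caller silence the report during phases where failures are expected.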

This should give users enough information to know when to abort a launch that
has hit an unrecoverable error. It will help avoid situations like the ones
reported [here on Stack Overflow](http://stackoverflow.com/q/28002443/) and
[here on the user list](http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3C1422323829398-21381.postn3.nabble.com%3E),
where users couldn't tell what the problem was because `spark-ec2` was hiding
it.

This is a usability improvement that should be backported to 1.2.

Resolves [SPARK-5473](https://issues.apache.org/jira/browse/SPARK-5473).

Author: Nicholas Chammas <nicholas.cham...@gmail.com>

Closes #4262 from nchammas/expose-ssh-failure and squashes the following 
commits:

8bda6ed [Nicholas Chammas] default to print SSH output
2b92534 [Nicholas Chammas] show SSH output after status check pass

(cherry picked from commit 4dfe180fc893bee1146161f8b2a6efd4d6d2bb8c)
Signed-off-by: Sean Owen <so...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f2aa7b75
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f2aa7b75
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f2aa7b75

Branch: refs/heads/branch-1.3
Commit: f2aa7b7572735019c56091e770c14ccba3d95833
Parents: 5782ee2
Author: Nicholas Chammas <nicholas.cham...@gmail.com>
Authored: Mon Feb 9 09:44:53 2015 +0000
Committer: Sean Owen <so...@cloudera.com>
Committed: Mon Feb 9 09:45:03 2015 +0000

----------------------------------------------------------------------
 ec2/spark_ec2.py | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/f2aa7b75/ec2/spark_ec2.py
----------------------------------------------------------------------
diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py
index 3f7242a..ee45dd3 100755
--- a/ec2/spark_ec2.py
+++ b/ec2/spark_ec2.py
@@ -32,6 +32,7 @@ import subprocess
 import sys
 import tarfile
 import tempfile
+import textwrap
 import time
 import urllib2
 import warnings
@@ -678,21 +679,32 @@ def setup_spark_cluster(master, opts):
         print "Ganglia started at http://%s:5080/ganglia" % master
 
 
-def is_ssh_available(host, opts):
+def is_ssh_available(host, opts, print_ssh_output=True):
     """
     Check if SSH is available on a host.
     """
-    try:
-        with open(os.devnull, 'w') as devnull:
-            ret = subprocess.check_call(
-                ssh_command(opts) + ['-t', '-t', '-o', 'ConnectTimeout=3',
-                                     '%s@%s' % (opts.user, host), stringify_command('true')],
-                stdout=devnull,
-                stderr=devnull
-            )
-        return ret == 0
-    except subprocess.CalledProcessError as e:
-        return False
+    s = subprocess.Popen(
+        ssh_command(opts) + ['-t', '-t', '-o', 'ConnectTimeout=3',
+                             '%s@%s' % (opts.user, host), stringify_command('true')],
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT  # we pipe stderr through stdout to preserve output order
+    )
+    cmd_output = s.communicate()[0]  # [1] is stderr, which we redirected to stdout
+
+    if s.returncode != 0 and print_ssh_output:
+        # extra leading newline is for spacing in wait_for_cluster_state()
+        print textwrap.dedent("""\n
+            Warning: SSH connection error. (This could be temporary.)
+            Host: {h}
+            SSH return code: {r}
+            SSH output: {o}
+        """).format(
+            h=host,
+            r=s.returncode,
+            o=cmd_output.strip()
+        )
+
+    return s.returncode == 0
 
 
 def is_cluster_ssh_available(cluster_instances, opts):

