Github user mattf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2444#discussion_r18027788

    --- Diff: docs/spark-standalone.md ---
    @@ -62,7 +62,12 @@ Finally, the following configuration options can be passed to the master and wor

     # Cluster Launch Scripts

    -To launch a Spark standalone cluster with the launch scripts, you need to create a file called `conf/slaves` in your Spark directory, which should contain the hostnames of all the machines where you would like to start Spark workers, one per line. The master machine must be able to access each of the slave machines via password-less `ssh` (using a private key). For testing, you can just put `localhost` in this file.
    +To launch a Spark standalone cluster with the launch scripts, you need to create a file called `conf/slaves` in your Spark directory,
    +which should contain the hostnames of all the machines where you would like to start Spark workers, one per line. If `conf/slaves`
    +does not exist, the launch scripts use a list which contains single hostname `localhost`. This can be used for testing.
    +The master machine must be able to access each of the slave machines via `ssh`. By default, `ssh` is executed in the background for parallel execution for each slave machine.
    +If you would like to use password authentication instead of password-less(using a private key) for `ssh`, `ssh` does not work well in the background.
    +To avoid this, you can set a environment variable `SPARK_SSH_FOREGROUND` to something like `yes` or `y` to execute `ssh` in the foreground.
    --- End diff --

what about -

To launch a Spark standalone cluster with the launch scripts, you should create a file called `conf/slaves` in your Spark directory, which must contain the hostnames of all the machines where you intend to start Spark workers, one per line. If `conf/slaves` does not exist, the launch scripts default to a single machine (`localhost`), which is useful for testing.
Note, the master machine accesses each of the worker machines via `ssh`. By default, `ssh` is run in parallel and requires password-less access (using a private key) to be set up. If you do not have a password-less setup, you can set the environment variable `SPARK_SSH_FOREGROUND` and serially provide a password for each worker.
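As a concrete illustration of the workflow described above, a minimal session might look like the following (the `worker1`/`worker2` hostnames are placeholders, and the launch-script invocations are shown as comments since they assume a full Spark installation):

```shell
# Create conf/slaves in the Spark directory, one worker hostname per line.
# The hostnames below are placeholders for this sketch.
mkdir -p conf
cat > conf/slaves <<'EOF'
worker1.example.com
worker2.example.com
EOF

# With password-less ssh keys in place, the launch scripts contact
# all workers in parallel:
#
#   ./sbin/start-slaves.sh
#
# Without password-less ssh, run ssh in the foreground so each
# password prompt can be answered one worker at a time:
#
#   SPARK_SSH_FOREGROUND=yes ./sbin/start-slaves.sh
```

If `conf/slaves` is simply deleted, the scripts fall back to `localhost` only, which matches the testing behavior described in the suggested wording.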