[ https://issues.apache.org/jira/browse/SPARK-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187471#comment-14187471 ]
Michael Griffiths edited comment on SPARK-3398 at 10/28/14 9:10 PM:
--------------------------------------------------------------------

I'm running into an issue with {{wait_for_cluster_state}} - specifically, waiting for {{ssh-ready}}. AFAICT the [valid states in boto are|http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.instance.InstanceState]:

* pending
* running
* shutting-down
* terminated
* stopping
* stopped

When I invoke spark_ec2.py, it never moves to the next stage (infinite loop).

Is {{ssh-ready}} a state in a different version of boto?

Thanks,
Michael

> Have spark-ec2 intelligently wait for specific cluster states
> -------------------------------------------------------------
>
>                 Key: SPARK-3398
>                 URL: https://issues.apache.org/jira/browse/SPARK-3398
>             Project: Spark
>          Issue Type: Improvement
>          Components: EC2
>            Reporter: Nicholas Chammas
>            Assignee: Nicholas Chammas
>            Priority: Minor
>             Fix For: 1.2.0
>
>
> {{spark-ec2}} currently has retry logic for when it tries to install stuff on
> a cluster and for when it tries to destroy security groups.
> It would be better to have some logic that allows {{spark-ec2}} to explicitly
> wait for when all the nodes in a cluster it is working on have reached a
> specific state.
> Examples:
> * Wait for all nodes to be up
> * Wait for all nodes to be up and accepting SSH connections (then start
> installing stuff)
> * Wait for all nodes to be down
> * Wait for all nodes to be terminated (then delete the security groups)
> Having a function in the {{spark_ec2.py}} script that blocks until the
> desired cluster state is reached would reduce the need for various retry
> logic. It would probably also eliminate the need for the {{--wait}} parameter.
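
For illustration, here is a minimal sketch of the kind of blocking wait the issue describes. It is not the code in {{spark_ec2.py}}. It assumes that {{ssh-ready}} is not a boto instance state at all but a pseudo-state meaning "every node is running and accepts a TCP connection on port 22"; the thread does not confirm that. The name {{wait_for_cluster_state}} comes from the comment above, but the signature, the helpers {{ssh_port_open}} and {{cluster_in_state}}, and the (state, host) pair shape are all hypothetical. A timeout is included so that an unreachable state ends the wait instead of looping forever.

```python
import socket
import time

# Instance states listed in the comment above (from the boto docs).
EC2_STATES = {"pending", "running", "shutting-down",
              "terminated", "stopping", "stopped"}


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def cluster_in_state(instances, target, port_check=ssh_port_open):
    """Check a list of (state, host) pairs against the target state.

    The (state, host) shape is an assumption made for this sketch.
    """
    if target == "ssh-ready":
        # Assumed pseudo-state: 'running' plus a reachable SSH port.
        return all(state == "running" and port_check(host)
                   for state, host in instances)
    if target not in EC2_STATES:
        raise ValueError("unknown cluster state: %r" % (target,))
    return all(state == target for state, _ in instances)


def wait_for_cluster_state(fetch_instances, target, poll_interval=5.0,
                           timeout=600.0, port_check=ssh_port_open,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll until every node reaches target; return False on timeout.

    fetch_instances is a caller-supplied function returning the current
    (state, host) pairs, e.g. built from a fresh boto query on each call.
    """
    deadline = clock() + timeout
    while True:
        if cluster_in_state(fetch_instances(), target, port_check):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

Passing the instance lookup and the port check in as functions keeps the loop independent of any particular boto version, and rejecting unknown state names up front makes a misspelled or unsupported state fail immediately rather than spin.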