[jira] [Commented] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

2016-01-27 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119220#comment-15119220
 ] 

Nicholas Chammas commented on SPARK-5189:
-

FWIW, I found this issue to be practically unsolvable without rewriting most of 
spark-ec2, so I started a new project that aims to replace spark-ec2 for most 
of its use cases: [Flintrock|https://github.com/nchammas/flintrock]

> Reorganize EC2 scripts so that nodes can be provisioned independent of Spark 
> master
> ---
>
> Key: SPARK-5189
> URL: https://issues.apache.org/jira/browse/SPARK-5189
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Nicholas Chammas
>
> As of 1.2.0, we launch Spark clusters on EC2 by setting up the master first, 
> then setting up all the slaves together. This includes broadcasting files 
> from the lonely master to potentially hundreds of slaves.
> There are 2 main problems with this approach:
> # Broadcasting files from the master to all slaves using 
> [{{copy-dir}}|https://github.com/mesos/spark-ec2/blob/branch-1.3/copy-dir.sh] 
> (e.g. during [ephemeral-hdfs 
> init|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/ephemeral-hdfs/init.sh#L36],
>  or during [Spark 
> setup|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/setup.sh#L3])
>  takes a long time. This time increases as the number of slaves increases.
>  I did some testing in {{us-east-1}}. This is, concretely, what the problem 
> looks like:
>  || number of slaves ({{m3.large}}) || launch time (best of 6 tries) ||
> | 1 | 8m 44s |
> | 10 | 13m 45s |
> | 25 | 22m 50s |
> | 50 | 37m 30s |
> | 75 | 51m 30s |
> | 99 | 1h 5m 30s |
>  Unfortunately, I couldn't report on 100 slaves or more due to SPARK-6246, 
> but I think the point is clear enough.
>  We can extrapolate from this data that *every additional slave adds roughly 
> 35 seconds to the launch time* (so a cluster with 100 slaves would take 1h 6m 
> 5s to launch).
> # It's more complicated to add slaves to an existing cluster (a la 
> [SPARK-2008]), since slaves are only configured through the master during the 
> setup of the master itself.
> Logically, the operations we want to implement are:
> * Provision a Spark node
> * Join a node to a cluster (including an empty cluster) as either a master or 
> a slave
> * Remove a node from a cluster
> Our scripts need to be organized roughly around these operations. The goals 
> would be:
> # When launching a cluster, enable all cluster nodes to be provisioned in 
> parallel, removing the master-to-slave file broadcast bottleneck.
> # Facilitate cluster modifications like adding or removing nodes.
> # Enable exploration of infrastructure tools like 
> [Terraform|https://www.terraform.io/] that might simplify {{spark-ec2}} 
> internals and perhaps even allow us to build [one tool that launches Spark 
> clusters on several different cloud 
> platforms|https://groups.google.com/forum/#!topic/terraform-tool/eD23GLLkfDw].
> More concretely, the modifications we need to make are:
> * Replace all occurrences of {{copy-dir}} or {{rsync}}-to-slaves with 
> equivalent, slave-side operations.
> * Repurpose {{setup-slave.sh}} as {{provision-spark-node.sh}} and make sure 
> it fully creates a node that can be used as either a master or slave (see the 
> sketch after this list).
> * Create a new script, {{join-to-cluster.sh}}, that takes a provisioned node, 
> configures it as a master or slave, and joins it to a cluster.
> * Move any remaining logic in {{setup.sh}} up to {{spark_ec2.py}} and delete 
> that script.
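
To make the provision/join split described above concrete, here is a minimal 
sketch of what a per-node provisioning step could look like. It is an 
illustration only: the download URL, install paths, and the 
{{provision_node}}/{{join_cluster}} helpers are hypothetical and not anything 
that exists in spark-ec2. The key idea is that each node pulls its own copy of 
Spark, so the master never has to broadcast files.

{code}
#!/usr/bin/env python
# Hypothetical per-node provisioning sketch (not part of spark-ec2).
# Runs on the node itself; the node pulls its own Spark distribution,
# so no master-to-slave copy-dir/rsync broadcast is needed.

import os
import subprocess

SPARK_TGZ_URL = ("https://archive.apache.org/dist/spark/spark-1.2.0/"
                 "spark-1.2.0-bin-hadoop2.4.tgz")
INSTALL_DIR = "/opt"
SPARK_HOME = os.path.join(INSTALL_DIR, "spark-1.2.0-bin-hadoop2.4")


def provision_node():
    """Download and unpack Spark locally, independent of any master."""
    subprocess.check_call(["curl", "-sSL", "-o", "/tmp/spark.tgz", SPARK_TGZ_URL])
    subprocess.check_call(["tar", "xzf", "/tmp/spark.tgz", "-C", INSTALL_DIR])


def join_cluster(role, master_host=None):
    """Configure an already-provisioned node as a master or a slave."""
    if role == "master":
        subprocess.check_call([os.path.join(SPARK_HOME, "sbin", "start-master.sh")])
    else:
        # Launch a standalone worker process pointed at the existing master.
        subprocess.Popen([
            os.path.join(SPARK_HOME, "bin", "spark-class"),
            "org.apache.spark.deploy.worker.Worker",
            "spark://{0}:7077".format(master_host),
        ])


if __name__ == "__main__":
    provision_node()
    join_cluster(role="slave", master_host="master.internal.example")
{code}

Because every node provisions itself, the same script can back both the master 
and slave roles, and adding a node to an existing cluster reduces to running 
it once more.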






[jira] [Commented] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

2015-04-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516854#comment-14516854
 ] 

Sean Owen commented on SPARK-5189:
--

[~jackli066519] You don't need to have this assigned to you, but I would work 
with [~nchammas] to understand first whether this is still relevant or what 
he's done.







[jira] [Commented] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

2015-04-28 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517301#comment-14517301
 ] 

Nicholas Chammas commented on SPARK-5189:
-

Yeah, as Sean said, you can just start working on this whenever you want. Just 
let us know here in a comment so that others know someone is already working 
on it.

This issue is still relevant, but unfortunately, solving it requires 
redesigning the whole of spark-ec2 so that nodes can be provisioned in 
parallel. That means changing the Bash scripts in the mesos/spark-ec2 repo to 
act on one node at a time, and changing the main spark-ec2 script itself to be 
multi-threaded (or otherwise asynchronous) so it can manage several nodes in 
parallel.

It's probably a major effort, but you can definitely take it on if you are 
interested.
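
To illustrate, the orchestration side could look roughly like the sketch 
below. None of this exists in spark-ec2 today; the per-node script path, the 
login user, and the helper names are placeholders, and a real patch would also 
need retries and better error reporting.

{code}
# Rough sketch only -- nothing here is spark-ec2 code. The launcher SSHes
# into every node and runs a per-node provisioning script, so the work
# happens in parallel instead of being funneled through the master.

import subprocess
from multiprocessing.dummy import Pool  # thread pool from the stdlib


def provision_over_ssh(host, identity_file):
    """Run the (hypothetical) per-node provisioning script on one host."""
    return subprocess.call([
        "ssh", "-i", identity_file,
        "-o", "StrictHostKeyChecking=no",
        "root@" + host,
        "/root/spark-ec2/provision-spark-node.sh",
    ])


def provision_cluster(hosts, identity_file, parallelism=20):
    """Provision all nodes concurrently, collecting any failures."""
    pool = Pool(parallelism)
    try:
        results = pool.map(lambda h: provision_over_ssh(h, identity_file), hosts)
    finally:
        pool.close()
        pool.join()
    failed = [host for host, rc in zip(hosts, results) if rc != 0]
    if failed:
        raise RuntimeError("Provisioning failed on: " + ", ".join(failed))


# e.g. provision_cluster([master_host] + slave_hosts, "/path/to/file.pem")
{code}

With something along these lines, launch time should stay roughly flat as 
slaves are added, instead of growing by about 35 seconds per slave as in the 
measurements above.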







[jira] [Commented] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

2015-04-27 Thread pengyunli (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516427#comment-14516427
 ] 

pengyunli commented on SPARK-5189:
--

I would like to work on this issue; please assign it to me.







[jira] [Commented] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

2015-03-12 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359665#comment-14359665
 ] 

Nicholas Chammas commented on SPARK-5189:
-

For the record, this is the script I used to get the launch time stats above:

{code}
{
# --setup first destroys any cluster left over from the previous run and
# waits a minute; the timed statement then launches a fresh 99-slave cluster.
python -m timeit -r 6 -n 1 \
--setup 'import subprocess; import time; subprocess.call("yes y | ./ec2/spark-ec2 destroy launch-test --identity-file /path/to/file.pem --key-pair my-pair --region us-east-1", shell=True); time.sleep(60)' \
'subprocess.call("./ec2/spark-ec2 launch launch-test --slaves 99 --identity-file /path/to/file.pem --key-pair my-pair --region us-east-1 --zone us-east-1c --instance-type m3.large", shell=True)'

# Tear down the last cluster once the timing runs are done.
yes y | ./ec2/spark-ec2 destroy launch-test --identity-file /path/to/file.pem --key-pair my-pair --region us-east-1
}
{code}







[jira] [Commented] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

2015-01-10 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272660#comment-14272660
 ] 

Nicholas Chammas commented on SPARK-5189:
-

cc [~joshrosen] and [~shivaram] - What do y'all think?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org