[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183276#comment-14183276
 ] 

Dan Osipov commented on SPARK-3821:
-----------------------------------

OK, great!

> Could you elaborate on your use case? The use cases I'm currently targeting 
> are focused on improving spark-ec2 launch times and automating updates to any 
> Spark machine images or containers.

There are a few problems with the spark-ec2 script:
* Large clusters take too long to spin up. This is because each slave is 
processed serially. When the setup is done in parallel, performance is much 
better. 
* It doesn't handle failure well. EC2 nodes may fail to start up but still 
report that they're running. In those cases spark-ec2 freezes, then fails, 
without cleaning up after itself (it leaves behind instances, security groups, 
and EBS volumes).
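
To illustrate both points, here is a minimal Python sketch of a parallel 
launcher that cleans up on failure. It is not code from spark-ec2 or from my 
tool. The names launch_cluster, setup_node and cleanup are hypothetical: 
setup_node stands in for whatever per-node setup is needed and is assumed to 
raise an exception (including on its own timeout) when a node is unhealthy, 
and cleanup stands in for releasing instances, security groups and EBS 
volumes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def launch_cluster(hosts, setup_node, cleanup, max_workers=16):
    """Run setup_node on every host in parallel.

    Hypothetical helper: setup_node(host) is assumed to raise if the node
    fails or times out; cleanup(hosts) is assumed to release all resources.
    Returns the hosts that were set up, in completion order.
    """
    hosts = list(hosts)
    done = []
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Submit all nodes at once instead of processing them serially.
            futures = {pool.submit(setup_node, h): h for h in hosts}
            for fut in as_completed(futures):
                fut.result()  # re-raises any exception from setup_node
                done.append(futures[fut])
    except Exception:
        # Do not leave half-built state behind: release everything, then fail.
        cleanup(hosts)
        raise
    return done
```

With this shape, a single bad node causes the whole launch to fail with a 
clear error after cleanup has run, rather than hanging with resources still 
allocated. Retrying only the failed nodes would be a natural refinement.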

I rewrote the steps in a Scala tool. It's not at feature parity with spark-ec2 
yet, but it makes some improvements in the areas mentioned above. The goal is 
for it to serve the same role as the EMR CLI [1], if you've ever used that, 
including running a job. The problem is that a lot of functionality is still 
bundled in setup.sh, which can be minimized by a) doing most of the work at the 
AMI bundling step and b) performing the remaining work in parallel through the 
launcher. I'd be glad to put the script on GitHub so that you can evaluate the 
approach.

Are you also planning to create AMIs for different combinations of Spark and 
Hadoop versions?

[1] 
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html

> Develop an automated way of creating Spark images (AMI, Docker, and others)
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-3821
>                 URL: https://issues.apache.org/jira/browse/SPARK-3821
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, EC2
>            Reporter: Nicholas Chammas
>            Assignee: Nicholas Chammas
>
> Right now the creation of Spark AMIs or Docker containers is done manually. 
> With tools like [Packer|http://www.packer.io/], we should be able to automate 
> this work, and do so in such a way that multiple types of machine images can 
> be created from a single template.



