[ 
https://issues.apache.org/jira/browse/SPARK-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303080#comment-14303080
 ] 

Sean Owen commented on SPARK-5552:
----------------------------------

It sounds fine, but it is not something that lives within Spark, since it just 
includes Spark among other things. You should host it as an AMI. Is there a 
change to Spark proposed here?

> Automated data science AMI creation and data science cluster deployment on EC2
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-5552
>                 URL: https://issues.apache.org/jira/browse/SPARK-5552
>             Project: Spark
>          Issue Type: New Feature
>          Components: EC2
>            Reporter: Florian Verhein
>
> Issue created RE: 
> https://github.com/mesos/spark-ec2/pull/90#issuecomment-72597154 (please read 
> for background)
> Goal:
> Extend spark-ec2 scripts to create an automated data science cluster 
> deployment on EC2, suitable for almost(?)-production use.
> Use cases: 
> - A user can build their own custom data science AMIs from a CentOS minimal 
> image by running a Packer configuration (good defaults should be provided, 
> with some options for flexibility).
> - A user can then easily deploy a new (correctly configured) cluster using 
> these AMIs, and do so as quickly as possible.
> Components/modules: Spark + Tachyon + HDFS (on instance storage) + Python + 
> R + Vowpal Wabbit + any RPMs + ... + Ganglia
> The focus is on reliability and speed of deployment (rather than, e.g., 
> supporting many versions or dev testing).
> Use Hadoop 2 to keep the option of moving to YARN later.
> My current solution is here: 
> https://github.com/florianverhein/spark-ec2/tree/packer. It includes other 
> fixes/improvements as needed to get it working.
> Now that it seems to work (but has deviated a lot more from the existing code 
> base than I was expecting), I'm wondering what to do with it...
> Keen to hear ideas if anyone is interested. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
