[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276411#comment-14276411 ]

Nicholas Chammas commented on SPARK-3821:
-----------------------------------------

Hi [~florianverhein] and thanks for chiming in!

{quote}
Re the above, I think everything in create_image.sh can be refactored to packer 
(+ duplicate removal - e.g. root login).
{quote}

Definitely. I'm hoping to make as few changes as possible to the existing 
{{create_image.sh}} script to reduce the review burden, but once this initial 
proposal is accepted, it makes sense to refactor these scripts. There is some 
related work proposed in [SPARK-5189].

Some of the things you call out regarding version mismatches and whatnot sound 
like they might merit their own JIRA issues.

For example:

{quote}
It looks like Spark needs to be built with the right hadoop profile to work, 
but this isn't adhered to. 
{quote}

I haven't tested this out, but from the Spark init script, it looks like the 
correct version of Spark is used in [the pre-built 
scenario|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/init.sh#L109].
 Not so in the [build-from-git 
scenario|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/init.sh#L21],
 so nice catch. Could you file a JIRA issue for that?

{quote}
For example, I see no reason why the module init.sh scripts can't be run from 
packer in order to speed start-up times of the cluster
{quote}

Regarding this and other ideas about pre-baking more onto the images, [that's 
how this proposal started, 
actually|https://github.com/nchammas/spark-ec2/blob/9c28878694171ba085a10acd4405c702397d28ce/packer/README.md#base-vs-spark-pre-installed]
 (here's the [original Packer 
template|https://github.com/nchammas/spark-ec2/blob/9c28878694171ba085a10acd4405c702397d28ce/packer/spark-packer.json#L118-L133]).
 We decided to rip that out to reduce the complexity of the initial proposal 
and make it easier to specify different versions of Spark and Hadoop at launch 
time.
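For anyone skimming this thread, the pre-baking idea amounts to running the module init scripts as Packer provisioners so the work happens at image-build time rather than cluster-launch time. A minimal sketch of such a template (placeholder AMI ID and script path; the real template linked above had more to it):

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-xxxxxxxx",
    "instance_type": "m3.large",
    "ssh_username": "ec2-user",
    "ami_name": "spark-prebaked-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "scripts": ["spark/init.sh"]
  }]
}
```

The trade-off, as noted above, is that baking Spark into the AMI means a separate image per Spark/Hadoop version combination, which is exactly the complexity we cut from the initial proposal.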

> Develop an automated way of creating Spark images (AMI, Docker, and others)
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-3821
>                 URL: https://issues.apache.org/jira/browse/SPARK-3821
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, EC2
>            Reporter: Nicholas Chammas
>            Assignee: Nicholas Chammas
>         Attachments: packer-proposal.html
>
>
> Right now the creation of Spark AMIs or Docker containers is done manually. 
> With tools like [Packer|http://www.packer.io/], we should be able to automate 
> this work, and do so in such a way that multiple types of machine images can 
> be created from a single template.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
