[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276411#comment-14276411 ]
Nicholas Chammas commented on SPARK-3821:
-----------------------------------------

Hi [~florianverhein] and thanks for chiming in!

{quote}
Re the above, I think everything in create_image.sh can be refactored to packer (+ duplicate removal - e.g. root login).
{quote}

Definitely. I'm hoping to make as few changes as possible to the existing {{create_image.sh}} script to reduce the review burden, but once this initial proposal is accepted it makes sense to refactor these scripts. There is some related work proposed in [SPARK-5189].

Some of the things you call out regarding version mismatches and whatnot sound like they might merit their own JIRA issues. For example:

{quote}
It looks like Spark needs to be built with the right hadoop profile to work, but this isn't adhered to.
{quote}

I haven't tested this out, but from the Spark init script, it looks like the correct version of Spark is used in [the pre-built scenario|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/init.sh#L109]. Not so in the [build-from-git scenario|https://github.com/mesos/spark-ec2/blob/3a95101c70e6892a8a48cc54094adaed1458487a/spark/init.sh#L21], so nice catch. Could you file a JIRA issue for that?

{quote}
For example, I see no reason why the module init.sh scripts can't be run from packer in order to speed start-up times of the cluster
{quote}

As for this and other ideas for pre-baking more into the images, [that's how this proposal started, actually|https://github.com/nchammas/spark-ec2/blob/9c28878694171ba085a10acd4405c702397d28ce/packer/README.md#base-vs-spark-pre-installed] (here's the [original Packer template|https://github.com/nchammas/spark-ec2/blob/9c28878694171ba085a10acd4405c702397d28ce/packer/spark-packer.json#L118-L133]). We decided to rip that out to reduce the complexity of the initial proposal and make it easier to specify different versions of Spark and Hadoop at launch time.

> Develop an automated way of creating Spark images (AMI, Docker, and others)
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-3821
>                 URL: https://issues.apache.org/jira/browse/SPARK-3821
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, EC2
>            Reporter: Nicholas Chammas
>            Assignee: Nicholas Chammas
>         Attachments: packer-proposal.html
>
>
> Right now the creation of Spark AMIs or Docker containers is done manually.
> With tools like [Packer|http://www.packer.io/], we should be able to automate
> this work, and do so in such a way that multiple types of machine images can
> be created from a single template.