FYI: I've created SPARK-3821: Develop an automated way of creating Spark images (AMI, Docker, and others) <https://issues.apache.org/jira/browse/SPARK-3821>
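
To give a rough idea of the shape this could take, here is a minimal sketch of one template driving both an AMI build and a Docker build. It is an illustration only, not something taken from the ticket. It assumes Packer's JSON template layout with the `amazon-ebs` and `docker` builders and a `shell` provisioner, as I understand them. The AMI id, region, instance type, Docker base image, and Spark version are placeholders, and pointing the provisioner at the `create_image.sh` script mentioned further down the thread is my own assumption.

```python
import json


def build_packer_template(base_ami, region, docker_base, spark_version):
    """Return a Packer-style template (as a dict) with two builders.

    All argument values are supplied by the caller; nothing here talks to
    AWS or Docker. The dict is only meant to be dumped to JSON.
    """
    return {
        "variables": {"spark_version": spark_version},
        "builders": [
            {
                # Builds an EBS-backed AMI from an existing base AMI.
                "type": "amazon-ebs",
                "region": region,
                "source_ami": base_ami,
                "instance_type": "m3.medium",  # placeholder
                "ssh_username": "ec2-user",
                "ami_name": "spark-{{user `spark_version`}}-{{timestamp}}",
            },
            {
                # Builds a Docker image from the same provisioning steps.
                "type": "docker",
                "image": docker_base,
                "commit": True,
            },
        ],
        "provisioners": [
            # Assumption: one shell script installs everything on both targets.
            {"type": "shell", "script": "create_image.sh"},
        ],
    }


template = build_packer_template(
    base_ami="ami-00000000",   # hypothetical AMI id
    region="us-east-1",        # placeholder
    docker_base="centos:6",    # placeholder base image
    spark_version="1.1.0",     # placeholder
)
print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and handed to `packer build`. Whether the same shell script would really run unchanged on both targets is an open question.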
On Mon, Oct 6, 2014 at 4:48 PM, Daniil Osipov <daniil.osi...@shazam.com> wrote:
> I've also been looking at this. Basically, the Spark EC2 script is
> excellent for small development clusters of several nodes, but isn't
> suitable for production. It handles instance setup in a single-threaded
> manner, while it can easily be parallelized. It also doesn't handle failure
> well, e.g. when an instance fails to start or is taking too long to respond.
>
> Our desire was to have an equivalent to the Amazon EMR[1] API that would
> trigger Spark jobs, including specified cluster setup. I've done some work
> towards that end, and it would benefit from an updated AMI greatly.
>
> Dan
>
> [1]
> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-cli-commands.html
>
> On Sat, Oct 4, 2014 at 7:28 AM, Nicholas Chammas
> <nicholas.cham...@gmail.com> wrote:
>
>> Thanks for posting that script, Patrick. It looks like a good place to
>> start.
>>
>> Regarding Docker vs. Packer, as I understand it you can use Packer to
>> create Docker containers at the same time as AMIs and other image types.
>>
>> Nick
>>
>>
>> On Sat, Oct 4, 2014 at 2:49 AM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>
>> > Hey All,
>> >
>> > Just a couple notes. I recently posted a shell script for creating the
>> > AMIs from a clean Amazon Linux AMI.
>> >
>> > https://github.com/mesos/spark-ec2/blob/v3/create_image.sh
>> >
>> > I think I will update the AMIs soon to get the most recent security
>> > updates. For spark-ec2's purpose this is probably sufficient (we'll
>> > only need to re-create them every few months).
>> >
>> > However, it would be cool if someone wanted to tackle providing a more
>> > general mechanism for defining Spark-friendly "images" that can be
>> > used more generally. I had thought that docker might be a good way to
>> > go for something like this - but maybe this packer thing is good too.
>> >
>> > For one thing, if we had a standard image we could use it to create
>> > containers for running Spark's unit tests, which would be really cool.
>> > This would help a lot with random issues around port and filesystem
>> > contention we have for unit tests.
>> >
>> > I'm not sure if the long term place for this would be inside the spark
>> > codebase or a community library or what. But it would definitely be
>> > very valuable to have if someone wanted to take it on.
>> >
>> > - Patrick
>> >
>> > On Fri, Oct 3, 2014 at 5:20 PM, Nicholas Chammas
>> > <nicholas.cham...@gmail.com> wrote:
>> > > FYI: There is an existing issue -- SPARK-3314
>> > > <https://issues.apache.org/jira/browse/SPARK-3314> -- about scripting
>> > > the creation of Spark AMIs.
>> > >
>> > > With Packer, it looks like we may be able to script the creation of
>> > > multiple image types (VMware, GCE, AMI, Docker, etc.) at once from a
>> > > single Packer template. That's very cool.
>> > >
>> > > I'll be looking into this.
>> > >
>> > > Nick
>> > >
>> > >
>> > > On Thu, Oct 2, 2014 at 8:23 PM, Nicholas Chammas
>> > > <nicholas.cham...@gmail.com> wrote:
>> > >
>> > >> Thanks for the update, Nate. I'm looking forward to seeing how these
>> > >> projects turn out.
>> > >>
>> > >> David, Packer looks very, very interesting. I'm gonna look into it
>> > >> more next week.
>> > >>
>> > >> Nick
>> > >>
>> > >>
>> > >> On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico <n...@reactor8.com>
>> > >> wrote:
>> > >>
>> > >>> Bit of progress on our end, bit of lagging as well. Our guy leading
>> > >>> effort got little bogged down on client project to update hive/sql
>> > >>> testbed to latest spark/sparkSQL, also launching public service so we
>> > >>> have been bit scattered recently.
>> > >>>
>> > >>> Will have some more updates probably after next week.
>> > >>> We are planning on taking our client work around hive/spark, plus
>> > >>> taking over the bigtop automation work to modernize and get that fit
>> > >>> for human consumption outside our org. All our work and puppet modules
>> > >>> will be open sourced, documented, hopefully start to rally some other
>> > >>> folks around effort that find it useful.
>> > >>>
>> > >>> Side note, another effort we are looking into is gradle tests/support.
>> > >>> We have been leveraging serverspec for some basic infrastructure
>> > >>> tests, but with bigtop switching over to gradle builds/testing setup
>> > >>> in 0.8 we want to include support for that in our own efforts,
>> > >>> probably some stuff that can be learned and leveraged in spark world
>> > >>> for repeatable/tested infrastructure.
>> > >>>
>> > >>> If anyone has any specific automation questions to your environment
>> > >>> you can drop me a line directly, will try to help out best I can.
>> > >>> Else will post update to dev list once we get on top of our own
>> > >>> product release and the bigtop work.
>> > >>>
>> > >>> Nate
>> > >>>
>> > >>>
>> > >>> -----Original Message-----
>> > >>> From: David Rowe [mailto:davidr...@gmail.com]
>> > >>> Sent: Thursday, October 02, 2014 4:44 PM
>> > >>> To: Nicholas Chammas
>> > >>> Cc: dev; Shivaram Venkataraman
>> > >>> Subject: Re: EC2 clusters ready in launch time + 30 seconds
>> > >>>
>> > >>> I think this is exactly what packer is for. See e.g.
>> > >>> http://www.packer.io/intro/getting-started/build-image.html
>> > >>>
>> > >>> On a related note, the current AMI for hvm systems (e.g. m3.*, r3.*)
>> > >>> has a bad package for httpd, which causes ganglia not to start. For
>> > >>> some reason I can't get access to the raw AMI to fix it.
>> > >>>
>> > >>> On Fri, Oct 3, 2014 at 9:30 AM, Nicholas Chammas
>> > >>> <nicholas.cham...@gmail.com> wrote:
>> > >>>
>> > >>> > Is there perhaps a way to define an AMI programmatically? Like, a
>> > >>> > collection of base AMI id + list of required stuff to be installed +
>> > >>> > list of required configuration changes. I'm guessing that's what
>> > >>> > people use things like Puppet, Ansible, or maybe also AWS
>> > >>> > CloudFormation for, right?
>> > >>> >
>> > >>> > If we could do something like that, then with every new release of
>> > >>> > Spark we could quickly and easily create new AMIs that have
>> > >>> > everything we need. spark-ec2 would only have to bring up the
>> > >>> > instances and do a minimal amount of configuration, and the only
>> > >>> > thing we'd need to track in the Spark repo is the code that defines
>> > >>> > what goes on the AMI, as well as a list of the AMI ids specific to
>> > >>> > each release.
>> > >>> >
>> > >>> > I'm just thinking out loud here. Does this make sense?
>> > >>> >
>> > >>> > Nate,
>> > >>> >
>> > >>> > Any progress on your end with this work?
>> > >>> >
>> > >>> > Nick
>> > >>> >
>> > >>> >
>> > >>> > On Sun, Jul 13, 2014 at 8:53 PM, Shivaram Venkataraman
>> > >>> > <shiva...@eecs.berkeley.edu> wrote:
>> > >>> >
>> > >>> > > It should be possible to improve cluster launch time if we are
>> > >>> > > careful about what commands we run during setup. One way to do
>> > >>> > > this would be to walk down the list of things we do for cluster
>> > >>> > > initialization and see if there is anything we can do to make
>> > >>> > > things faster. Unfortunately this might be pretty time consuming,
>> > >>> > > but I don't know of a better strategy.
>> > >>> > > The place to start would be the setup.sh file at
>> > >>> > > https://github.com/mesos/spark-ec2/blob/v3/setup.sh
>> > >>> > >
>> > >>> > > Here are some things that take a lot of time and could be improved:
>> > >>> > > 1. Creating swap partitions on all machines. We could check if
>> > >>> > >    there is a way to get EC2 to always mount a swap partition.
>> > >>> > > 2. Copying / syncing things across slaves. The copy-dir script is
>> > >>> > >    called too many times right now and each time it pauses for a
>> > >>> > >    few milliseconds between slaves [1]. This could be improved by
>> > >>> > >    removing unnecessary copies.
>> > >>> > > 3. We could make less frequently used modules like Tachyon,
>> > >>> > >    persistent hdfs not a part of the default setup.
>> > >>> > >
>> > >>> > > [1] https://github.com/mesos/spark-ec2/blob/v3/copy-dir.sh#L42
>> > >>> > >
>> > >>> > > Thanks
>> > >>> > > Shivaram
>> > >>> > >
>> > >>> > >
>> > >>> > > On Sat, Jul 12, 2014 at 7:02 PM, Nicholas Chammas
>> > >>> > > <nicholas.cham...@gmail.com> wrote:
>> > >>> > >
>> > >>> > > > On Thu, Jul 10, 2014 at 8:10 PM, Nate D'Amico
>> > >>> > > > <n...@reactor8.com> wrote:
>> > >>> > > >
>> > >>> > > > > Starting to work through some automation/config stuff for
>> > >>> > > > > spark stack on EC2 with a project, will be focusing the work
>> > >>> > > > > through the apache bigtop effort to start, can then share
>> > >>> > > > > with spark community directly as things progress if people
>> > >>> > > > > are interested
>> > >>> > > >
>> > >>> > > >
>> > >>> > > > Let us know how that goes. I'm definitely interested in hearing
>> > >>> > > > more.
>> > >>> > > >
>> > >>> > > > Nick