I will echo Steve L's comment about having zinc running (with --nailed). That provides at least a 2x speedup; sometimes without it, Spark simply does not build for me.
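A minimal sketch of what that might look like, assuming the standalone `zinc` launcher is installed and on the PATH (exact flag names and the default port vary by zinc version, so treat this as illustrative rather than definitive):

```shell
# Sketch: keep a warm Scala compiler daemon running before building.
# --nailed / -nailed runs zinc as a Nailgun-backed server; flag spelling
# differs between zinc releases, so check `zinc -help` for your version.
zinc -start -nailed

# Spark's Maven build (via scala-maven-plugin) can then hand Scala
# compilation to the running zinc server instead of forking a fresh
# compiler JVM for every module:
./make-distribution.sh -T 1C -Phadoop-2.6

# Stop the daemon when you're done.
zinc -shutdown
```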
2015-12-08 9:33 GMT-08:00 Josh Rosen <joshro...@databricks.com>:

> @Nick, on a fresh EC2 instance a significant chunk of the initial build
> time might be due to artifact resolution + downloading. Putting
> pre-populated Ivy and Maven caches onto your EC2 machine could shave a
> decent chunk of time off that first build.
>
> On Tue, Dec 8, 2015 at 9:16 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
>> Thanks for the tips, Jakob and Steve.
>>
>> It looks like my original approach is the best for me, since I'm
>> installing Spark on newly launched EC2 instances and can't take advantage
>> of incremental compilation.
>>
>> Nick
>>
>> On Tue, Dec 8, 2015 at 7:01 AM Steve Loughran <ste...@hortonworks.com> wrote:
>>
>>> On 7 Dec 2015, at 19:07, Jakob Odersky <joder...@gmail.com> wrote:
>>>
>>> make-distribution and the second code snippet both create a distribution
>>> from a clean state. They therefore require that every source file be
>>> compiled, and that takes time (you can maybe tweak some settings or use a
>>> newer compiler to gain some speed).
>>>
>>> I'm inferring from your question that for your use case deployment speed
>>> is a critical issue, and furthermore that you'd like to build Spark for lots of
>>> (every?) commit in a systematic way. In that case I would suggest you try
>>> using the second code snippet without the `clean` task, and only resort to
>>> it if the build fails.
>>>
>>> On my local machine, an assembly without a clean drops from 6 minutes to 2.
>>>
>>> regards,
>>> --Jakob
>>>
>>> 1. You can use zinc (where possible) to speed up Scala compilations.
>>> 2. You might also consider setting up a local Jenkins VM, hooked to
>>> whatever git repo & branch you are working off, and have it do the builds
>>> and tests for you. Not so great for interactive dev, though.
>>>
>>> Finally, on the Mac, the "say" command is pretty handy at letting you
>>> know when some work in a terminal is done, so you can do the
>>> first-thing-in-the-morning build of the SNAPSHOTs:
>>>
>>>     mvn install -DskipTests -Pyarn,hadoop-2.6 -Dhadoop.version=2.7.1; say moo
>>>
>>> After that you can work on the modules you care about (via the `-pl`
>>> option). That doesn't work if you are running on an EC2 instance, though.
>>>
>>> On 23 November 2015 at 20:18, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>
>>>> Say I want to build a complete Spark distribution against Hadoop 2.6+
>>>> as fast as possible from scratch.
>>>>
>>>> This is what I'm doing at the moment:
>>>>
>>>>     ./make-distribution.sh -T 1C -Phadoop-2.6
>>>>
>>>> -T 1C instructs Maven to spin up 1 thread per available core. This
>>>> takes around 20 minutes on an m3.large instance.
>>>>
>>>> I see that spark-ec2, on the other hand, builds Spark as follows
>>>> <https://github.com/amplab/spark-ec2/blob/a990752575cd8b0ab25731d7820a55c714798ec3/spark/init.sh#L21-L22>
>>>> when you deploy Spark at a specific git commit:
>>>>
>>>>     sbt/sbt clean assembly
>>>>     sbt/sbt publish-local
>>>>
>>>> This seems slower than using make-distribution.sh, actually.
>>>>
>>>> Is there a faster way to do this?
>>>>
>>>> Nick
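For Nick's fresh-EC2-instance case, Josh's cache-seeding suggestion might be sketched as follows. The bucket name, tarball name, and the idea of staging through S3 are all hypothetical; the point is just to land warm `~/.m2` and `~/.ivy2` directories on the box before the first build:

```shell
# Hypothetical sketch: seed dependency caches on a freshly launched EC2
# instance so the first build skips most artifact resolution/downloading.
# Assumes a tarball of warm caches was uploaded to S3 from a prior build:
#   tar -czf spark-build-caches.tar.gz -C "$HOME" .m2 .ivy2
aws s3 cp s3://my-example-bucket/spark-build-caches.tar.gz /tmp/
tar -xzf /tmp/spark-build-caches.tar.gz -C "$HOME"   # unpacks .m2/ and .ivy2/

# The first build now resolves most artifacts from the local caches:
./make-distribution.sh -T 1C -Phadoop-2.6
```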
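For local, long-lived checkouts, Jakob's and Steve's advice (pay the full build cost once, then skip `clean` and rebuild only the modules you touch) might combine like this. The `-pl core` module selector is an illustrative example, not a prescribed target:

```shell
# Morning: one full SNAPSHOT install (Steve's command, with the macOS
# "say" bell so you know when the terminal is free):
mvn install -DskipTests -Pyarn,hadoop-2.6 -Dhadoop.version=2.7.1; say moo

# Afterwards, rebuild only the module you are editing, e.g. core
# (-pl limits the Maven reactor to the listed module directories):
mvn install -DskipTests -pl core

# Or, with sbt, run an incremental assembly without `clean`; per Jakob,
# this dropped his assembly time from ~6 minutes to ~2:
sbt/sbt assembly
```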