Both make-distribution.sh and the second code snippet create a distribution from a clean state. They therefore require that every source file be recompiled, and that takes time (you may be able to gain some speed by tweaking settings or using a newer compiler).
I'm inferring from your question that deployment speed is a critical issue for your use case, and that you'd like to build Spark for many (every?) commit in a systematic way. In that case I would suggest you try the second code snippet without the `clean` task, and only resort to it if the build fails. On my local machine, an assembly without a clean drops from 6 minutes to 2.

regards,
--Jakob

On 23 November 2015 at 20:18, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> Say I want to build a complete Spark distribution against Hadoop 2.6+ as
> fast as possible from scratch.
>
> This is what I’m doing at the moment:
>
>     ./make-distribution.sh -T 1C -Phadoop-2.6
>
> -T 1C instructs Maven to spin up 1 thread per available core. This takes
> around 20 minutes on an m3.large instance.
>
> I see that spark-ec2, on the other hand, builds Spark as follows
> <https://github.com/amplab/spark-ec2/blob/a990752575cd8b0ab25731d7820a55c714798ec3/spark/init.sh#L21-L22>
> when you deploy Spark at a specific git commit:
>
>     sbt/sbt clean assembly
>     sbt/sbt publish-local
>
> This seems slower than using make-distribution.sh, actually.
>
> Is there a faster way to do this?
>
> Nick
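[Editor's note: the "incremental first, clean only on failure" loop suggested above can be sketched as a one-liner. The `-Phadoop-2.6` profile flag and `sbt/sbt` launcher path are taken from the thread; it is shown here as a dry run (via `echo`) so the control flow is visible without a Spark checkout. Drop the `echo` to actually run the builds.]

```shell
# Try an incremental assembly first; fall back to a clean rebuild only if it fails.
# SBT is set to a dry-run echo here -- replace with SBT="sbt/sbt" in a Spark checkout.
SBT="echo sbt/sbt"
$SBT -Phadoop-2.6 assembly || $SBT -Phadoop-2.6 clean assembly
# prints: sbt/sbt -Phadoop-2.6 assembly
```

Because `||` short-circuits, the expensive `clean assembly` pass only runs when the incremental build exits non-zero.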