make-distribution and the second code snippet both build a distribution
from a clean state, so every source file has to be compiled, and that
takes time. You may be able to gain some speed by tweaking some settings
or using a newer compiler.

I'm inferring from your question that deployment speed is critical for
your use case, and that you'd like to build Spark systematically for many
(every?) commits. In that case I would suggest trying the second code
snippet without the `clean` task, and only resorting to `clean` if the
build fails.

On my local machine, running assembly without clean cuts the build time
from 6 minutes to 2.
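Roughly, the "incremental first, clean only on failure" approach could be
scripted like this. It's a minimal sketch, assuming you run it from the root
of a Spark checkout where the sbt/sbt launcher used by spark-ec2 exists; the
function name build_with_fallback and the BUILD override variable are
hypothetical, not part of Spark's tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try an incremental sbt build first, and fall back to
# a full clean build only if the incremental one fails.
# BUILD (hypothetical) lets you override the launcher; defaults to sbt/sbt.
build_with_fallback() {
  local build="${BUILD:-sbt/sbt}"

  # Incremental build: reuses previously compiled classes where possible.
  if "$build" assembly publish-local; then
    return 0
  fi

  # Incremental build failed (e.g. stale state): retry from a clean state.
  echo "Incremental build failed; retrying with clean" >&2
  "$build" clean assembly publish-local
}
```

Source the file and call build_with_fallback after each checkout. Any
failure triggers the clean retry, which is a blunt heuristic, but it keeps
the common case fast while still producing a build when incremental
compilation gets confused.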


On 23 November 2015 at 20:18, Nicholas Chammas <>

> Say I want to build a complete Spark distribution against Hadoop 2.6+ as
> fast as possible from scratch.
> This is what I’m doing at the moment:
> ./ -T 1C -Phadoop-2.6
> -T 1C instructs Maven to spin up 1 thread per available core. This takes
> around 20 minutes on an m3.large instance.
> I see that spark-ec2, on the other hand, builds Spark as follows
> <>
> when you deploy Spark at a specific git commit:
> sbt/sbt clean assembly
> sbt/sbt publish-local
> This seems slower than using, actually.
> Is there a faster way to do this?
> Nick