make-distribution.sh and the second code snippet both create a distribution
from a clean state. They therefore require that every source file be
compiled, and that takes time (you may be able to gain some speed by
tweaking settings or using a newer compiler).

I'm inferring from your question that deployment speed is a critical issue
for your use case, and that you'd like to build Spark for lots of (every?)
commit in a systematic way. In that case I would suggest you try running
the second code snippet without the `clean` task, and only resort to it if
the build fails.

On my local machine, an assembly without a clean drops from 6 minutes to 2.
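
That workflow could be scripted as, e.g. (a hypothetical wrapper, not part
of Spark; it assumes it is run from the root of a Spark checkout and that
the build command, like `sbt/sbt assembly`, accepts `clean` as a task):

```shell
#!/bin/sh
# try_incremental BUILD_CMD [TASKS...] -- run the build incrementally
# first; if it fails, retry once from a clean state. This encodes the
# "only resort to clean if the build fails" advice above.
try_incremental() {
    if "$@"; then
        return 0
    fi
    echo "incremental build failed; retrying from a clean state" >&2
    "$1" clean && "$@"
}

# Example invocation from a Spark checkout:
#   try_incremental sbt/sbt assembly
```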

regards,
--Jakob



On 23 November 2015 at 20:18, Nicholas Chammas <nicholas.cham...@gmail.com>
wrote:

> Say I want to build a complete Spark distribution against Hadoop 2.6+ as
> fast as possible from scratch.
>
> This is what I’m doing at the moment:
>
> ./make-distribution.sh -T 1C -Phadoop-2.6
>
> -T 1C instructs Maven to spin up 1 thread per available core. This takes
> around 20 minutes on an m3.large instance.
>
> I see that spark-ec2, on the other hand, builds Spark as follows
> <https://github.com/amplab/spark-ec2/blob/a990752575cd8b0ab25731d7820a55c714798ec3/spark/init.sh#L21-L22>
> when you deploy Spark at a specific git commit:
>
> sbt/sbt clean assembly
> sbt/sbt publish-local
>
> This seems slower than using make-distribution.sh, actually.
>
> Is there a faster way to do this?
>
> Nick
>
