AFAIK the resolver does pick up things from your local ~/.m2. Note, though, that since your ~/.m2 is on NFS, that adds to the amount of filesystem traffic.
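
If ~/.ivy2 is on NFS as well, it might also be worth pointing sbt's ivy cache at local disk and, once everything has been fetched once, skipping remote resolution entirely. A rough, untested sketch -- the /local/scratch path is just a placeholder, and this assumes the sbt/sbt wrapper passes JAVA_OPTS through to the JVM it launches (the memory tip quoted below relies on the same thing):

    # keep sbt's ivy cache off NFS (placeholder path)
    export JAVA_OPTS="-Dsbt.ivy.home=/local/scratch/ivy2"
    sbt/sbt assembly

    # once dependencies are cached locally, skip remote resolution
    # (the 'offline' setting is available in recent sbt 0.13.x releases)
    sbt/sbt "set offline := true" assembly

You could also try adding Resolver.mavenLocal to the resolvers in the build definition so that artifacts already sitting in ~/.m2 are picked up first, though I haven't checked whether SparkBuild.scala already does that.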
Shivaram

On Fri, Apr 25, 2014 at 2:57 PM, Williams, Ken <ken.willi...@windlogics.com> wrote:

> I am indeed, but it's a pretty fast NFS. I don't have any SSD I can use,
> but I could try to use local disk to see what happens.
>
> For me, a large portion of the time seems to be spent on lines like
> "Resolving org.fusesource.jansi#jansi;1.4 ..." or similar. Is this going
> out to find Maven resources? Any way to tell it to just use my local ~/.m2
> repository instead when the resource already exists there? Sometimes I even
> get sporadic errors like this:
>
> [info] Resolving org.apache.hadoop#hadoop-yarn;2.2.0 ...
> [error] SERVER ERROR: Bad Gateway url=
> http://repo.maven.apache.org/maven2/org/apache/hadoop/hadoop-yarn-server/2.2.0/hadoop-yarn-server-2.2.0.jar
>
> -Ken
>
> *From:* Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
> *Sent:* Friday, April 25, 2014 4:31 PM
> *To:* user@spark.apache.org
> *Subject:* Re: Build times for Spark
>
> Are you by any chance building this on NFS? As far as I know the build is
> severely bottlenecked by filesystem calls during assembly (each class file
> in each dependency gets an fstat call or something like that). That is
> partly why building from, say, a local ext4 filesystem or an SSD is much
> faster irrespective of memory / CPU.
>
> Thanks
> Shivaram
>
> On Fri, Apr 25, 2014 at 2:09 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
> You can always increase the sbt memory by setting
>
> export JAVA_OPTS="-Xmx10g"
>
> Thanks
> Best Regards
>
> On Sat, Apr 26, 2014 at 2:17 AM, Williams, Ken <ken.willi...@windlogics.com> wrote:
>
> No, I haven't done any config for SBT. Is there somewhere you might be able
> to point me toward for how to do that?
>
> -Ken
>
> *From:* Josh Rosen [mailto:rosenvi...@gmail.com]
> *Sent:* Friday, April 25, 2014 3:27 PM
> *To:* user@spark.apache.org
> *Subject:* Re: Build times for Spark
>
> Did you configure SBT to use the extra memory?
>
> On Fri, Apr 25, 2014 at 12:53 PM, Williams, Ken <ken.willi...@windlogics.com> wrote:
>
> I've cloned the github repo and I'm building Spark on a pretty beefy
> machine (24 CPUs, 78 GB of RAM) and it takes a pretty long time.
>
> For instance, today I did a 'git pull' for the first time in a week or two,
> and then doing 'sbt/sbt assembly' took 43 minutes of wallclock time (88
> minutes of CPU time). After that, I did 'SPARK_HADOOP_VERSION=2.2.0
> SPARK_YARN=true sbt/sbt assembly' and that took 25 minutes wallclock, 73
> minutes CPU.
>
> Is that typical? Or does that indicate some setup problem in my
> environment?
>
> --
> Ken Williams, Senior Research Scientist
> WindLogics
> http://windlogics.com
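
P.S. To expand a little on the JAVA_OPTS suggestion quoted above: something along these lines should work, again assuming the sbt/sbt wrapper hands JAVA_OPTS to the JVM it launches. The heap and permgen sizes are only illustrative; tune them to your machine.

    # give the sbt JVM more heap (and, on Java 7, more permgen) before building
    export JAVA_OPTS="-Xmx4g -XX:MaxPermSize=512m"
    SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly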