quick update:

things are looking slightly...  better.  the number of failing builds
due to GC overhead has decreased slightly since the reboots last
week...  in fact, in the last three days the only builds to be
affected are spark-master-test-maven-hadoop-2.7 (three failures) and
spark-master-test-maven-hadoop-2.6 (five failures).

overall percentages (over two weeks) have also dropped from ~9% to
~7%, so at least the rate of failure is dropping.

so, while we're still bleeding, it's slowed down a bit.  we'll
still need to audit the java heap size allocs in the various tests,
however.
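one way to start that audit (purely a sketch -- the file globs and repo layout here are assumptions, not pulled from the jenkins config) is to grep the build definitions for explicit heap flags:

```shell
# sketch: find explicit -Xmx heap settings in a spark checkout
# (the --include globs are assumptions; adjust for the actual repo layout)
grep -rn --include='pom.xml' --include='*.sbt' --include='*.scala' -- '-Xmx' . \
  | sort
```

anything that shows up there with a small max heap would be a candidate for the GC overhead failures.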

shane

On Fri, Jan 6, 2017 at 1:06 PM, shane knapp <skn...@berkeley.edu> wrote:
> (adding michael armbrust and josh rosen for visibility)
>
> ok.  roughly 9% of all spark test builds (including both PRB builds)
> are failing due to GC overhead limits.
>
> $ wc -l SPARK_TEST_BUILDS GC_FAIL
>  1350 SPARK_TEST_BUILDS
>   125 GC_FAIL
>
> here are the affected builds (over the past ~2 weeks):
> $ sort builds.raw | uniq -c
>       6 NewSparkPullRequestBuilder
>       1 spark-branch-2.0-test-sbt-hadoop-2.6
>       6 spark-branch-2.1-test-maven-hadoop-2.7
>       1 spark-master-test-maven-hadoop-2.4
>      10 spark-master-test-maven-hadoop-2.6
>      12 spark-master-test-maven-hadoop-2.7
>       5 spark-master-test-sbt-hadoop-2.2
>      15 spark-master-test-sbt-hadoop-2.3
>      11 spark-master-test-sbt-hadoop-2.4
>      16 spark-master-test-sbt-hadoop-2.6
>      22 spark-master-test-sbt-hadoop-2.7
>      20 SparkPullRequestBuilder
>
> please note i also included the spark 1.6 test builds in there just to
> check...  they last ran ~1 month ago, and had no GC overhead failures.
> this leads me to believe that this behavior is quite recent.
>
> so yeah...  looks like we (someone other than me?) need to take a
> look at the sbt and maven java opts.  :)
>
> shane
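for reference, bumping the sbt/maven java opts usually means something like the following (the exact values are illustrative assumptions, not what the jenkins workers actually run with):

```shell
# illustrative only: these heap / code-cache values are assumptions,
# not the actual jenkins worker config
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
export SBT_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
echo "$MAVEN_OPTS"
```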
