quick update: things are looking slightly... better. the number of failing builds due to GC overhead has decreased slightly since the reboots last week... in fact, in the last three days the only builds to be affected are spark-master-test-maven-hadoop-2.7 (three failures) and spark-master-test-maven-hadoop-2.6 (five failures).
overall percentages (over two weeks) have also dropped from ~9% to ~7%, so at least the rate of failure is dropping. so while we're still bleeding, it's slowed down a bit. we'll still need to audit the java heap size allocs in the various tests, however.

shane

On Fri, Jan 6, 2017 at 1:06 PM, shane knapp <skn...@berkeley.edu> wrote:
> (adding michael armbrust and josh rosen for visibility)
>
> ok. roughly 9% of all spark test builds (including both PRB builds)
> are failing due to GC overhead limits.
>
> $ wc -l SPARK_TEST_BUILDS GC_FAIL
>   1350 SPARK_TEST_BUILDS
>    125 GC_FAIL
>
> here are the affected builds (over the past ~2 weeks):
> $ sort builds.raw | uniq -c
>       6 NewSparkPullRequestBuilder
>       1 spark-branch-2.0-test-sbt-hadoop-2.6
>       6 spark-branch-2.1-test-maven-hadoop-2.7
>       1 spark-master-test-maven-hadoop-2.4
>      10 spark-master-test-maven-hadoop-2.6
>      12 spark-master-test-maven-hadoop-2.7
>       5 spark-master-test-sbt-hadoop-2.2
>      15 spark-master-test-sbt-hadoop-2.3
>      11 spark-master-test-sbt-hadoop-2.4
>      16 spark-master-test-sbt-hadoop-2.6
>      22 spark-master-test-sbt-hadoop-2.7
>      20 SparkPullRequestBuilder
>
> please note i also included the spark 1.6 test builds in there just to
> check... they last ran ~1 month ago, and had no GC overhead failures.
> this leads me to believe that this behavior is quite recent.
>
> so yeah... looks like we (someone other than me?) need to take a
> look at the sbt and maven java opts. :)
>
> shane
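for anyone wanting to reproduce the tally above, here's a minimal sketch of the wc/sort/uniq workflow from the quoted mail. the `builds.raw` contents below are a made-up three-line sample (the real file would be the list of failed-build names grepped out of the jenkins console logs for "GC overhead limit exceeded"); the 1350 total comes from the `wc -l SPARK_TEST_BUILDS` output quoted above.

```shell
# sample stand-in for builds.raw: one failed-build name per line.
# (hypothetical data -- the real list comes from the jenkins logs.)
cat > builds.raw <<'EOF'
spark-master-test-maven-hadoop-2.7
spark-master-test-maven-hadoop-2.6
spark-master-test-maven-hadoop-2.7
EOF

# per-build failure counts, same as in the quoted mail.
sort builds.raw | uniq -c

# overall failure rate: GC failures as a percentage of all test builds.
total=1350                      # from `wc -l SPARK_TEST_BUILDS`
failed=$(wc -l < builds.raw)    # 125 in the real data; 3 in this sample
awk -v f="$failed" -v t="$total" 'BEGIN { printf "%.1f%%\n", 100 * f / t }'
```

with the real numbers (125 failures out of 1350 builds) the last line prints the ~9% figure from the quoted mail.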