Spurious test failures, testing best practices

Ryan Williams Sun, 30 Nov 2014 14:42:07 -0800

In the course of trying to make contributions to Spark, I have had a lot of
trouble running Spark's tests successfully. The main pain points I've
experienced are:


    1) frequent, spurious test failures
    2) high latency of running tests
    3) difficulty running specific tests in an iterative fashion

Here is an example series of failures that I encountered this weekend
(along with footnote links to the console output from each and
approximately how long each took):

- `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
before.
- `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
- `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]: BroadcastSuite
passed, but scala compiler crashed on the "catalyst" project.
- `mvn clean`: repeated attempts to re-run the earlier commands (which
previously didn't crash the compiler) all resulted in the same compiler
crash. Previous discussion on this list implies that only a `mvn clean`
resolves this [4].
- `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
BroadcastSuite can't run because assembly is not built.
- `./dev/run-tests` again [6]: pyspark tests fail, with messages about
version mismatches and python 2.6. The machine this ran on has python 2.7,
so I don't know what that's about.
- `./dev/run-tests` again [7]: "too many open files" errors in several
tests. `ulimit -a` shows a maximum of 4864 open files. Apparently this is
not enough, but only some of the time? I increased it to 8192 and tried
again.
- `./dev/run-tests` again [8]: same pyspark errors as before. This seems to
be the issue from SPARK-3867 [9], which was supposedly fixed on October 14;
not sure how I'm seeing it now. In any case, I switched to Python 2.6 and
installed unittest2, and python/run-tests seems to be unblocked.
- `./dev/run-tests` again [10]: finally passes!
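
The "too many open files" step above suggests a quick pre-flight check
before a long test run. Here is a minimal sketch; the function name
`open_files_ok` and the `MIN_OPEN_FILES` variable are made up for
illustration and are not part of `./dev/run-tests`, and 8192 is only the
value that happened to work in the run above, not a documented requirement:

```shell
# Hypothetical pre-flight check; not part of Spark's dev/run-tests.
# 8192 is the limit that worked in the run above, not an official minimum.
MIN_OPEN_FILES="${MIN_OPEN_FILES:-8192}"

# Succeeds if the limit in $1 ("unlimited" or an integer) is at least $2.
open_files_ok() {
  [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Soft limit on open file descriptors for the current shell.
current_limit="$(ulimit -n)"

if open_files_ok "$current_limit" "$MIN_OPEN_FILES"; then
  echo "open-file limit looks OK: $current_limit"
else
  echo "open-file limit is $current_limit; consider running" \
       "'ulimit -n $MIN_OPEN_FILES' before the tests" >&2
fi
```

The check only warns rather than changing the limit itself, since raising
it may need different steps depending on the OS and shell.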

This was on a spark checkout at ceb6281 (ToT Friday), with a few trivial
changes added on (that I wanted to test before sending out a PR), on a
macbook running OSX Yosemite (10.10.1), java 1.8 and mvn 3.2.3 [11].

Meanwhile, on a linux 2.6.32 / CentOS 6.4 machine, I tried similar commands
from the same repo state:

- `./dev/run-tests` [12]: YarnClusterSuite failure.
- `./dev/run-tests` [13]: same YarnClusterSuite failure. I know I've seen
this one before on this machine and am guessing it actually occurs every
time.
- `./dev/run-tests` [14]: to be sure, I reverted my changes, ran one more
time from ceb6281, and saw the same failure.

This was with java 1.7 and maven 3.2.3 [15]. In one final attempt to narrow
down the linux YarnClusterSuite failure, I ran `./dev/run-tests` on my mac,
from ceb6281, with java 1.7 (instead of 1.8, which the previous runs used),
and it passed [16], so the failure seems specific to my linux machine/arch.

At this point I believe that my changes don't break any tests (the
YarnClusterSuite failure on my linux presumably not being... "real"), and I
am ready to send out a PR. Whew!

However, reflecting on the 5 or 6 distinct failure-modes represented above:

- One of them (too many open files) is something I can (and did,
hopefully) fix once and for all. It cost me about an hour this time (the
approximate duration of one ./dev/run-tests run) and a few hours other times
when I didn't fully understand/fix it. It doesn't happen deterministically
(why?), but does happen somewhat frequently to people, having been discussed
on the user list multiple times [17] and on SO [18]. Maybe some note in the
documentation advising people to check their ulimit makes sense?
- One of them (unittest2 must be installed for python 2.6) was supposedly
fixed upstream of the commits I tested here; I don't know why I'm still
running into it. This cost me a few hours of running `./dev/run-tests`
multiple times to see if it was transient, plus some time researching and
working around it.
- The original BroadcastSuite failure cost me a few hours and went away
before I'd even run `mvn clean`.
- A new incarnation of the sbt-compiler-crash phenomenon cost me a few
hours of running `./dev/run-tests` in different ways before deciding that,
as usual, there was no way around it and that I'd need to run `mvn clean`
and start running tests from scratch.
- The YarnClusterSuite failures on my linux box have cost me hours of
trying to figure out whether they're my fault. I've seen them many times
over the past weeks/months, plus or minus other failures that have come and
gone, and was especially befuddled by them when I was seeing a disjoint set
of reproducible failures on my mac [19] (the triaging of which involved
dozens of runs of `./dev/run-tests`).

While I'm interested in digging into each of these issues, I also want to
discuss the frequency with which I've run into issues like these. This is
unfortunately not the first time in recent months that I've spent days
playing spurious-test-failure whack-a-mole with a 60-90min dev/run-tests
iteration time, which is no fun! So I am wondering/thinking:

- Do other people experience this level of flakiness from spark tests?
- Do other people bother running dev/run-tests locally, or just let Jenkins
do it during the CR process?
- Needing to run a full assembly post-clean just to continue running one
specific test case feels especially wasteful. The failure output when
naively attempting to run a specific test without having built an assembly
jar is not always clear about what the issue is or how to fix it. Even the
fact that certain tests require "building the world" is not something I
would have expected, and it has cost me hours of confusion.
    - Should a person running spark tests assume that they must build an
assembly JAR before running anything?
    - Are there some proper "unit" tests that are actually self-contained /
able to be run without building an assembly jar?
    - Can we better document/demarcate which tests have which dependencies?
    - Is there something finer-grained than building an assembly JAR that
is sufficient in some cases?
        - If so, can we document that?
        - If not, can we move to a world of finer-grained dependencies for
some of these?
- Leaving all of these spurious failures aside, the process of assembling
and testing a new JAR is not a quick one (40 and 60 mins for me typically,
respectively). I would guess that there are dozens (hundreds?) of people
who build a Spark assembly from various ToTs on any given day, and who all
wait on the exact same compilation / assembly steps to occur. Expanding on
the recent work to publish nightly snapshots [20], can we do a better job
caching/sharing compilation artifacts at a more granular level (pre-built
assembly JARs at each SHA? pre-built JARs per-maven-module, per-SHA? more
granular maven modules, plus the previous two?), or otherwise save some of
the considerable amount of redundant compilation work that I had to do over
the course of my odyssey this weekend?
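
To make the "build once, then iterate on one suite" workflow from the
questions above concrete, here is a rough sketch. It assumes Maven is on
the PATH and that it is run from the root of a Spark checkout. The helper
names (`run`, `build_assembly`, `run_suite`) and the `DRY_RUN` switch are
invented for illustration. `-Dsuites=...` is the flag used in the commands
earlier in this mail, and `-DskipTests package` is the usual Maven way to
package without running tests. Whether packaging alone is always enough for
a given suite is exactly the open question, so treat this as a starting
point rather than a recipe:

```shell
# Sketch only: helper names and DRY_RUN are hypothetical.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it unless DRY_RUN is 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Package all modules without running tests, so that an assembly jar
# exists for the suites that need one.
build_assembly() {
  run mvn -DskipTests package
}

# Run only the suites matching a wildcard, e.g. '*BroadcastSuite*'.
run_suite() {
  run mvn "-Dsuites=$1" test
}

build_assembly
run_suite '*BroadcastSuite*'
```

Setting `DRY_RUN=0` in the environment makes the same script actually
invoke Maven; the first step is the slow one and only needs repeating
after a `mvn clean`.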

Ramping up on most projects involves some amount of supplementing the
documentation with trial and error to figure out what to run, which
"errors" are real errors and which can be ignored, etc., but navigating
that minefield on Spark has proved especially challenging and
time-consuming for me. Some of that comes directly from scala's relatively
slow compilation times and immature build-tooling ecosystem, but that is
the world we live in, and it would be nice if Spark, as one of the more
interesting and well-known large scala projects around right now, took the
alleviation of the resulting pain more seriously. The official
documentation around how to build different subsets of the codebase is
somewhat sparse [21], and there have been many mixed [22] accounts [23] on
this mailing list about preferred ways to build on mvn vs. sbt (none of
which has made it into official documentation, as far as I've seen).
Expecting new contributors to piece together all of this received
folk-wisdom about how to build/test in a sane way by trawling mailing list
archives seems suboptimal.

Thanks for reading, looking forward to hearing your ideas!

-Ryan

P.S. Is "best practice" for emailing this list to not incorporate any HTML
in the body? It seems like all of the archives I've seen strip it out, but
other people have used it and gmail displays it.


[1]
https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/484c2fb8bc0efa0e39d142087eefa9c3d5292ea3/dev%20run-tests:%20fail
(57 mins)
[2]
https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/ce264e469be3641f061eabd10beb1d71ac243991/mvn%20test:%20fail
(6 mins)
[3]
https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/6bc76c67aeef9c57ddd9fb2ba260fb4189dbb927/mvn%20test%20case:%20pass%20test,%20fail%20subsequent%20compile
(4 mins)
[4]
http://apache-spark-user-list.1001560.n3.nabble.com/scalac-crash-when-compiling-DataTypeConversions-scala-td17083.html
[5]
https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/4ab0bd6e76d9fc5745eb4b45cdf13195d10efaa2/mvn%20test,%20post%20clean,%20need%20dependencies%20built
[6]
https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/f4c7e6fc8c301f869b00598c7b541dac243fb51e/dev%20run-tests,%20post%20clean
(50 mins)
[7]
https://gist.github.com/ryan-williams/57f8bfc9328447fc5b97#file-dev-run-tests-failure-too-many-files-open-then-hang-L5260
(1hr)
[8] https://gist.github.com/ryan-williams/d0164194ad5de03f6e3f (1hr)
[9] https://issues.apache.org/jira/browse/SPARK-3867
[10] https://gist.github.com/ryan-williams/735adf543124c99647cc
[11] https://gist.github.com/ryan-williams/8d149bbcd0c6689ad564
[12]
https://gist.github.com/ryan-williams/07df5c583c9481fe1c14#file-gistfile1-txt-L853
(~90 mins)
[13]
https://gist.github.com/ryan-williams/718f6324af358819b496#file-gistfile1-txt-L852
(91 mins)
[14]
https://gist.github.com/ryan-williams/c06c1f4aa0b16f160965#file-gistfile1-txt-L854
[15] https://gist.github.com/ryan-williams/f8d410b5b9f082039c73
[16] https://gist.github.com/ryan-williams/2e94f55c9287938cf745
[17]
http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
[18]
http://stackoverflow.com/questions/25707629/why-does-spark-job-fail-with-too-many-open-files
[19] https://issues.apache.org/jira/browse/SPARK-4002
[20] https://issues.apache.org/jira/browse/SPARK-4542
[21]
https://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
[22] https://www.mail-archive.com/dev@spark.apache.org/msg06443.html
[23]
http://mail-archives.apache.org/mod_mbox/spark-dev/201410.mbox/%3ccaohmdzeunhucr41b7krptewmn4cga_2tnpzrwqqb8reekok...@mail.gmail.com%3E
