Hi Ilya - you can just submit a pull request; we test patches by running them through Jenkins, so you don't need to do anything special.
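
For anyone new to that flow, it looks roughly like this (a sketch, not official docs; the fork URL and branch name below are placeholders):

  # One-time setup: fork apache/spark on GitHub, then clone your fork.
  git clone https://github.com/your-username/spark.git
  cd spark

  # Make your change on a topic branch and push it to your fork.
  git checkout -b my-fix
  git commit -am "describe the change"
  git push origin my-fix

Then open a pull request against apache/spark in the GitHub UI; Jenkins picks it up from there and posts the test results back on the PR.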
On Sun, Nov 30, 2014 at 8:57 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
> Hi, Patrick - with regards to testing on Jenkins, is the process for this
> to submit a pull request for the branch, or is there another interface we
> can use to submit a build to Jenkins for testing?
>
> On 11/30/14, 6:49 PM, "Patrick Wendell" <pwend...@gmail.com> wrote:
>
>> Hey Ryan,
>>
>> A few more things here. You should feel free to send patches to Jenkins
>> to test them, since this is the reference environment in which we
>> regularly run tests. This is the normal workflow for most developers, and
>> we spend a lot of effort provisioning and maintaining a very large
>> Jenkins cluster to give developers access to this resource. A common
>> development approach is to locally run the tests you've added in a patch,
>> then send it to Jenkins for the full run, and then debug locally if you
>> see specific unanticipated test failures.
>>
>> One challenge we have is that, given the proliferation of OS versions,
>> Java versions, Python versions, ulimits, etc., there is a combinatorial
>> number of environments in which tests could be run. It is very hard in
>> some cases to figure out post hoc why a given test is not working in a
>> specific environment. I think a good solution here would be to use a
>> standardized Docker container for running Spark tests, and to ask folks
>> to use that locally if they are trying to run all of the hundreds of
>> Spark tests.
>>
>> Another solution would be to mock out every system interaction in Spark's
>> tests, including e.g. filesystem interactions, to try to reduce variance
>> across environments. However, that seems difficult.
>>
>> As the number of Spark developers increases, it's definitely a good idea
>> for us to invest in developer infrastructure, including things like
>> snapshot releases, better documentation, etc. Thanks for bringing this up
>> as a pain point.
>>
>> - Patrick
>>
>> On Sun, Nov 30, 2014 at 3:35 PM, Ryan Williams
>> <ryan.blake.willi...@gmail.com> wrote:
>>> Thanks for the info, Matei and Brennon. I will try to switch my workflow
>>> to using sbt. Other potential action items:
>>>
>>> - Currently the docs only contain information about building with Maven,
>>> and even then don't cover many important cases, as I described in my
>>> previous email. If sbt is as much better as you've described, that
>>> should be made much more obvious. Wasn't it the case recently that there
>>> was only a page about building with sbt, and not one about building with
>>> Maven? Clearer messaging around this needs to exist in the
>>> documentation, not just on the mailing list, imho.
>>>
>>> - +1 to better distinguishing between unit and integration tests, having
>>> separate scripts for each, improving documentation around common
>>> workflows, setting expectations about the brittleness of each kind of
>>> test, noting the advisability of just relying on Jenkins for certain
>>> kinds of tests so as not to waste too much time, etc. Things like the
>>> compiler crash should be discussed in the documentation, not just in the
>>> mailing-list archives, if new contributors are likely to run into them
>>> through no fault of their own.
>>>
>>> - What is the algorithm you use to decide which tests you might have
>>> broken? Can we codify it in some scripts that other people can use?
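>>>
>>> For concreteness, here's the kind of script I have in mind (an untested
>>> sketch; the directory-to-module mapping is illustrative and would need
>>> to account for cross-module dependencies to be trustworthy):
>>>
>>>   #!/usr/bin/env bash
>>>   # Run tests only for the sbt modules that a patch touches.
>>>   set -e
>>>   # Top-level directories containing files changed relative to master.
>>>   dirs=$(git diff --name-only master... | cut -d/ -f1 | sort -u)
>>>   for d in $dirs; do
>>>     case "$d" in
>>>       core|sql|streaming|mllib|graphx) sbt/sbt "$d/test" ;;
>>>       *) echo "not sure which suites cover '$d'; run the full suite" ;;
>>>     esac
>>>   done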
>>>
>>> On Sun Nov 30 2014 at 4:06:41 PM Matei Zaharia <matei.zaha...@gmail.com>
>>> wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> As a tip (and maybe this isn't documented well), I normally use sbt for
>>>> development to avoid the slow build process, and use its interactive
>>>> console to run only specific tests. The nice advantage is that sbt can
>>>> keep the Scala compiler loaded and JITed across builds, making it
>>>> faster to iterate. To use it, you can do the following:
>>>>
>>>> - Start the sbt interactive console with sbt/sbt
>>>> - Build your assembly by running the "assembly" target in the assembly
>>>> project: assembly/assembly
>>>> - Run all the tests in one module: core/test
>>>> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
>>>> (this also supports tab completion)
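>>>>
>>>> Concretely, a session might look like this ($ and > are the shell and
>>>> sbt prompts; the suite name is just an example, and test-only also
>>>> accepts wildcards, e.g. *RDDSuite):
>>>>
>>>>   $ sbt/sbt
>>>>   > assembly/assembly
>>>>   > core/test
>>>>   > core/test-only org.apache.spark.rdd.RDDSuite
>>>>   > core/test-only *RDDSuite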
>>>>
>>>> Running all the tests does take a while, and I usually just rely on
>>>> Jenkins for that once I've run the tests for the things I believed my
>>>> patch could break. But this is because some of them are integration
>>>> tests (e.g. DistributedSuite, which creates multi-process
>>>> mini-clusters). Many of the individual suites run fast without
>>>> requiring this, however, so you can pick the ones you want. Perhaps we
>>>> should find a way to tag them so people can do a "quick-test" that
>>>> skips the integration ones.
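>>>>
>>>> ScalaTest's tag support would be one way to do that. As a sketch (the
>>>> IntegrationTest tag object below is hypothetical, not something in the
>>>> tree today): define an object extending org.scalatest.Tag, mark the
>>>> slow suites' tests with it, and then exclude it from the sbt console
>>>> via ScalaTest's -l flag:
>>>>
>>>>   > core/test-only * -- -l org.apache.spark.tags.IntegrationTest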
>>>>
>>>> The assembly builds are annoying, but they only take about a minute for
>>>> me on a MacBook Pro with sbt warmed up. The assembly is actually only
>>>> required for some of the "integration" tests (which launch new
>>>> processes), but I'd recommend doing it all the time anyway, since it
>>>> would be very confusing to run those with an old assembly. The Scala
>>>> compiler crash issue can also be a problem, but I don't see it very
>>>> often with sbt. If it happens, I exit sbt and do sbt clean.
>>>>
>>>> Anyway, this is useful feedback, and I think we should try to improve
>>>> some of these suites, but hopefully you can also try the faster sbt
>>>> process. At the end of the day, if we want integration tests, the whole
>>>> test process will take an hour, but most of the developers I know leave
>>>> that to Jenkins and only run individual tests locally before submitting
>>>> a patch.
>>>>
>>>> Matei
>>>>
>>>> > On Nov 30, 2014, at 2:39 PM, Ryan Williams
>>>> > <ryan.blake.willi...@gmail.com> wrote:
>>>> >
>>>> > In the course of trying to make contributions to Spark, I have had a
>>>> > lot of trouble running Spark's tests successfully. The main pain
>>>> > points I've experienced are:
>>>> >
>>>> > 1) frequent, spurious test failures
>>>> > 2) high latency of running tests
>>>> > 3) difficulty running specific tests in an iterative fashion
>>>> >
>>>> > Here is an example series of failures that I encountered this weekend
>>>> > (along with footnote links to the console output from each and
>>>> > approximately how long each took):
>>>> >
>>>> > - `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
>>>> > before.
>>>> > - `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
>>>> > - `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]:
>>>> > BroadcastSuite passed, but the Scala compiler crashed on the
>>>> > "catalyst" project.
>>>> > - `mvn clean`: some attempts to run the earlier commands (that
>>>> > previously didn't crash the compiler) all resulted in the same
>>>> > compiler crash; previous discussion on this list implies this can
>>>> > only be solved by a `mvn clean` [4].
>>>> > - `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
>>>> > BroadcastSuite can't run because the assembly is not built.
>>>> > - `./dev/run-tests` again [6]: pyspark tests fail, with some messages
>>>> > about version mismatches and Python 2.6. The machine this ran on has
>>>> > Python 2.7, so I don't know what that's about.
>>>> > - `./dev/run-tests` again [7]: "too many open files" errors in
>>>> > several tests. `ulimit -a` shows a maximum of 4864 open files.
>>>> > Apparently this is not enough, but only some of the time? I increased
>>>> > it to 8192 and tried again.
>>>> > - `./dev/run-tests` again [8]: same pyspark errors as before. This
>>>> > seems to be the issue from SPARK-3867 [9], which was supposedly fixed
>>>> > on October 14; not sure how I'm seeing it now. In any case, I
>>>> > switched to Python 2.6 and installed unittest2, and python/run-tests
>>>> > seems to be unblocked.
>>>> > - `./dev/run-tests` again [10]: finally passes!
>>>> >
>>>> > This was on a Spark checkout at ceb6281 (ToT Friday), with a few
>>>> > trivial changes added on (that I wanted to test before sending out a
>>>> > PR), on a MacBook running OS X Yosemite (10.10.1), Java 1.8, and
>>>> > Maven 3.2.3 [11].
>>>> >
>>>> > Meanwhile, on a Linux 2.6.32 / CentOS 6.4 machine, I tried similar
>>>> > commands from the same repo state:
>>>> >
>>>> > - `./dev/run-tests` [12]: YarnClusterSuite failure.
>>>> > - `./dev/run-tests` [13]: same YarnClusterSuite failure. I know I've
>>>> > seen this one before on this machine and am guessing it actually
>>>> > occurs every time.
>>>> > - `./dev/run-tests` [14]: to be sure, I reverted my changes, ran one
>>>> > more time from ceb6281, and saw the same failure.
>>>> >
>>>> > This was with Java 1.7 and Maven 3.2.3 [15]. In one final attempt to
>>>> > narrow down the Linux YarnClusterSuite failure, I ran
>>>> > `./dev/run-tests` on my Mac, from ceb6281, with Java 1.7 (instead of
>>>> > 1.8, which the previous runs used), and it passed [16], so the
>>>> > failure seems specific to my Linux machine/arch.
>>>> >
>>>> > At this point I believe that my changes don't break any tests (the
>>>> > YarnClusterSuite failure on my Linux box presumably not being...
>>>> > "real"), and I am ready to send out a PR. Whew!
>>>> >
>>>> > However, reflecting on the five or six distinct failure modes
>>>> > represented above:
>>>> >
>>>> > - One of them ("too many open files") is something I can (and did,
>>>> > hopefully) fix once and for all; see the sketch after my questions
>>>> > below. It cost me an ~hour this time (the approximate running time of
>>>> > `./dev/run-tests`) and a few hours other times when I didn't fully
>>>> > understand/fix it. It doesn't happen deterministically (why?), but it
>>>> > does happen somewhat frequently, having been discussed on the user
>>>> > list multiple times [17] and on SO [18]. Maybe some note in the
>>>> > documentation advising people to check their ulimit makes sense?
>>>> > - One of them (unittest2 must be installed for Python 2.6) was
>>>> > supposedly fixed upstream of the commits I tested here; I don't know
>>>> > why I'm still running into it. This cost me a few hours of running
>>>> > `./dev/run-tests` multiple times to see if it was transient, plus
>>>> > some time researching and working around it.
>>>> > - The original BroadcastSuite failure cost me a few hours and went
>>>> > away before I'd even run `mvn clean`.
>>>> > - A new incarnation of the sbt-compiler-crash phenomenon cost me a
>>>> > few hours of running `./dev/run-tests` in different ways before
>>>> > deciding that, as usual, there was no way around it and that I'd need
>>>> > to run `mvn clean` and start running tests from scratch.
>>>> > - The YarnClusterSuite failures on my Linux box have cost me hours of
>>>> > trying to figure out whether they're my fault. I've seen them many
>>>> > times over the past weeks/months, plus or minus other failures that
>>>> > have come and gone, and was especially befuddled by them when I was
>>>> > seeing a disjoint set of reproducible failures on my Mac [19] (the
>>>> > triaging of which involved dozens of runs of `./dev/run-tests`).
>>>> >
>>>> > While I'm interested in digging into each of these issues, I also
>>>> > want to discuss the frequency with which I've run into issues like
>>>> > these. This is unfortunately not the first time in recent months that
>>>> > I've spent days playing spurious-test-failure whack-a-mole with a
>>>> > 60-90-minute dev/run-tests iteration time, which is no fun! So I am
>>>> > wondering/thinking:
>>>> >
>>>> > - Do other people experience this level of flakiness from Spark
>>>> > tests?
>>>> > - Do other people bother running dev/run-tests locally, or do they
>>>> > just let Jenkins do it during the CR process?
>>>> > - Needing to run a full assembly post-clean just to continue running
>>>> > one specific test case feels especially wasteful, and the failure
>>>> > output when naively attempting to run a specific test without having
>>>> > built an assembly jar is not always clear about what the issue is or
>>>> > how to fix it; even the fact that certain tests require "building the
>>>> > world" is not something I would have expected, and it has cost me
>>>> > hours of confusion.
>>>> >   - Should a person running Spark tests assume that they must build
>>>> >     an assembly JAR before running anything?
>>>> >   - Are there some proper "unit" tests that are actually
>>>> >     self-contained / able to be run without building an assembly jar?
>>>> >   - Can we better document/demarcate which tests have which
>>>> >     dependencies?
>>>> >   - Is there something finer-grained than building an assembly JAR
>>>> >     that is sufficient in some cases?
>>>> >     - If so, can we document that?
>>>> >     - If not, can we move to a world of finer-grained dependencies
>>>> >       for some of these?
>>>> > - Leaving all of these spurious failures aside, the process of
>>>> > assembling and testing a new JAR is not a quick one (typically 40 and
>>>> > 60 minutes for me, respectively). I would guess that there are dozens
>>>> > (hundreds?) of people who build a Spark assembly from various ToTs on
>>>> > any given day, and who all wait on the exact same compilation /
>>>> > assembly steps. Expanding on the recent work to publish nightly
>>>> > snapshots [20], can we do a better job caching/sharing compilation
>>>> > artifacts at a more granular level (pre-built assembly JARs at each
>>>> > SHA? pre-built JARs per Maven module, per SHA? more granular Maven
>>>> > modules, plus the previous two?), or otherwise save some of the
>>>> > considerable amount of redundant compilation work that I had to do
>>>> > over the course of my odyssey this weekend?
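>>>> >
>>>> > As promised above, re: "too many open files", here's a sketch of the
>>>> > check I have in mind (8192 is just the value that happened to work
>>>> > for me, not a documented minimum):
>>>> >
>>>> >   # Spark's tests can hold thousands of files open at once; raise the
>>>> >   # soft limit for this shell before running them.
>>>> >   if [ "$(ulimit -n)" -lt 8192 ]; then
>>>> >     ulimit -n 8192
>>>> >   fi
>>>> >   ./dev/run-tests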
>>>> >
>>>> > Ramping up on most projects involves some amount of supplementing the
>>>> > documentation with trial and error to figure out what to run, which
>>>> > "errors" are real errors and which can be ignored, etc., but
>>>> > navigating that minefield on Spark has proved especially challenging
>>>> > and time-consuming for me. Some of that comes directly from Scala's
>>>> > relatively slow compilation times and immature build-tooling
>>>> > ecosystem, but that is the world we live in, and it would be nice if
>>>> > Spark, as one of the more interesting and well-known large Scala
>>>> > projects around right now, took the alleviation of the resulting pain
>>>> > more seriously. The official documentation around how to build
>>>> > different subsets of the codebase is somewhat sparse [21], and there
>>>> > have been many mixed [22] accounts [23] on this mailing list about
>>>> > preferred ways to build with Maven vs. sbt (none of which has made it
>>>> > into the official documentation, as far as I've seen). Expecting new
>>>> > contributors to piece together all of this received folk wisdom about
>>>> > how to build/test in a sane way by trawling mailing-list archives
>>>> > seems suboptimal.
>>>> >
>>>> > Thanks for reading, looking forward to hearing your ideas!
>>>> >
>>>> > -Ryan
>>>> >
>>>> > P.S. Is "best practice" for emailing this list to not include any
>>>> > HTML in the body? It seems like all of the archives I've seen strip
>>>> > it out, but other people have used it, and Gmail displays it.
>>>> >
>>>> > [1] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/484c2fb8bc0efa0e39d142087eefa9c3d5292ea3/dev%20run-tests:%20fail (57 mins)
>>>> > [2] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/ce264e469be3641f061eabd10beb1d71ac243991/mvn%20test:%20fail (6 mins)
>>>> > [3] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/6bc76c67aeef9c57ddd9fb2ba260fb4189dbb927/mvn%20test%20case:%20pass%20test,%20fail%20subsequent%20compile (4 mins)
>>>> > [4] http://apache-spark-user-list.1001560.n3.nabble.com/scalac-crash-when-compiling-DataTypeConversions-scala-td17083.html
>>>> > [5] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/4ab0bd6e76d9fc5745eb4b45cdf13195d10efaa2/mvn%20test,%20post%20clean,%20need%20dependencies%20built
>>>> > [6] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/f4c7e6fc8c301f869b00598c7b541dac243fb51e/dev%20run-tests,%20post%20clean (50 mins)
>>>> > [7] https://gist.github.com/ryan-williams/57f8bfc9328447fc5b97#file-dev-run-tests-failure-too-many-files-open-then-hang-L5260 (1 hr)
>>>> > [8] https://gist.github.com/ryan-williams/d0164194ad5de03f6e3f (1 hr)
>>>> > [9] https://issues.apache.org/jira/browse/SPARK-3867
>>>> > [10] https://gist.github.com/ryan-williams/735adf543124c99647cc
>>>> > [11] https://gist.github.com/ryan-williams/8d149bbcd0c6689ad564
>>>> > [12] https://gist.github.com/ryan-williams/07df5c583c9481fe1c14#file-gistfile1-txt-L853 (~90 mins)
>>>> > [13] https://gist.github.com/ryan-williams/718f6324af358819b496#file-gistfile1-txt-L852 (91 mins)
>>>> > [14] https://gist.github.com/ryan-williams/c06c1f4aa0b16f160965#file-gistfile1-txt-L854
>>>> > [15] https://gist.github.com/ryan-williams/f8d410b5b9f082039c73
>>>> > [16] https://gist.github.com/ryan-williams/2e94f55c9287938cf745
>>>> > [17] http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
>>>> > [18] http://stackoverflow.com/questions/25707629/why-does-spark-job-fail-with-too-many-open-files
>>>> > [19] https://issues.apache.org/jira/browse/SPARK-4002
>>>> > [20] https://issues.apache.org/jira/browse/SPARK-4542
>>>> > [21] https://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
>>>> > [22] https://www.mail-archive.com/dev@spark.apache.org/msg06443.html
>>>> > [23] http://mail-archives.apache.org/mod_mbox/spark-dev/201410.mbox/%3ccaohmdzeunhucr41b7krptewmn4cga_2tnpzrwqqb8reekok...@mail.gmail.com%3E