Thanks for the info, Matei and Brennon. I will try to switch my workflow to
using sbt. Other potential action items:

- Currently the docs only contain information about building with Maven,
and even then don't cover many important cases, as I described in my
previous email. If SBT is as much better as you've described, then that
should be made much more obvious. Wasn't it the case recently that there
was only a page about building with SBT, and not one about building with
Maven? Clearer messaging around this needs to exist in the documentation,
not just on the mailing list, imho.

- +1 to better distinguishing between unit and integration tests, having
separate scripts for each, improving documentation around common workflows,
expectations of brittleness with each kind of test, advisability of just
relying on Jenkins for certain kinds of tests to not waste too much time,
etc. Things like the compiler crash should be discussed in the
documentation, not just in the mailing list archives, if new contributors
are likely to run into them through no fault of their own.

- What is the algorithm you use to decide what tests you might have broken?
Can we codify it in some scripts that other people can use?
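
One possible shape for such a script, as a rough sketch only: it maps changed
file paths to top-level modules and emits the `<module>/test` commands that
Matei describes for the SBT console. The KNOWN_MODULES set and both function
names are hypothetical; only `core` is a module actually named in this thread,
and this is not a description of how anyone currently decides what to run.

```python
from pathlib import PurePosixPath

# Hypothetical, hard-coded list of top-level module directories.
# A real script would need to derive this from the build definition.
KNOWN_MODULES = {"core", "streaming", "mllib"}


def modules_for(changed_paths):
    """Return the sorted names of known modules that contain a changed file."""
    touched = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # The first path component is treated as the module directory.
        if parts and parts[0] in KNOWN_MODULES:
            touched.add(parts[0])
    return sorted(touched)


def sbt_test_commands(changed_paths):
    """Build one `<module>/test` SBT console command per touched module."""
    return [f"{module}/test" for module in modules_for(changed_paths)]


if __name__ == "__main__":
    # Example input; in practice this could come from `git diff --name-only`.
    changed = [
        "core/src/main/scala/org/apache/spark/rdd/RDD.scala",
        "docs/index.md",
    ]
    for command in sbt_test_commands(changed):
        print(command)
```

Note that this only looks at which directories were touched. It ignores
dependencies between modules, so a change in one module that breaks a
downstream module would still only be caught by the full run on Jenkins.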



On Sun Nov 30 2014 at 4:06:41 PM Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> Hi Ryan,
>
> As a tip (and maybe this isn't documented well), I normally use SBT for
> development to avoid the slow build process, and use its interactive
> console to run only specific tests. The nice advantage is that SBT can keep
> the Scala compiler loaded and JITed across builds, making it faster to
> iterate. To use it, you can do the following:
>
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the "assembly" target in the assembly
> project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
> also supports tab completion)
>
> Running all the tests does take a while, and I usually just rely on
> Jenkins for that once I've run the tests for the things I believed my patch
> could break. But this is because some of them are integration tests (e.g.
> DistributedSuite, which creates multi-process mini-clusters). Many of the
> individual suites run fast without requiring this, however, so you can pick
> the ones you want. Perhaps we should find a way to tag them so people can
> do a "quick-test" that skips the integration ones.
>
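
A minimal sketch of what such a "quick-test" selection could look like,
assuming a hypothetical manifest that labels each suite as "unit" or
"integration". No such manifest exists in the thread; the SUITE_KINDS name,
the labels, and the classification of RDDSuite as a unit suite are all
assumptions for illustration. Only DistributedSuite is described above as an
integration test.

```python
# Hypothetical manifest mapping fully qualified suite names to a kind.
SUITE_KINDS = {
    "org.apache.spark.rdd.RDDSuite": "unit",              # assumed label
    "org.apache.spark.DistributedSuite": "integration",   # per the email above
}


def quick_test_commands(suite_kinds, module="core"):
    """Return `test-only` console commands for every non-integration suite."""
    return [
        f"{module}/test-only {suite}"
        for suite, kind in sorted(suite_kinds.items())
        if kind != "integration"
    ]


if __name__ == "__main__":
    for command in quick_test_commands(SUITE_KINDS):
        print(command)
```

The commands use the same `core/test-only <suite>` form shown in the list
above, so they could be pasted into the SBT interactive console one by one.
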
> The assembly builds are annoying but they only take about a minute for me
> on a MacBook Pro with SBT warmed up. The assembly is actually only required
> for some of the "integration" tests (which launch new processes), but I'd
> recommend doing it all the time anyway since it would be very confusing to
> run those with an old assembly. The Scala compiler crash issue can also be
> a problem, but I don't see it very often with SBT. If it happens, I exit
> SBT and do sbt clean.
>
> Anyway, this is useful feedback and I think we should try to improve some
> of these suites, but hopefully you can also try the faster SBT process. At
> the end of the day, if we want integration tests, the whole test process
> will take an hour, but most of the developers I know leave that to Jenkins
> and only run individual tests locally before submitting a patch.
>
> Matei
>
>
> > On Nov 30, 2014, at 2:39 PM, Ryan Williams <ryan.blake.willi...@gmail.com> wrote:
> >
> > In the course of trying to make contributions to Spark, I have had a lot of
> > trouble running Spark's tests successfully. The main pain points I've
> > experienced are:
> >
> >    1) frequent, spurious test failures
> >    2) high latency of running tests
> >    3) difficulty running specific tests in an iterative fashion
> >
> > Here is an example series of failures that I encountered this weekend
> > (along with footnote links to the console output from each and
> > approximately how long each took):
> >
> > - `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
> > before.
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
> > - `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]: BroadcastSuite
> > passed, but scala compiler crashed on the "catalyst" project.
> > - `mvn clean`: some attempts to run earlier commands (that previously
> > didn't crash the compiler) all result in the same compiler crash. Previous
> > discussion on this list implies this can only be solved by a `mvn clean`
> > [4].
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
> > BroadcastSuite can't run because assembly is not built.
> > - `./dev/run-tests` again [6]: pyspark tests fail, some messages about
> > version mismatches and python 2.6. The machine this ran on has python 2.7,
> > so I don't know what that's about.
> > - `./dev/run-tests` again [7]: "too many open files" errors in several
> > tests. `ulimit -a` shows a maximum of 4864 open files. Apparently this is
> > not enough, but only some of the time? I increased it to 8192 and tried
> > again.
> > - `./dev/run-tests` again [8]: same pyspark errors as before. This seems to
> > be the issue from SPARK-3867 [9], which was supposedly fixed on October 14;
> > not sure how I'm seeing it now. In any case, switched to Python 2.6 and
> > installed unittest2, and python/run-tests seems to be unblocked.
> > - `./dev/run-tests` again [10]: finally passes!
> >
> > This was on a spark checkout at ceb6281 (ToT Friday), with a few trivial
> > changes added on (that I wanted to test before sending out a PR), on a
> > macbook running OSX Yosemite (10.10.1), java 1.8 and mvn 3.2.3 [11].
> >
> > Meanwhile, on a linux 2.6.32 / CentOS 6.4 machine, I tried similar commands
> > from the same repo state:
> >
> > - `./dev/run-tests` [12]: YarnClusterSuite failure.
> > - `./dev/run-tests` [13]: same YarnClusterSuite failure. I know I've seen
> > this one before on this machine and am guessing it actually occurs every
> > time.
> > - `./dev/run-tests` [14]: to be sure, I reverted my changes, ran one more
> > time from ceb6281, and saw the same failure.
> >
> > This was with java 1.7 and maven 3.2.3 [15]. In one final attempt to narrow
> > down the linux YarnClusterSuite failure, I ran `./dev/run-tests` on my mac,
> > from ceb6281, with java 1.7 (instead of 1.8, which the previous runs used),
> > and it passed [16], so the failure seems specific to my linux machine/arch.
> >
> > At this point I believe that my changes don't break any tests (the
> > YarnClusterSuite failure on my linux presumably not being... "real"), and I
> > am ready to send out a PR. Whew!
> >
> > However, reflecting on the 5 or 6 distinct failure-modes represented above:
> >
> > - One of them (too many files open), is something I can (and did,
> > hopefully) fix once and for all. It cost me an ~hour this time (approximate
> > time of running ./dev/run-tests) and a few hours other times when I didn't
> > fully understand/fix it. It doesn't happen deterministically (why?), but
> > does happen somewhat frequently to people, having been discussed on the
> > user list multiple times [17] and on SO [18]. Maybe some note in the
> > documentation advising people to check their ulimit makes sense?
> > - One of them (unittest2 must be installed for python 2.6) was supposedly
> > fixed upstream of the commits I tested here; I don't know why I'm still
> > running into it. This cost me a few hours of running `./dev/run-tests`
> > multiple times to see if it was transient, plus some time researching and
> > working around it.
> > - The original BroadcastSuite failure cost me a few hours and went away
> > before I'd even run `mvn clean`.
> > - A new incarnation of the sbt-compiler-crash phenomenon cost me a few
> > hours of running `./dev/run-tests` in different ways before deciding that,
> > as usual, there was no way around it and that I'd need to run `mvn clean`
> > and start running tests from scratch.
> > - The YarnClusterSuite failures on my linux box have cost me hours of
> > trying to figure out whether they're my fault. I've seen them many times
> > over the past weeks/months, plus or minus other failures that have come and
> > gone, and was especially befuddled by them when I was seeing a disjoint set
> > of reproducible failures on my mac [19] (the triaging of which involved
> > dozens of runs of `./dev/run-tests`).
> >
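
For the "too many open files" item in the list above, a pre-flight check along
these lines could be run before starting the tests. This is a sketch under
stated assumptions: the function names are hypothetical, and the default of
8192 is simply the value the limit was raised to in this email, not a
threshold known to be sufficient. It uses Python's standard `resource` module,
which is available on Unix-like systems only.

```python
import resource


def is_limit_sufficient(soft_limit, minimum):
    """Return True if the soft limit is unlimited or at least `minimum`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


def check_open_file_limit(minimum=8192):
    """Check this process's open-file soft limit; return (ok, soft_limit)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return is_limit_sufficient(soft, minimum), soft


if __name__ == "__main__":
    ok, soft = check_open_file_limit()
    if ok:
        print(f"open-file limit looks fine: {soft}")
    else:
        print(f"open-file limit is {soft}; consider raising it with ulimit -n")
```

The check reports the limit of the process that runs it, so it is only
meaningful when started from the same shell session that will launch the
tests.
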
> > While I'm interested in digging into each of these issues, I also want to
> > discuss the frequency with which I've run into issues like these. This is
> > unfortunately not the first time in recent months that I've spent days
> > playing spurious-test-failure whack-a-mole with a 60-90min dev/run-tests
> > iteration time, which is no fun! So I am wondering/thinking:
> >
> > - Do other people experience this level of flakiness from spark tests?
> > - Do other people bother running dev/run-tests locally, or just let Jenkins
> > do it during the CR process?
> > - Needing to run a full assembly post-clean just to continue running one
> > specific test case feels especially wasteful, and the failure output when
> > naively attempting to run a specific test without having built an assembly
> > jar is not always clear about what the issue is or how to fix it; even the
> > fact that certain tests require "building the world" is not something I
> > would have expected, and has cost me hours of confusion.
> >    - Should a person running spark tests assume that they must build an
> > assembly JAR before running anything?
> >    - Are there some proper "unit" tests that are actually self-contained /
> > able to be run without building an assembly jar?
> >    - Can we better document/demarcate which tests have which dependencies?
> >    - Is there something finer-grained than building an assembly JAR that
> > is sufficient in some cases?
> >        - If so, can we document that?
> >        - If not, can we move to a world of finer-grained dependencies for
> > some of these?
> > - Leaving all of these spurious failures aside, the process of assembling
> > and testing a new JAR is not a quick one (40 and 60 mins for me typically,
> > respectively). I would guess that there are dozens (hundreds?) of people
> > who build a Spark assembly from various ToTs on any given day, and who all
> > wait on the exact same compilation / assembly steps to occur. Expanding on
> > the recent work to publish nightly snapshots [20], can we do a better job
> > caching/sharing compilation artifacts at a more granular level (pre-built
> > assembly JARs at each SHA? pre-built JARs per-maven-module, per-SHA? more
> > granular maven modules, plus the previous two?), or otherwise save some of
> > the considerable amount of redundant compilation work that I had to do over
> > the course of my odyssey this weekend?
> >
> > Ramping up on most projects involves some amount of supplementing the
> > documentation with trial and error to figure out what to run, which
> > "errors" are real errors and which can be ignored, etc., but navigating
> > that minefield on Spark has proved especially challenging and
> > time-consuming for me. Some of that comes directly from scala's relatively
> > slow compilation times and immature build-tooling ecosystem, but that is
> > the world we live in and it would be nice if Spark took the alleviation of
> > the resulting pain more seriously, as one of the more interesting and
> > well-known large scala projects around right now. The official
> > documentation around how to build different subsets of the codebase is
> > somewhat sparse [21], and there have been many mixed [22] accounts [23] on
> > this mailing list about preferred ways to build on mvn vs. sbt (none of
> > which has made it into official documentation, as far as I've seen).
> > Expecting new contributors to piece together all of this received
> > folk-wisdom about how to build/test in a sane way by trawling mailing list
> > archives seems suboptimal.
> >
> > Thanks for reading, looking forward to hearing your ideas!
> >
> > -Ryan
> >
> > P.S. Is "best practice" for emailing this list to not incorporate any HTML
> > in the body? It seems like all of the archives I've seen strip it out, but
> > other people have used it and gmail displays it.
> >
> >
> > [1] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/484c2fb8bc0efa0e39d142087eefa9c3d5292ea3/dev%20run-tests:%20fail (57 mins)
> > [2] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/ce264e469be3641f061eabd10beb1d71ac243991/mvn%20test:%20fail (6 mins)
> > [3] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/6bc76c67aeef9c57ddd9fb2ba260fb4189dbb927/mvn%20test%20case:%20pass%20test,%20fail%20subsequent%20compile (4 mins)
> > [4] https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&;cd=2&ved=0CCUQFjAB&url=http%3A%2F%2Fapache-spark-user-list.1001560.n3.nabble.com%2Fscalac-crash-when-compiling-DataTypeConversions-scala-td17083.html&ei=aRF6VJrpNKr-iAKDgYGYBQ&usg=AFQjCNHjM9m__Hrumh-ecOsSE00-JkjKBQ&sig2=zDeSqOgs02AXJXj78w5I9g&bvm=bv.80642063,d.cGE&cad=rja
> > [5] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/4ab0bd6e76d9fc5745eb4b45cdf13195d10efaa2/mvn%20test,%20post%20clean,%20need%20dependencies%20built
> > [6] https://gist.githubusercontent.com/ryan-williams/8a162367c4dc157d2479/raw/f4c7e6fc8c301f869b00598c7b541dac243fb51e/dev%20run-tests,%20post%20clean (50 mins)
> > [7] https://gist.github.com/ryan-williams/57f8bfc9328447fc5b97#file-dev-run-tests-failure-too-many-files-open-then-hang-L5260 (1hr)
> > [8] https://gist.github.com/ryan-williams/d0164194ad5de03f6e3f (1hr)
> > [9] https://issues.apache.org/jira/browse/SPARK-3867
> > [10] https://gist.github.com/ryan-williams/735adf543124c99647cc
> > [11] https://gist.github.com/ryan-williams/8d149bbcd0c6689ad564
> > [12] https://gist.github.com/ryan-williams/07df5c583c9481fe1c14#file-gistfile1-txt-L853 (~90 mins)
> > [13] https://gist.github.com/ryan-williams/718f6324af358819b496#file-gistfile1-txt-L852 (91 mins)
> > [14] https://gist.github.com/ryan-williams/c06c1f4aa0b16f160965#file-gistfile1-txt-L854
> > [15] https://gist.github.com/ryan-williams/f8d410b5b9f082039c73
> > [16] https://gist.github.com/ryan-williams/2e94f55c9287938cf745
> > [17] http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
> > [18] http://stackoverflow.com/questions/25707629/why-does-spark-job-fail-with-too-many-open-files
> > [19] https://issues.apache.org/jira/browse/SPARK-4002
> > [20] https://issues.apache.org/jira/browse/SPARK-4542
> > [21] https://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
> > [22] https://www.mail-archive.com/dev@spark.apache.org/msg06443.html
> > [23] http://mail-archives.apache.org/mod_mbox/spark-dev/201410.mbox/%3ccaohmdzeunhucr41b7krptewmn4cga_2tnpzrwqqb8reekok...@mail.gmail.com%3E
>
>
