Re: Spurious test failures, testing best practices

2014-12-04 Thread Imran Rashid
I agree we should separate out the integration tests so it's easy for devs
to just run the other fast tests locally. I opened a jira for it

https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4746
On Nov 30, 2014 3:08 PM, "Matei Zaharia"  wrote:

> Hi Ryan,
>
> As a tip (and maybe this isn't documented well), I normally use SBT for
> development to avoid the slow build process, and use its interactive
> console to run only specific tests. The nice advantage is that SBT can keep
> the Scala compiler loaded and JITed across builds, making it faster to
> iterate. To use it, you can do the following:
>
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the "assembly" target in the assembly
> project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
> also supports tab completion)
>
> Running all the tests does take a while, and I usually just rely on
> Jenkins for that once I've run the tests for the things I believed my patch
> could break. But this is because some of them are integration tests (e.g.
> DistributedSuite, which creates multi-process mini-clusters). Many of the
> individual suites run fast without requiring this, however, so you can pick
> the ones you want. Perhaps we should find a way to tag them so people  can
> do a "quick-test" that skips the integration ones.
>
> The assembly builds are annoying but they only take about a minute for me
> on a MacBook Pro with SBT warmed up. The assembly is actually only required
> for some of the "integration" tests (which launch new processes), but I'd
> recommend doing it all the time anyway since it would be very confusing to
> run those with an old assembly. The Scala compiler crash issue can also be
> a problem, but I don't see it very often with SBT. If it happens, I exit
> SBT and do sbt clean.
>
> Anyway, this is useful feedback and I think we should try to improve some
> of these suites, but hopefully you can also try the faster SBT process. At
> the end of the day, if we want integration tests, the whole test process
> will take an hour, but most of the developers I know leave that to Jenkins
> and only run individual tests locally before submitting a patch.
>
> Matei
>
>
> > On Nov 30, 2014, at 2:39 PM, Ryan Williams <
> ryan.blake.willi...@gmail.com> wrote:
> >
> > In the course of trying to make contributions to Spark, I have had a lot
> of
> > trouble running Spark's tests successfully. The main pain points I've
> > experienced are:
> >
> >1) frequent, spurious test failures
> >2) high latency of running tests
> >3) difficulty running specific tests in an iterative fashion
> >
> > Here is an example series of failures that I encountered this weekend
> > (along with footnote links to the console output from each and
> > approximately how long each took):
> >
> > - `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
> > before.
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
> > - `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]: BroadcastSuite
> > passed, but scala compiler crashed on the "catalyst" project.
> > - `mvn clean`: some attempts to run earlier commands (that previously
> > didn't crash the compiler) all result in the same compiler crash.
> Previous
> > discussion on this list implies this can only be solved by a `mvn clean`
> > [4].
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
> > BroadcastSuite can't run because assembly is not built.
> > - `./dev/run-tests` again [6]: pyspark tests fail, some messages about
> > version mismatches and python 2.6. The machine this ran on has python
> 2.7,
> > so I don't know what that's about.
> > - `./dev/run-tests` again [7]: "too many open files" errors in several
> > tests. `ulimit -a` shows a maximum of 4864 open files. Apparently this is
> > not enough, but only some of the time? I increased it to 8192 and tried
> > again.
> > - `./dev/run-tests` again [8]: same pyspark errors as before. This seems
> to
> > be the issue from SPARK-3867 [9], which was supposedly fixed on October
> 14;
> > not sure how I'm seeing it now. In any case, switched to Python 2.6 and
> > installed unittest2, and python/run-tests seems to be unblocked.
> > - `./dev/run-tests` again [10]: finally passes!
> >
> > This was on a spark checkout at ceb6281 (ToT Friday), with a few trivial
> > changes added on (that I wanted to test before sending out a PR), on a
> > macbook running OSX Yosemite (10.10.1), java 1.8 and mvn 3.2.3 [11].
> >
> > Meanwhile, on a linux 2.6.32 / CentOS 6.4 machine, I tried similar
> commands
> > from the same repo state:
> >
> > - `./dev/run-tests` [12]: YarnClusterSuite failure.
> > - `./dev/run-tests` [13]: same YarnClusterSuite failure. I know I've seen
> > this one before on this machine and am guessing it actually occurs every
> > time.
> > - `./dev/run-tests` [14]: to be su

Re: Spurious test failures, testing best practices

2014-12-04 Thread Ryan Williams
Thanks Marcelo, "this is just how Maven works (unfortunately)" answers my
question.

Another related question: I tried to use `mvn scala:cc` and discovered that
it only seems to scan the src/main and src/test directories (according to
its docs), and so can only be run from within submodules, not from the root
directory.

I'll add a note about this to building-spark.html unless there is a way to
do it for all modules / from the root directory that I've missed. Let me
know!
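
For anyone else who hits this, a minimal sketch of the per-submodule
invocation (assuming `scala:cc` resolves to the scala-maven-plugin's
continuous-compilation goal, as it appears to):

  # run from within a submodule; watches that module's src/main and
  # src/test and recompiles on change
  cd core
  mvn scala:cc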




On Tue Dec 02 2014 at 5:49:58 PM Marcelo Vanzin  wrote:

> On Tue, Dec 2, 2014 at 4:40 PM, Ryan Williams
>  wrote:
> >> But you only need to compile the others once.
> >
> > once... every time I rebase off master, or am obliged to `mvn clean` by
> some
> > other build-correctness bug, as I said before. In my experience this
> works
> > out to a few times per week.
>
> No, you only need to do it if something upstream from core changed (i.e.,
> spark-parent, network/common or network/shuffle) in an incompatible
> way. Otherwise, you can rebase and just recompile / retest core,
> without having to install everything else. I do this kind of thing all
> the time. If you have to do "mvn clean" often you're probably doing
> something wrong somewhere else.
>
> I understand where you're coming from, but the way you're thinking is
> just not how maven works. I too find it annoying that maven requires lots
> of things to be "installed" before you can use them, when they're all
> part of the same project. But well, that's the way things are.
>
> --
> Marcelo
>


Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 4:40 PM, Ryan Williams
 wrote:
>> But you only need to compile the others once.
>
> once... every time I rebase off master, or am obliged to `mvn clean` by some
> other build-correctness bug, as I said before. In my experience this works
> out to a few times per week.

No, you only need to do it if something upstream from core changed (i.e.,
spark-parent, network/common or network/shuffle) in an incompatible
way. Otherwise, you can rebase and just recompile / retest core,
without having to install everything else. I do this kind of thing all
the time. If you have to do "mvn clean" often you're probably doing
something wrong somewhere else.
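
As a concrete sketch of that workflow (assuming a prior `mvn install
-DskipTests` has populated the local repository, and that nothing upstream
of core changed):

  # pick up the latest master
  git rebase origin/master
  # recompile / retest only the core module
  mvn -pl core test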

I understand where you're coming from, but the way you're thinking is
just not how maven works. I too find it annoying that maven requires lots
of things to be "installed" before you can use them, when they're all
part of the same project. But well, that's the way things are.

-- 
Marcelo




Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
On Tue Dec 02 2014 at 4:46:20 PM Marcelo Vanzin  wrote:

> On Tue, Dec 2, 2014 at 3:39 PM, Ryan Williams
>  wrote:
> > Marcelo: by my count, there are 19 maven modules in the codebase. I am
> > typically only concerned with "core" (and therefore its two dependencies
> as
> > well, `network/{shuffle,common}`).
>
> But you only need to compile the others once.


once... every time I rebase off master, or am obliged to `mvn clean` by
some other build-correctness bug, as I said before. In my experience this
works out to a few times per week.


> Once you've established
> the baseline, you can just compile / test "core" to your heart's
> desire.


I understand that this is a workflow that does what I want as a side effect
of doing 3-5x more work (depending on whether you count [number of modules
built] or [lines of scala/java compiled]), none of the extra work being
useful to me (more on that below).


> Core tests won't even run until you build the assembly anyway,
> since some of them require the assembly to be present.


The tests you refer to are exactly the ones that I'd like to let Jenkins
run from here on out, per advice I was given elsewhere in this thread and
due to the myriad unpleasantries I've encountered in trying to run them
myself.


>
> Also, even if you work in core - I'd say especially if you work in
> core - you should still, at some point, compile and test everything
> else that depends on it.
>

Last response applies.


>
> So, do this ONCE:
>

again, s/ONCE/several times a week/, in my experience.


>
>   mvn install -DskipTests
>
> Then do this as many times as you want:
>
>   mvn -pl spark-core_2.10 something
>
> That doesn't seem too bad to me. (Be aware of the "assembly" comment
> above, since testing spark-core means you may have to rebuild the
> assembly from time to time, if your changes affect those tests.)
>
> > re: Marcelo's comment about "missing the 'spark-parent' project", I saw
> that
> > error message too and tried to ascertain what it could mean. Why would
> > `network/shuffle` need something from the parent project?
>
> The "spark-parent" project is the main pom that defines dependencies
> and their version, along with lots of build plugins and
> configurations. It's needed by all modules to compile correctly.
>

- I understand the parent POM has that information.

- I don't understand why Maven would feel that it is unable to compile the
`network/shuffle` module without having first compiled, packaged, and
installed 17 modules (19 minus `network/shuffle` and its dependency
`network/common`) that are not transitive dependencies of `network/shuffle`.

- I am trying to understand whether my failure to get Maven to compile
`network/shuffle` stems from my not knowing the correct incantation to feed
to Maven or from Maven's having a different (and seemingly worse) model for
how it handles module dependencies than I expected.



>
> --
> Marcelo
>


Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 3:39 PM, Ryan Williams
 wrote:
> Marcelo: by my count, there are 19 maven modules in the codebase. I am
> typically only concerned with "core" (and therefore its two dependencies as
> well, `network/{shuffle,common}`).

But you only need to compile the others once. Once you've established
the baseline, you can just compile / test "core" to your heart's
desire. Core tests won't even run until you build the assembly anyway,
since some of them require the assembly to be present.

Also, even if you work in core - I'd say especially if you work in
core - you should still, at some point, compile and test everything
else that depends on it.

So, do this ONCE:

  mvn install -DskipTests

Then do this as many times as you want:

  mvn -pl spark-core_2.10 something

That doesn't seem too bad to me. (Be aware of the "assembly" comment
above, since testing spark-core means you may have to rebuild the
assembly from time to time, if your changes affect those tests.)
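
For example, "something" might be a single suite, borrowing the
-DwildcardSuites option shown elsewhere in this thread (a sketch):

  # one-time baseline
  mvn install -DskipTests
  # then iterate on core alone
  mvn -pl core -DwildcardSuites=org.apache.spark.rdd.RDDSuite test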

> re: Marcelo's comment about "missing the 'spark-parent' project", I saw that
> error message too and tried to ascertain what it could mean. Why would
> `network/shuffle` need something from the parent project?

The "spark-parent" project is the main pom that defines dependencies
and their version, along with lots of build plugins and
configurations. It's needed by all modules to compile correctly.

-- 
Marcelo




Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
Marcelo: by my count, there are 19 maven modules in the codebase. I am
typically only concerned with "core" (and therefore its two dependencies as
well, `network/{shuffle,common}`).

The `mvn package` workflow (and its sbt equivalent) that most people
apparently use involves (for me) compiling+packaging 16 other modules that
I don't care about; I pay this cost whenever I rebase off of master or
encounter the sbt-compiler-crash bug, among other possible scenarios.

Compiling one module (after building/installing its dependencies) seems
like the sort of thing that should be possible, and I don't see why my
previously-documented attempt is failing.

re: Marcelo's comment about "missing the 'spark-parent' project", I saw
that error message too and tried to ascertain what it could mean. Why would
`network/shuffle` need something from the parent project? AFAICT
`network/common` has the same references to the parent project as
`network/shuffle` (namely just a `<parent>` block in its POM), and yet I can
`mvn install -pl` the former but not the latter. Why would this be? One
difference is that `network/shuffle` has a dependency on another module,
while `network/common` does not.

Does Maven not let you build modules that depend on *any* other modules
without building *all* modules, or is there a way to do this that we've not
found yet?
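
(One Maven reactor feature that may be relevant, though we haven't tried
it here: the -am / --also-make flag, which builds the selected modules
plus only the modules they depend on, e.g.

  # from the root: build core and just its upstream dependencies
  # (spark-parent, network/common, network/shuffle), skipping the rest
  mvn -pl core -am -DskipTests install

If that behaves as documented, it would avoid building the other 16
modules.)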

Patrick: per my response to Marcelo above, I am trying to avoid having to
compile and package a bunch of stuff I am not using, which both `mvn
package` and `mvn install` on the parent project do.





On Tue Dec 02 2014 at 3:45:48 PM Marcelo Vanzin  wrote:

> On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
>  wrote:
> > Following on Mark's Maven examples, here is another related issue I'm
> > having:
> >
> > I'd like to compile just the `core` module after a `mvn clean`, without
> > building an assembly JAR first. Is this possible?
>
> Out of curiosity, may I ask why? What's the problem with running "mvn
> install -DskipTests" first (or "package" instead of "install",
> although I generally do the latter)?
>
> You can probably do what you want if you manually build / install all
> the needed dependencies first; you found two, but it seems you're also
> missing the "spark-parent" project (which is the top-level pom). That
> sounds like a lot of trouble though, for not any gains that I can
> see... after the first build you should be able to do what you want
> easily.
>
> --
> Marcelo
>


Re: Spurious test failures, testing best practices

2014-12-02 Thread Patrick Wendell
Hey Ryan,

What if you run a single "mvn install" to install all libraries
locally - then can you "mvn compile -pl core"? I think this may be the
only way to make it work.
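
In command form, the suggestion is (a sketch):

  # once: install all modules into the local repository
  mvn install -DskipTests
  # then iterate on core alone
  mvn compile -pl core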

- Patrick

On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
 wrote:
> Following on Mark's Maven examples, here is another related issue I'm
> having:
>
> I'd like to compile just the `core` module after a `mvn clean`, without
> building an assembly JAR first. Is this possible?
>
> Attempting to do it myself, the steps I performed were:
>
> - `mvn compile -pl core`: fails because `core` depends on `network/common`
> and `network/shuffle`, neither of which is installed in my local maven
> cache (and which don't exist in central Maven repositories, I guess? I
> thought Spark was publishing snapshot releases?)
>
> - `network/shuffle` also depends on `network/common`, so I'll `mvn install`
> the latter first: `mvn install -DskipTests -pl network/common`. That
> succeeds, and I see a newly built 1.3.0-SNAPSHOT jar in my local maven
> repository.
>
> - However, `mvn install -DskipTests -pl network/shuffle` subsequently
> fails, seemingly due to not finding network/common. Here's a sample full
> output from running `mvn install -X -U -DskipTests -pl network/shuffle`
> from such a state (the -U was to get around a previous failure based on
> having cached a failed lookup of network-common-1.3.0-SNAPSHOT).
>
> - Thinking maven might be special-casing "-SNAPSHOT" versions, I tried
> replacing "1.3.0-SNAPSHOT" with "1.3.0.1" globally and repeating these
> steps, but the error seems to be the same.
>
> Any ideas?
>
> Thanks,
>
> -Ryan
>
> On Sun Nov 30 2014 at 6:37:28 PM Mark Hamstra 
> wrote:
>
>> >
>> > - Start the SBT interactive console with sbt/sbt
>> > - Build your assembly by running the "assembly" target in the assembly
>> > project: assembly/assembly
>> > - Run all the tests in one module: core/test
>> > - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
>> (this
>> > also supports tab completion)
>>
>>
>> The equivalent using Maven:
>>
>> - Start zinc
>> - Build your assembly using the mvn "package" or "install" target
>> ("install" is actually the equivalent of SBT's "publishLocal") -- this step
>> is the first step in
>> http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
>> - Run all the tests in one module: mvn -pl core test
>> - Run a specific suite: mvn -pl core
>> -DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't
>> strictly necessary if you don't mind waiting for Maven to scan through all
>> the other sub-projects only to do nothing; and, of course, it needs to be
>> something other than "core" if the test you want to run is in another
>> sub-project.)
>>
>> You also typically want to carry along in each subsequent step any relevant
>> command line options you added in the "package"/"install" step.
>>
>> On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia 
>> wrote:
>>
>> > Hi Ryan,
>> >
>> > As a tip (and maybe this isn't documented well), I normally use SBT for
>> > development to avoid the slow build process, and use its interactive
>> > console to run only specific tests. The nice advantage is that SBT can
>> keep
>> > the Scala compiler loaded and JITed across builds, making it faster to
>> > iterate. To use it, you can do the following:
>> >
>> > - Start the SBT interactive console with sbt/sbt
>> > - Build your assembly by running the "assembly" target in the assembly
>> > project: assembly/assembly
>> > - Run all the tests in one module: core/test
>> > - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
>> (this
>> > also supports tab completion)
>> >
>> > Running all the tests does take a while, and I usually just rely on
>> > Jenkins for that once I've run the tests for the things I believed my
>> patch
>> > could break. But this is because some of them are integration tests (e.g.
>> > DistributedSuite, which creates multi-process mini-clusters). Many of the
>> > individual suites run fast without requiring this, however, so you can
>> pick
>> > the ones you want. Perhaps we should find a way to tag them so people
>> can
>> > do a "quick-test" that skips the integration ones.
>> >
>> > The assembly builds are annoying but they only take about a minute for me
>> > on a MacBook Pro with SBT warmed up. The assembly is actually only
>> required
>> > for some of the "integration" tests (which launch new processes), but I'd
>> > recommend doing it all the time anyway since it would be very confusing
>> to
>> > run those with an old assembly. The Scala compiler crash issue can also
>> be
>> > a problem, but I don't see it very often with SBT. If it happens, I exit
>> > SBT and do sbt clean.
>> >
>> > Anyway, this is useful feedback and I think we should try to improve some
> > of these suites, but hopefully you can also try the faster SBT process.

Re: Spurious test failures, testing best practices

2014-12-02 Thread Marcelo Vanzin
On Tue, Dec 2, 2014 at 2:40 PM, Ryan Williams
 wrote:
> Following on Mark's Maven examples, here is another related issue I'm
> having:
>
> I'd like to compile just the `core` module after a `mvn clean`, without
> building an assembly JAR first. Is this possible?

Out of curiosity, may I ask why? What's the problem with running "mvn
install -DskipTests" first (or "package" instead of "install",
although I generally do the latter)?

You can probably do what you want if you manually build / install all
the needed dependencies first; you found two, but it seems you're also
missing the "spark-parent" project (which is the top-level pom). That
sounds like a lot of trouble though, for not any gains that I can
see... after the first build you should be able to do what you want
easily.
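
For completeness, a sketch of that manual route (hedged, since as noted
it's a lot of trouble):

  # install just the top-level parent pom (-N = non-recursive)
  mvn install -N -DskipTests
  # install core's two module dependencies
  mvn install -DskipTests -pl network/common,network/shuffle
  # now core alone should compile
  mvn compile -pl core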

-- 
Marcelo




Re: Spurious test failures, testing best practices

2014-12-02 Thread Ryan Williams
Following on Mark's Maven examples, here is another related issue I'm
having:

I'd like to compile just the `core` module after a `mvn clean`, without
building an assembly JAR first. Is this possible?

Attempting to do it myself, the steps I performed were:

- `mvn compile -pl core`: fails because `core` depends on `network/common`
and `network/shuffle`, neither of which is installed in my local maven
cache (and which don't exist in central Maven repositories, I guess? I
thought Spark was publishing snapshot releases?)

- `network/shuffle` also depends on `network/common`, so I'll `mvn install`
the latter first: `mvn install -DskipTests -pl network/common`. That
succeeds, and I see a newly built 1.3.0-SNAPSHOT jar in my local maven
repository.

- However, `mvn install -DskipTests -pl network/shuffle` subsequently
fails, seemingly due to not finding network/common. Here's a sample full
output from running `mvn install -X -U -DskipTests -pl network/shuffle`
from such a state (the -U was to get around a previous failure based on
having cached a failed lookup of network-common-1.3.0-SNAPSHOT).

- Thinking maven might be special-casing "-SNAPSHOT" versions, I tried
replacing "1.3.0-SNAPSHOT" with "1.3.0.1" globally and repeating these
steps, but the error seems to be the same.

Any ideas?

Thanks,

-Ryan

On Sun Nov 30 2014 at 6:37:28 PM Mark Hamstra 
wrote:

> >
> > - Start the SBT interactive console with sbt/sbt
> > - Build your assembly by running the "assembly" target in the assembly
> > project: assembly/assembly
> > - Run all the tests in one module: core/test
> > - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
> (this
> > also supports tab completion)
>
>
> The equivalent using Maven:
>
> - Start zinc
> - Build your assembly using the mvn "package" or "install" target
> ("install" is actually the equivalent of SBT's "publishLocal") -- this step
> is the first step in
> http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
> - Run all the tests in one module: mvn -pl core test
> - Run a specific suite: mvn -pl core
> -DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't
> strictly necessary if you don't mind waiting for Maven to scan through all
> the other sub-projects only to do nothing; and, of course, it needs to be
> something other than "core" if the test you want to run is in another
> sub-project.)
>
> You also typically want to carry along in each subsequent step any relevant
> command line options you added in the "package"/"install" step.
>
> On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia 
> wrote:
>
> > Hi Ryan,
> >
> > As a tip (and maybe this isn't documented well), I normally use SBT for
> > development to avoid the slow build process, and use its interactive
> > console to run only specific tests. The nice advantage is that SBT can
> keep
> > the Scala compiler loaded and JITed across builds, making it faster to
> > iterate. To use it, you can do the following:
> >
> > - Start the SBT interactive console with sbt/sbt
> > - Build your assembly by running the "assembly" target in the assembly
> > project: assembly/assembly
> > - Run all the tests in one module: core/test
> > - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
> (this
> > also supports tab completion)
> >
> > Running all the tests does take a while, and I usually just rely on
> > Jenkins for that once I've run the tests for the things I believed my
> patch
> > could break. But this is because some of them are integration tests (e.g.
> > DistributedSuite, which creates multi-process mini-clusters). Many of the
> > individual suites run fast without requiring this, however, so you can
> pick
> > the ones you want. Perhaps we should find a way to tag them so people
> can
> > do a "quick-test" that skips the integration ones.
> >
> > The assembly builds are annoying but they only take about a minute for me
> > on a MacBook Pro with SBT warmed up. The assembly is actually only
> required
> > for some of the "integration" tests (which launch new processes), but I'd
> > recommend doing it all the time anyway since it would be very confusing
> to
> > run those with an old assembly. The Scala compiler crash issue can also
> be
> > a problem, but I don't see it very often with SBT. If it happens, I exit
> > SBT and do sbt clean.
> >
> > Anyway, this is useful feedback and I think we should try to improve some
> > of these suites, but hopefully you can also try the faster SBT process.
> At
> > the end of the day, if we want integration tests, the whole test process
> > will take an hour, but most of the developers I know leave that to
> Jenkins
> > and only run individual tests locally before submitting a patch.
> >
> > Matei
> >
> >
> > > On Nov 30, 2014, at 2:39 PM, Ryan Williams <
> > ryan.blake.willi...@gmail.com> wrote:
> > >

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Hi Ilya - you can just submit a pull request and the way we test them
is to run it through jenkins. You don't need to do anything special.

On Sun, Nov 30, 2014 at 8:57 PM, Ganelin, Ilya
 wrote:
> Hi, Patrick - with regards to testing on Jenkins, is the process for this
> to submit a pull request for the branch or is there another interface we
> can use to submit a build to Jenkins for testing?
>
> On 11/30/14, 6:49 PM, "Patrick Wendell"  wrote:
>
>>Hey Ryan,
>>
>>A few more things here. You should feel free to send patches to
>>Jenkins to test them, since this is the reference environment in which
>>we regularly run tests. This is the normal workflow for most
>>developers and we spend a lot of effort provisioning/maintaining a
>>very large jenkins cluster to allow developers access this resource. A
>>common development approach is to locally run tests that you've added
>>in a patch, then send it to jenkins for the full run, and then try to
>>debug locally if you see specific unanticipated test failures.
>>
>>One challenge we have is that given the proliferation of OS versions,
>>Java versions, Python versions, ulimits, etc. there is a combinatorial
>>number of environments in which tests could be run. It is very hard in
>>some cases to figure out post-hoc why a given test is not working in a
>>specific environment. I think a good solution here would be to use a
>>standardized docker container for running Spark tests and asking folks
>>to use that locally if they are trying to run all of the hundreds of
>>Spark tests.
>>
>>Another solution would be to mock out every system interaction in
>>Spark's tests including e.g. filesystem interactions to try and reduce
>>variance across environments. However, that seems difficult.
>>
>>As the number of developers of Spark increases, it's definitely a good
>>idea for us to invest in developer infrastructure including things
>>like snapshot releases, better documentation, etc. Thanks for bringing
>>this up as a pain point.
>>
>>- Patrick
>>
>>
>>On Sun, Nov 30, 2014 at 3:35 PM, Ryan Williams
>> wrote:
>>> thanks for the info, Matei and Brennon. I will try to switch my
>>>workflow to
>>> using sbt. Other potential action items:
>>>
>>> - currently the docs only contain information about building with maven,
>>> and even then don't cover many important cases, as I described in my
>>> previous email. If SBT is as much better as you've described then that
>>> should be made much more obvious. Wasn't it the case recently that there
>>> was only a page about building with SBT, and not one about building with
>>> maven? Clearer messaging around this needs to exist in the
>>>documentation,
>>> not just on the mailing list, imho.
>>>
>>> - +1 to better distinguishing between unit and integration tests, having
>>> separate scripts for each, improving documentation around common
>>>workflows,
>>> expectations of brittleness with each kind of test, advisability of just
>>> relying on Jenkins for certain kinds of tests to not waste too much
>>>time,
>>> etc. Things like the compiler crash should be discussed in the
>>> documentation, not just in the mailing list archives, if new
>>>contributors
>>> are likely to run into them through no fault of their own.
>>>
>>> - What is the algorithm you use to decide what tests you might have
>>>broken?
>>> Can we codify it in some scripts that other people can use?
>>>
>>>
>>>
>>> On Sun Nov 30 2014 at 4:06:41 PM Matei Zaharia 
>>> wrote:
>>>
 Hi Ryan,

 As a tip (and maybe this isn't documented well), I normally use SBT for
 development to avoid the slow build process, and use its interactive
 console to run only specific tests. The nice advantage is that SBT can
keep
 the Scala compiler loaded and JITed across builds, making it faster to
 iterate. To use it, you can do the following:

 - Start the SBT interactive console with sbt/sbt
 - Build your assembly by running the "assembly" target in the assembly
 project: assembly/assembly
 - Run all the tests in one module: core/test
 - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
(this
 also supports tab completion)

 Running all the tests does take a while, and I usually just rely on
 Jenkins for that once I've run the tests for the things I believed my
patch
 could break. But this is because some of them are integration tests
(e.g.
 DistributedSuite, which creates multi-process mini-clusters). Many of
the
 individual suites run fast without requiring this, however, so you can
pick
 the ones you want. Perhaps we should find a way to tag them so people
can
 do a "quick-test" that skips the integration ones.

 The assembly builds are annoying but they only take about a minute for
me
 on a MacBook Pro with SBT warmed up. The assembly is actually only
required
 for some of the "integration" tests (which launch new processes), but
 I'd recommend doing it all the time anyway since it would be very
 confusing to run those with an old assembly.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ganelin, Ilya
Hi, Patrick - with regards to testing on Jenkins, is the process for this
to submit a pull request for the branch or is there another interface we
can use to submit a build to Jenkins for testing?

On 11/30/14, 6:49 PM, "Patrick Wendell"  wrote:

>Hey Ryan,
>
>A few more things here. You should feel free to send patches to
>Jenkins to test them, since this is the reference environment in which
>we regularly run tests. This is the normal workflow for most
>developers and we spend a lot of effort provisioning/maintaining a
>very large jenkins cluster to allow developers access this resource. A
>common development approach is to locally run tests that you've added
>in a patch, then send it to jenkins for the full run, and then try to
>debug locally if you see specific unanticipated test failures.
>
>One challenge we have is that given the proliferation of OS versions,
>Java versions, Python versions, ulimits, etc. there is a combinatorial
>number of environments in which tests could be run. It is very hard in
>some cases to figure out post-hoc why a given test is not working in a
>specific environment. I think a good solution here would be to use a
>standardized docker container for running Spark tests and asking folks
>to use that locally if they are trying to run all of the hundreds of
>Spark tests.
>
>Another solution would be to mock out every system interaction in
>Spark's tests including e.g. filesystem interactions to try and reduce
>variance across environments. However, that seems difficult.
>
>As the number of developers of Spark increases, it's definitely a good
>idea for us to invest in developer infrastructure including things
>like snapshot releases, better documentation, etc. Thanks for bringing
>this up as a pain point.
>
>- Patrick
>
>
>On Sun, Nov 30, 2014 at 3:35 PM, Ryan Williams
> wrote:
>> thanks for the info, Matei and Brennon. I will try to switch my
>>workflow to
>> using sbt. Other potential action items:
>>
>> - currently the docs only contain information about building with maven,
>> and even then don't cover many important cases, as I described in my
>> previous email. If SBT is as much better as you've described then that
>> should be made much more obvious. Wasn't it the case recently that there
>> was only a page about building with SBT, and not one about building with
>> maven? Clearer messaging around this needs to exist in the
>>documentation,
>> not just on the mailing list, imho.
>>
>> - +1 to better distinguishing between unit and integration tests, having
>> separate scripts for each, improving documentation around common
>>workflows,
>> expectations of brittleness with each kind of test, advisability of just
>> relying on Jenkins for certain kinds of tests to not waste too much
>>time,
>> etc. Things like the compiler crash should be discussed in the
>> documentation, not just in the mailing list archives, if new
>>contributors
>> are likely to run into them through no fault of their own.
>>
>> - What is the algorithm you use to decide what tests you might have
>>broken?
>> Can we codify it in some scripts that other people can use?
>>
>>
>>
>> On Sun Nov 30 2014 at 4:06:41 PM Matei Zaharia 
>> wrote:
>>
>>> Hi Ryan,
>>>
>>> As a tip (and maybe this isn't documented well), I normally use SBT for
>>> development to avoid the slow build process, and use its interactive
>>> console to run only specific tests. The nice advantage is that SBT can
>>>keep
>>> the Scala compiler loaded and JITed across builds, making it faster to
>>> iterate. To use it, you can do the following:
>>>
>>> - Start the SBT interactive console with sbt/sbt
>>> - Build your assembly by running the "assembly" target in the assembly
>>> project: assembly/assembly
>>> - Run all the tests in one module: core/test
>>> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
>>>(this
>>> also supports tab completion)
>>>
>>> Running all the tests does take a while, and I usually just rely on
>>> Jenkins for that once I've run the tests for the things I believed my
>>>patch
>>> could break. But this is because some of them are integration tests
>>>(e.g.
>>> DistributedSuite, which creates multi-process mini-clusters). Many of
>>>the
>>> individual suites run fast without requiring this, however, so you can
>>>pick
>>> the ones you want. Perhaps we should find a way to tag them so people
>>>can
>>> do a "quick-test" that skips the integration ones.
>>>
>>> The assembly builds are annoying but they only take about a minute for
>>>me
>>> on a MacBook Pro with SBT warmed up. The assembly is actually only
>>>required
>>> for some of the "integration" tests (which launch new processes), but
>>>I'd
>>> recommend doing it all the time anyway since it would be very
>>>confusing to
>>> run those with an old assembly. The Scala compiler crash issue can
>>>also be
>>> a problem, but I don't see it very often with SBT. If it happens, I
>>>exit
>>> SBT and do sbt clean.
>>>
>>> Anyway, this is useful feedback and I think we should try to improve
>>> some of these suites, but hopefully you can also try the faster SBT
>>> process.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
Thanks Patrick, great to hear that docs-snapshots-via-jenkins is already
JIRA'd; you can interpret some of this thread as a gigantic +1 from me on
prioritizing that, which it looks like you are doing :)

I do understand the limitations of the "github vs. official site" status
quo; I was mostly responding to a perceived implication that I should have
been getting building/testing-spark advice from the github .md files
instead of from /latest. I agree that neither one works very well
currently, and that docs-snapshots-via-jenkins is the right solution. Per
my other email, leaving /latest as-is sounds reasonable, as long as jenkins
is putting the latest docs *somewhere*.

On Sun Nov 30 2014 at 7:19:33 PM Patrick Wendell  wrote:

> Btw - the documentation on github represents the source code of our
> docs, which is versioned with each release. Unfortunately github will
> always try to render ".md" files so it could look to a passerby like
> this is supposed to represent published docs. This is a feature
> limitation of github, AFAIK we cannot disable it.
>
> The official published docs are associated with each release and
> available on the apache.org website. I think "/latest" is a common
> convention for referring to the latest *published release* docs, so
> probably we can't change that (the audience for /latest is orders of
> magnitude larger than for snapshot docs). However we could just add
> /snapshot and publish docs there.
>
> - Patrick
>
> On Sun, Nov 30, 2014 at 6:15 PM, Patrick Wendell 
> wrote:
> > Hey Ryan,
> >
> > The existing JIRA also covers publishing nightly docs:
> > https://issues.apache.org/jira/browse/SPARK-1517
> >
> > - Patrick
> >
> > On Sun, Nov 30, 2014 at 5:53 PM, Ryan Williams
> >  wrote:
> >> Thanks Nicholas, glad to hear that some of this info will be pushed to
> the
> >> main site soon, but this brings up yet another point of confusion that
> I've
> >> struggled with, namely whether the documentation on github or that on
> >> spark.apache.org should be considered the primary reference for people
> >> seeking to learn about best practices for developing Spark.
> >>
> >> Trying to read docs starting from
> >> https://github.com/apache/spark/blob/master/docs/index.md right now, I
> find
> >> that all of the links to other parts of the documentation are broken:
> they
> >> point to relative paths that end in ".html", which will work when
> published
> >> on the docs-site, but that would have to end in ".md" if a person was
> to be
> >> able to navigate them on github.
> >>
> >> So expecting people to use the up-to-date docs on github (where all
> >> internal URLs 404 and the main github README suggests that the "latest
> >> Spark documentation" can be found on the actually-months-old docs-site
> >> ) is not a good
> >> solution. On the other hand, consulting months-old docs on the site is
> also
> >> problematic, as this thread and your last email have borne out.  The
> result
> >> is that there is no good place on the internet to learn about the most
> >> up-to-date best practices for using/developing Spark.
> >>
> >> Why not build http://spark.apache.org/docs/latest/ nightly (or every
> >> commit) off of what's in github, rather than having that URL point to
> the
> >> last release's docs (up to ~3 months old)? This way, casual users who
> want
> >> the docs for the released version they happen to be using (which is
> already
> >> frequently != "/latest" today, for many Spark users) can (still) find
> them
> >> at http://spark.apache.org/docs/X.Y.Z, and the github README can safely
> >> point people to a site (/latest) that actually has up-to-date docs that
> >> reflect ToT and whose links work.
> >>
> >> If there are concerns about existing semantics around "/latest" URLs
> being
> >> broken, some new URL could be used, like
> >> http://spark.apache.org/docs/snapshot/, but given that everything under
> >> http://spark.apache.org/docs/latest/ is in a state of
> >> planned-backwards-incompatible-changes every ~3mos, that doesn't sound
> like
> >> that serious an issue to me; anyone sending around permanent links to
> >> things under /latest is already going to have those links break / not
> make
> >> sense in the near future.
> >>
> >>
> >> On Sun Nov 30 2014 at 5:24:33 PM Nicholas Chammas <
> >> nicholas.cham...@gmail.com> wrote:
> >>
> >>>
> >>>- currently the docs only contain information about building with
> >>>maven,
> >>>and even then don't cover many important cases
> >>>
> >>>  All other points aside, I just want to point out that the docs
> document
> >>> both how to use Maven and SBT and clearly state
> >>> <https://github.com/apache/spark/blob/master/docs/building-spark.md#building-with-sbt>
> >>> that Maven is the "build of reference" while SBT may be preferable for
> >>> day-to-day development.
> >>>
> >>> I believe the main reason most people miss this documentation is that,
> >>> though it's up-to-date on GitHub, it hasn't been published yet to the
> >>> docs site.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
Thanks Mark, most of those commands are things I've been using and used in
my original post except for "Start zinc". I now see the section about it on
the "unpublished" building-spark

page and will try using it.

Even so, finding those commands took a nontrivial amount of trial and
error; I've not seen them well documented outside of this list (your and
Matei's emails (and previous emails to this list) each have more info
about building/testing with Maven and SBT (resp.) than building-spark
does); the per-suite invocation is still subject to requiring assembly in
some cases ("without warning" from my perspective, having not read up on
the names of all Spark integration tests); spurious failures still abound;
and there's no good way to run only the things that a given change
actually could have broken.

Anyway, hopefully zinc brings me to the world of ~minute iteration times
that have been reported on this thread.
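
For reference, the minimal zinc setup I'm planning to try (assuming zinc
is installed and on the PATH, e.g. via Homebrew):

  # start the zinc server once; the scala-maven-plugin should then use it
  # for incremental compilation across mvn invocations
  zinc -start
  # iterate as usual
  mvn -pl core test
  # shut it down when done
  zinc -shutdown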


On Sun Nov 30 2014 at 6:53:57 PM Ryan Williams <
ryan.blake.willi...@gmail.com> wrote:

> Thanks Nicholas, glad to hear that some of this info will be pushed to the
> main site soon, but this brings up yet another point of confusion that I've
> struggled with, namely whether the documentation on github or that on
> spark.apache.org should be considered the primary reference for people
> seeking to learn about best practices for developing Spark.
>
> Trying to read docs starting from
> https://github.com/apache/spark/blob/master/docs/index.md right now, I
> find that all of the links to other parts of the documentation are broken:
> they point to relative paths that end in ".html", which will work when
> published on the docs-site, but that would have to end in ".md" if a person
> was to be able to navigate them on github.
>
> So expecting people to use the up-to-date docs on github (where all
> internal URLs 404 and the main github README suggests that the "latest
> Spark documentation" can be found on the actually-months-old docs-site
> ) is not a good
> solution. On the other hand, consulting months-old docs on the site is also
> problematic, as this thread and your last email have borne out.  The result
> is that there is no good place on the internet to learn about the most
> up-to-date best practices for using/developing Spark.
>
> Why not build http://spark.apache.org/docs/latest/ nightly (or every
> commit) off of what's in github, rather than having that URL point to the
> last release's docs (up to ~3 months old)? This way, casual users who want
> the docs for the released version they happen to be using (which is already
> frequently != "/latest" today, for many Spark users) can (still) find them
> at http://spark.apache.org/docs/X.Y.Z, and the github README can safely
> point people to a site (/latest) that actually has up-to-date docs that
> reflect ToT and whose links work.
>
> If there are concerns about existing semantics around "/latest" URLs being
> broken, some new URL could be used, like
> http://spark.apache.org/docs/snapshot/, but given that everything under
> http://spark.apache.org/docs/latest/ is in a state of
> planned-backwards-incompatible-changes every ~3mos, that doesn't sound like
> that serious an issue to me; anyone sending around permanent links to
> things under /latest is already going to have those links break / not make
> sense in the near future.
>
>
> On Sun Nov 30 2014 at 5:24:33 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>>
>>- currently the docs only contain information about building with
>>maven,
>>and even then don’t cover many important cases
>>
>>  All other points aside, I just want to point out that the docs document
>> both how to use Maven and SBT and clearly state
>> 
>> that Maven is the “build of reference” while SBT may be preferable for
>> day-to-day development.
>>
>> I believe the main reason most people miss this documentation is that,
>> though it’s up-to-date on GitHub, it hasn’t been published yet to the docs
>> site. It should go out with the 1.2 release.
>>
>> Improvements to the documentation on building Spark belong here:
>> https://github.com/apache/spark/blob/master/docs/building-spark.md
>>
>> If there are clear recommendations that come out of this thread but are
>> not in that doc, they should be added in there. Other, less important
>> details may possibly be better suited for the Contributing to Spark
>> 
>> guide.
>>
>> Nick
>> ​
>>
>> On Sun Nov 30 2014 at 6:50:55 PM Patrick Wendell 
>> wrote:
>>
>>> Hey Ryan,
>>>
>>> A few more things here. You should feel free to send patches to
>>> Jenkins to test them, since this is the reference environment in which
>>> we regularly run tests.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Btw - the documentation on github represents the source code of our
docs, which is versioned with each release. Unfortunately github will
always try to render ".md" files so it could look to a passerby like
this is supposed to represent published docs. This is a feature
limitation of github, AFAIK we cannot disable it.

The official published docs are associated with each release and
available on the apache.org website. I think "/latest" is a common
convention for referring to the latest *published release* docs, so
probably we can't change that (the audience for /latest is orders of
magnitude larger than for snapshot docs). However we could just add
/snapshot and publish docs there.

- Patrick

On Sun, Nov 30, 2014 at 6:15 PM, Patrick Wendell  wrote:
> Hey Ryan,
>
> The existing JIRA also covers publishing nightly docs:
> https://issues.apache.org/jira/browse/SPARK-1517
>
> - Patrick
>
> On Sun, Nov 30, 2014 at 5:53 PM, Ryan Williams
>  wrote:
>> Thanks Nicholas, glad to hear that some of this info will be pushed to the
>> main site soon, but this brings up yet another point of confusion that I've
>> struggled with, namely whether the documentation on github or that on
>> spark.apache.org should be considered the primary reference for people
>> seeking to learn about best practices for developing Spark.
>>
>> Trying to read docs starting from
>> https://github.com/apache/spark/blob/master/docs/index.md right now, I find
>> that all of the links to other parts of the documentation are broken: they
>> point to relative paths that end in ".html", which will work when published
>> on the docs-site, but that would have to end in ".md" if a person was to be
>> able to navigate them on github.
>>
>> So expecting people to use the up-to-date docs on github (where all
>> internal URLs 404 and the main github README suggests that the "latest
>> Spark documentation" can be found on the actually-months-old docs-site
>> ) is not a good
>> solution. On the other hand, consulting months-old docs on the site is also
>> problematic, as this thread and your last email have borne out.  The result
>> is that there is no good place on the internet to learn about the most
>> up-to-date best practices for using/developing Spark.
>>
>> Why not build http://spark.apache.org/docs/latest/ nightly (or every
>> commit) off of what's in github, rather than having that URL point to the
>> last release's docs (up to ~3 months old)? This way, casual users who want
>> the docs for the released version they happen to be using (which is already
>> frequently != "/latest" today, for many Spark users) can (still) find them
>> at http://spark.apache.org/docs/X.Y.Z, and the github README can safely
>> point people to a site (/latest) that actually has up-to-date docs that
>> reflect ToT and whose links work.
>>
>> If there are concerns about existing semantics around "/latest" URLs being
>> broken, some new URL could be used, like
>> http://spark.apache.org/docs/snapshot/, but given that everything under
>> http://spark.apache.org/docs/latest/ is in a state of
>> planned-backwards-incompatible-changes every ~3mos, that doesn't sound like
>> that serious an issue to me; anyone sending around permanent links to
>> things under /latest is already going to have those links break / not make
>> sense in the near future.
>>
>>
>> On Sun Nov 30 2014 at 5:24:33 PM Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>>
>>>- currently the docs only contain information about building with
>>>maven,
>>>and even then don't cover many important cases
>>>
>>>  All other points aside, I just want to point out that the docs document
>>> both how to use Maven and SBT and clearly state
>>> 
>>> that Maven is the "build of reference" while SBT may be preferable for
>>> day-to-day development.
>>>
>>> I believe the main reason most people miss this documentation is that,
> >>> though it's up-to-date on GitHub, it hasn't been published yet to the docs
>>> site. It should go out with the 1.2 release.
>>>
>>> Improvements to the documentation on building Spark belong here:
>>> https://github.com/apache/spark/blob/master/docs/building-spark.md
>>>
>>> If there are clear recommendations that come out of this thread but are
>>> not in that doc, they should be added in there. Other, less important
>>> details may possibly be better suited for the Contributing to Spark
>>> 
>>> guide.
>>>
>>> Nick
>>>
>>>
>>> On Sun Nov 30 2014 at 6:50:55 PM Patrick Wendell 
>>> wrote:
>>>
 Hey Ryan,

 A few more things here. You should feel free to send patches to
 Jenkins to test them, since this is the reference environment in which
 we regularly run tests. This is the normal workflow for most
 developers and we spend a lot of effort provisioning/maintaining a
 very large jenkins cluster to allow developers access this resource.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Hey Ryan,

The existing JIRA also covers publishing nightly docs:
https://issues.apache.org/jira/browse/SPARK-1517

- Patrick

On Sun, Nov 30, 2014 at 5:53 PM, Ryan Williams
 wrote:
> Thanks Nicholas, glad to hear that some of this info will be pushed to the
> main site soon, but this brings up yet another point of confusion that I've
> struggled with, namely whether the documentation on github or that on
> spark.apache.org should be considered the primary reference for people
> seeking to learn about best practices for developing Spark.
>
> Trying to read docs starting from
> https://github.com/apache/spark/blob/master/docs/index.md right now, I find
> that all of the links to other parts of the documentation are broken: they
> point to relative paths that end in ".html", which will work when published
> on the docs-site, but that would have to end in ".md" if a person was to be
> able to navigate them on github.
>
> So expecting people to use the up-to-date docs on github (where all
> internal URLs 404 and the main github README suggests that the "latest
> Spark documentation" can be found on the actually-months-old docs-site
> ) is not a good
> solution. On the other hand, consulting months-old docs on the site is also
> problematic, as this thread and your last email have borne out.  The result
> is that there is no good place on the internet to learn about the most
> up-to-date best practices for using/developing Spark.
>
> Why not build http://spark.apache.org/docs/latest/ nightly (or every
> commit) off of what's in github, rather than having that URL point to the
> last release's docs (up to ~3 months old)? This way, casual users who want
> the docs for the released version they happen to be using (which is already
> frequently != "/latest" today, for many Spark users) can (still) find them
> at http://spark.apache.org/docs/X.Y.Z, and the github README can safely
> point people to a site (/latest) that actually has up-to-date docs that
> reflect ToT and whose links work.
>
> If there are concerns about existing semantics around "/latest" URLs being
> broken, some new URL could be used, like
> http://spark.apache.org/docs/snapshot/, but given that everything under
> http://spark.apache.org/docs/latest/ is in a state of
> planned-backwards-incompatible-changes every ~3mos, that doesn't sound like
> that serious an issue to me; anyone sending around permanent links to
> things under /latest is already going to have those links break / not make
> sense in the near future.
>
>
> On Sun Nov 30 2014 at 5:24:33 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>>
>>- currently the docs only contain information about building with
>>maven,
>>and even then don't cover many important cases
>>
>>  All other points aside, I just want to point out that the docs document
>> both how to use Maven and SBT and clearly state
>> 
>> that Maven is the "build of reference" while SBT may be preferable for
>> day-to-day development.
>>
>> I believe the main reason most people miss this documentation is that,
>> though it's up-to-date on GitHub, it hasn't been published yet to the docs
>> site. It should go out with the 1.2 release.
>>
>> Improvements to the documentation on building Spark belong here:
>> https://github.com/apache/spark/blob/master/docs/building-spark.md
>>
>> If there are clear recommendations that come out of this thread but are
>> not in that doc, they should be added in there. Other, less important
>> details may possibly be better suited for the Contributing to Spark
>> 
>> guide.
>>
>> Nick
>>
>>
>> On Sun Nov 30 2014 at 6:50:55 PM Patrick Wendell 
>> wrote:
>>
>>> Hey Ryan,
>>>
>>> A few more things here. You should feel free to send patches to
>>> Jenkins to test them, since this is the reference environment in which
>>> we regularly run tests. This is the normal workflow for most
>>> developers and we spend a lot of effort provisioning/maintaining a
>>> very large jenkins cluster to allow developers access this resource. A
>>> common development approach is to locally run tests that you've added
>>> in a patch, then send it to jenkins for the full run, and then try to
>>> debug locally if you see specific unanticipated test failures.
>>>
>>> One challenge we have is that given the proliferation of OS versions,
>>> Java versions, Python versions, ulimits, etc. there is a combinatorial
>>> number of environments in which tests could be run. It is very hard in
>>> some cases to figure out post-hoc why a given test is not working in a
>>> specific environment. I think a good solution here would be to use a
>>> standardized docker container for running Spark tests and asking folks
>>> to use that locally if they are trying to run all of the hundreds of
>>> Spark tests.
>>

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
Thanks Nicholas, glad to hear that some of this info will be pushed to the
main site soon, but this brings up yet another point of confusion that I've
struggled with, namely whether the documentation on github or that on
spark.apache.org should be considered the primary reference for people
seeking to learn about best practices for developing Spark.

Trying to read docs starting from
https://github.com/apache/spark/blob/master/docs/index.md right now, I find
that all of the links to other parts of the documentation are broken: they
point to relative paths that end in ".html", which will work when published
on the docs-site, but that would have to end in ".md" if a person was to be
able to navigate them on github.

So expecting people to use the up-to-date docs on github (where all
internal URLs 404 and the main github README suggests that the "latest
Spark documentation" can be found on the actually-months-old docs-site
) is not a good
solution. On the other hand, consulting months-old docs on the site is also
problematic, as this thread and your last email have borne out.  The result
is that there is no good place on the internet to learn about the most
up-to-date best practices for using/developing Spark.

Why not build http://spark.apache.org/docs/latest/ nightly (or every
commit) off of what's in github, rather than having that URL point to the
last release's docs (up to ~3 months old)? This way, casual users who want
the docs for the released version they happen to be using (which is already
frequently != "/latest" today, for many Spark users) can (still) find them
at http://spark.apache.org/docs/X.Y.Z, and the github README can safely
point people to a site (/latest) that actually has up-to-date docs that
reflect ToT and whose links work.
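Mechanically this seems cheap, since everything under docs/ already builds
as a Jekyll site; a nightly publish job could be roughly the following
(paths and the final publish step are illustrative assumptions):

    # hypothetical nightly docs job; assumes jekyll and a checkout of apache/spark
    cd spark && git pull origin master
    cd docs && jekyll build          # renders the site into docs/_site
    rsync -a _site/ <docs-host>:/path/to/docs/latest/   # publish step is site-specific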

If there are concerns about existing semantics around "/latest" URLs being
broken, some new URL could be used, like
http://spark.apache.org/docs/snapshot/, but given that everything under
http://spark.apache.org/docs/latest/ is in a state of
planned-backwards-incompatible-changes every ~3mos, that doesn't sound like
that serious an issue to me; anyone sending around permanent links to
things under /latest is already going to have those links break / not make
sense in the near future.


On Sun Nov 30 2014 at 5:24:33 PM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

>
>- currently the docs only contain information about building with maven,
>and even then don’t cover many important cases
>
>  All other points aside, I just want to point out that the docs document
> both how to use Maven and SBT and clearly state
> that Maven is the “build of reference” while SBT may be preferable for
> day-to-day development.
>
> I believe the main reason most people miss this documentation is that,
> though it’s up-to-date on GitHub, it hasn’t been published yet to the docs
> site. It should go out with the 1.2 release.
>
> Improvements to the documentation on building Spark belong here:
> https://github.com/apache/spark/blob/master/docs/building-spark.md
>
> If there are clear recommendations that come out of this thread but are
> not in that doc, they should be added in there. Other, less important
> details may be better suited for the Contributing to Spark guide.
>
> Nick
>
> On Sun Nov 30 2014 at 6:50:55 PM Patrick Wendell 
> wrote:
>
>> Hey Ryan,
>>
>> A few more things here. You should feel free to send patches to
>> Jenkins to test them, since this is the reference environment in which
>> we regularly run tests. This is the normal workflow for most
>> developers, and we spend a lot of effort provisioning/maintaining a
>> very large Jenkins cluster to give developers access to this resource. A
>> common development approach is to locally run tests that you've added
>> in a patch, then send it to Jenkins for the full run, and then try to
>> debug locally if you see specific unanticipated test failures.
>>
>> One challenge we have is that given the proliferation of OS versions,
>> Java versions, Python versions, ulimits, etc. there is a combinatorial
>> number of environments in which tests could be run. It is very hard in
>> some cases to figure out post-hoc why a given test is not working in a
>> specific environment. I think a good solution here would be to use a
>> standardized Docker container for running Spark tests and to ask folks
>> to use it locally if they are trying to run all of the hundreds of
>> Spark tests.
>>
>> Another solution would be to mock out every system interaction in
>> Spark's tests including e.g. filesystem interactions to try and reduce
>> variance across environments. However, that seems difficult.
>>
>> As the number of developers of Spark increases, it's definitely a good
>> idea for us to invest in developer infrastructure including things
>> like snapshot releases, better documentation, etc. Thanks for bringing
>> this up as a pain point.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Mark Hamstra
>
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the "assembly" target in the assembly
> project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
> also supports tab completion)


The equivalent using Maven:

- Start zinc
- Build your assembly using the mvn "package" or "install" target
("install" is actually the equivalent of SBT's "publishLocal") -- this step
is the first step in
http://spark.apache.org/docs/latest/building-with-maven.html#spark-tests-in-maven
- Run all the tests in one module: mvn -pl core test
- Run a specific suite: mvn -pl core
-DwildcardSuites=org.apache.spark.rdd.RDDSuite test (the -pl option isn't
strictly necessary if you don't mind waiting for Maven to scan through all
the other sub-projects only to do nothing; and, of course, it needs to be
something other than "core" if the test you want to run is in another
sub-project.)

You also typically want to carry along in each subsequent step any relevant
command line options you added in the "package"/"install" step.
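Putting those steps together, a full Maven cycle with the options carried
through each step might look like this (the -Pyarn/-Phadoop-2.4 profiles
are just examples; substitute whatever your build needs):

    zinc -start
    mvn -Pyarn -Phadoop-2.4 -DskipTests install
    mvn -Pyarn -Phadoop-2.4 -pl core test
    mvn -Pyarn -Phadoop-2.4 -pl core -DwildcardSuites=org.apache.spark.rdd.RDDSuite test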

On Sun, Nov 30, 2014 at 3:06 PM, Matei Zaharia 
wrote:

> Hi Ryan,
>
> As a tip (and maybe this isn't documented well), I normally use SBT for
> development to avoid the slow build process, and use its interactive
> console to run only specific tests. The nice advantage is that SBT can keep
> the Scala compiler loaded and JITed across builds, making it faster to
> iterate. To use it, you can do the following:
>
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the "assembly" target in the assembly
> project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
> also supports tab completion)
>
> Running all the tests does take a while, and I usually just rely on
> Jenkins for that once I've run the tests for the things I believed my patch
> could break. But this is because some of them are integration tests (e.g.
> DistributedSuite, which creates multi-process mini-clusters). Many of the
> individual suites run fast without requiring this, however, so you can pick
> the ones you want. Perhaps we should find a way to tag them so people can
> do a "quick-test" that skips the integration ones.
>
> The assembly builds are annoying but they only take about a minute for me
> on a MacBook Pro with SBT warmed up. The assembly is actually only required
> for some of the "integration" tests (which launch new processes), but I'd
> recommend doing it all the time anyway since it would be very confusing to
> run those with an old assembly. The Scala compiler crash issue can also be
> a problem, but I don't see it very often with SBT. If it happens, I exit
> SBT and do sbt clean.
>
> Anyway, this is useful feedback and I think we should try to improve some
> of these suites, but hopefully you can also try the faster SBT process. At
> the end of the day, if we want integration tests, the whole test process
> will take an hour, but most of the developers I know leave that to Jenkins
> and only run individual tests locally before submitting a patch.
>
> Matei
>
>
> > On Nov 30, 2014, at 2:39 PM, Ryan Williams <
> ryan.blake.willi...@gmail.com> wrote:
> >
> > In the course of trying to make contributions to Spark, I have had a lot
> of
> > trouble running Spark's tests successfully. The main pain points I've
> > experienced are:
> >
> >1) frequent, spurious test failures
> >2) high latency of running tests
> >3) difficulty running specific tests in an iterative fashion
> >
> > Here is an example series of failures that I encountered this weekend
> > (along with footnote links to the console output from each and
> > approximately how long each took):
> >
> > - `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
> > before.
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
> > - `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]: BroadcastSuite
> > passed, but scala compiler crashed on the "catalyst" project.
> > - `mvn clean`: some attempts to run earlier commands (that previously
> > didn't crash the compiler) all result in the same compiler crash.
> Previous
> > discussion on this list implies this can only be solved by a `mvn clean`
> > [4].
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
> > BroadcastSuite can't run because assembly is not built.
> > - `./dev/run-tests` again [6]: pyspark tests fail, some messages about
> > version mismatches and python 2.6. The machine this ran on has python
> 2.7,
> > so I don't know what that's about.
> > - `./dev/run-tests` again [7]: "too many open files" errors in several
> > tests. `ulimit -a` shows a maximum of 4864 open files. Apparently this is
> > not enough, but only some of the time? I increased it to 8192 and tried
> > again.
> > - `./dev/run-tests` again [8]: same pyspark errors as before. This seems
> > to be the issue from SPARK-3867 [9], which was supposedly fixed on
> > October 14; not sure how I'm seeing it now. In any case, switched to
> > Python 2.6 and installed unittest2, and python/run-tests seems to be
> > unblocked.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Nicholas Chammas
   - currently the docs only contain information about building with maven,
   and even then don’t cover many important cases

 All other points aside, I just want to point out that the docs document
both how to use Maven and SBT and clearly state
that Maven is the “build of reference” while SBT may be preferable for
day-to-day development.

I believe the main reason most people miss this documentation is that,
though it’s up-to-date on GitHub, it hasn’t been published yet to the docs
site. It should go out with the 1.2 release.

Improvements to the documentation on building Spark belong here:
https://github.com/apache/spark/blob/master/docs/building-spark.md

If there are clear recommendations that come out of this thread but are not
in that doc, they should be added in there. Other, less important details
may be better suited for the Contributing to Spark guide.

Nick

On Sun Nov 30 2014 at 6:50:55 PM Patrick Wendell  wrote:

> Hey Ryan,
>
> A few more things here. You should feel free to send patches to
> Jenkins to test them, since this is the reference environment in which
> we regularly run tests. This is the normal workflow for most
> developers, and we spend a lot of effort provisioning/maintaining a
> very large Jenkins cluster to give developers access to this resource. A
> common development approach is to locally run tests that you've added
> in a patch, then send it to Jenkins for the full run, and then try to
> debug locally if you see specific unanticipated test failures.
>
> One challenge we have is that given the proliferation of OS versions,
> Java versions, Python versions, ulimits, etc. there is a combinatorial
> number of environments in which tests could be run. It is very hard in
> some cases to figure out post-hoc why a given test is not working in a
> specific environment. I think a good solution here would be to use a
> standardized Docker container for running Spark tests and to ask folks
> to use it locally if they are trying to run all of the hundreds of
> Spark tests.
>
> Another solution would be to mock out every system interaction in
> Spark's tests including e.g. filesystem interactions to try and reduce
> variance across environments. However, that seems difficult.
>
> As the number of developers of Spark increases, it's definitely a good
> idea for us to invest in developer infrastructure including things
> like snapshot releases, better documentation, etc. Thanks for bringing
> this up as a pain point.
>
> - Patrick
>
>
> On Sun, Nov 30, 2014 at 3:35 PM, Ryan Williams
>  wrote:
> > thanks for the info, Matei and Brennon. I will try to switch my workflow
> to
> > using sbt. Other potential action items:
> >
> > - currently the docs only contain information about building with maven,
> > and even then don't cover many important cases, as I described in my
> > previous email. If SBT is as much better as you've described then that
> > should be made much more obvious. Wasn't it the case recently that there
> > was only a page about building with SBT, and not one about building with
> > maven? Clearer messaging around this needs to exist in the documentation,
> > not just on the mailing list, imho.
> >
> > - +1 to better distinguishing between unit and integration tests, having
> > separate scripts for each, improving documentation around common
> workflows,
> > expectations of brittleness with each kind of test, advisability of just
> > relying on Jenkins for certain kinds of tests to not waste too much time,
> > etc. Things like the compiler crash should be discussed in the
> > documentation, not just in the mailing list archives, if new contributors
> > are likely to run into them through no fault of their own.
> >
> > - What is the algorithm you use to decide what tests you might have
> broken?
> > Can we codify it in some scripts that other people can use?
> >
> >
> >
> > On Sun Nov 30 2014 at 4:06:41 PM Matei Zaharia 
> > wrote:
> >
> >> Hi Ryan,
> >>
> >> As a tip (and maybe this isn't documented well), I normally use SBT for
> >> development to avoid the slow build process, and use its interactive
> >> console to run only specific tests. The nice advantage is that SBT can
> keep
> >> the Scala compiler loaded and JITed across builds, making it faster to
> >> iterate. To use it, you can do the following:
> >>
> >> - Start the SBT interactive console with sbt/sbt
> >> - Build your assembly by running the "assembly" target in the assembly
> >> project: assembly/assembly
> >> - Run all the tests in one module: core/test
> >> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite
> (this
> >> also supports tab completion)
> >>
> >> Running all the tests does take a while, and I usually just rely on
> >> Jenkins for that once I've run the tests for the things I believed my
> >> patch could break.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Patrick Wendell
Hey Ryan,

A few more things here. You should feel free to send patches to
Jenkins to test them, since this is the reference environment in which
we regularly run tests. This is the normal workflow for most
developers, and we spend a lot of effort provisioning/maintaining a
very large Jenkins cluster to give developers access to this resource. A
common development approach is to locally run tests that you've added
in a patch, then send it to Jenkins for the full run, and then try to
debug locally if you see specific unanticipated test failures.

One challenge we have is that given the proliferation of OS versions,
Java versions, Python versions, ulimits, etc. there is a combinatorial
number of environments in which tests could be run. It is very hard in
some cases to figure out post-hoc why a given test is not working in a
specific environment. I think a good solution here would be to use a
standardized Docker container for running Spark tests and to ask folks
to use it locally if they are trying to run all of the hundreds of
Spark tests.
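If a standard container existed, it could be fairly small. A minimal
sketch, assuming an Ubuntu base and JDK 7 (the image name and contents are
illustrative; no such image ships with Spark):

    # Dockerfile: a pinned environment for running the full test suite
    FROM ubuntu:14.04
    RUN apt-get update && \
        apt-get install -y openjdk-7-jdk maven python2.7 git
    WORKDIR /spark
    # mount a checkout and run everything inside the pinned environment:
    #   docker build -t spark-test . && docker run -v "$PWD":/spark spark-test
    CMD ["./dev/run-tests"]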

Another solution would be to mock out every system interaction in
Spark's tests including e.g. filesystem interactions to try and reduce
variance across environments. However, that seems difficult.
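As a sketch of the kind of seam that would enable this (the trait and class
names are hypothetical, not Spark APIs):

    // route filesystem calls through a trait so tests can substitute an
    // in-memory fake instead of touching the real OS
    trait FileOps {
      def exists(path: String): Boolean
      def write(path: String, data: Array[Byte]): Unit
    }

    object RealFileOps extends FileOps {
      def exists(path: String): Boolean = new java.io.File(path).exists()
      def write(path: String, data: Array[Byte]): Unit =
        java.nio.file.Files.write(java.nio.file.Paths.get(path), data)
    }

    // in tests: deterministic and environment-independent
    class InMemoryFileOps extends FileOps {
      private val files = scala.collection.mutable.Map.empty[String, Array[Byte]]
      def exists(path: String): Boolean = files.contains(path)
      def write(path: String, data: Array[Byte]): Unit = files(path) = data
    }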

As the number of developers of Spark increases, it's definitely a good
idea for us to invest in developer infrastructure including things
like snapshot releases, better documentation, etc. Thanks for bringing
this up as a pain point.

- Patrick


On Sun, Nov 30, 2014 at 3:35 PM, Ryan Williams
 wrote:
> thanks for the info, Matei and Brennon. I will try to switch my workflow to
> using sbt. Other potential action items:
>
> - currently the docs only contain information about building with maven,
> and even then don't cover many important cases, as I described in my
> previous email. If SBT is as much better as you've described then that
> should be made much more obvious. Wasn't it the case recently that there
> was only a page about building with SBT, and not one about building with
> maven? Clearer messaging around this needs to exist in the documentation,
> not just on the mailing list, imho.
>
> - +1 to better distinguishing between unit and integration tests, having
> separate scripts for each, improving documentation around common workflows,
> expectations of brittleness with each kind of test, advisability of just
> relying on Jenkins for certain kinds of tests to not waste too much time,
> etc. Things like the compiler crash should be discussed in the
> documentation, not just in the mailing list archives, if new contributors
> are likely to run into them through no fault of their own.
>
> - What is the algorithm you use to decide what tests you might have broken?
> Can we codify it in some scripts that other people can use?
>
>
>
> On Sun Nov 30 2014 at 4:06:41 PM Matei Zaharia 
> wrote:
>
>> Hi Ryan,
>>
>> As a tip (and maybe this isn't documented well), I normally use SBT for
>> development to avoid the slow build process, and use its interactive
>> console to run only specific tests. The nice advantage is that SBT can keep
>> the Scala compiler loaded and JITed across builds, making it faster to
>> iterate. To use it, you can do the following:
>>
>> - Start the SBT interactive console with sbt/sbt
>> - Build your assembly by running the "assembly" target in the assembly
>> project: assembly/assembly
>> - Run all the tests in one module: core/test
>> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
>> also supports tab completion)
>>
>> Running all the tests does take a while, and I usually just rely on
>> Jenkins for that once I've run the tests for the things I believed my patch
>> could break. But this is because some of them are integration tests (e.g.
>> DistributedSuite, which creates multi-process mini-clusters). Many of the
>> individual suites run fast without requiring this, however, so you can pick
>> the ones you want. Perhaps we should find a way to tag them so people can
>> do a "quick-test" that skips the integration ones.
>>
>> The assembly builds are annoying but they only take about a minute for me
>> on a MacBook Pro with SBT warmed up. The assembly is actually only required
>> for some of the "integration" tests (which launch new processes), but I'd
>> recommend doing it all the time anyway since it would be very confusing to
>> run those with an old assembly. The Scala compiler crash issue can also be
>> a problem, but I don't see it very often with SBT. If it happens, I exit
>> SBT and do sbt clean.
>>
>> Anyway, this is useful feedback and I think we should try to improve some
>> of these suites, but hopefully you can also try the faster SBT process. At
>> the end of the day, if we want integration tests, the whole test process
>> will take an hour, but most of the developers I know leave that to Jenkins
>> and only run individual tests locally before submitting a patch.
>>
>> Matei
>>
>>
>> > On Nov 30, 2014, at 2:39 PM, Ryan Williams <
>> > ryan.blake.willi...@gmail.com> wrote:

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ryan Williams
thanks for the info, Matei and Brennon. I will try to switch my workflow to
using sbt. Other potential action items:

- currently the docs only contain information about building with maven,
and even then don't cover many important cases, as I described in my
previous email. If SBT is as much better as you've described then that
should be made much more obvious. Wasn't it the case recently that there
was only a page about building with SBT, and not one about building with
maven? Clearer messaging around this needs to exist in the documentation,
not just on the mailing list, imho.

- +1 to better distinguishing between unit and integration tests, having
separate scripts for each, improving documentation around common workflows,
expectations of brittleness with each kind of test, advisability of just
relying on Jenkins for certain kinds of tests to not waste too much time,
etc. Things like the compiler crash should be discussed in the
documentation, not just in the mailing list archives, if new contributors
are likely to run into them through no fault of their own.

- What is the algorithm you use to decide what tests you might have broken?
Can we codify it in some scripts that other people can use?
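As a strawman for what such a script might look like (the script name and
module list are hypothetical; nothing like this exists in dev/ today):

    #!/usr/bin/env bash
    # guess-affected-modules.sh (hypothetical): run per-module tests for the
    # top-level directories your branch touched; leave the rest to Jenkins
    modules=$(git diff --name-only master... | cut -d/ -f1 | sort -u)
    for m in $modules; do
      case "$m" in
        core|streaming|mllib|graphx|bagel)
          sbt/sbt "$m/test" ;;
      esac
    done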



On Sun Nov 30 2014 at 4:06:41 PM Matei Zaharia 
wrote:

> Hi Ryan,
>
> As a tip (and maybe this isn't documented well), I normally use SBT for
> development to avoid the slow build process, and use its interactive
> console to run only specific tests. The nice advantage is that SBT can keep
> the Scala compiler loaded and JITed across builds, making it faster to
> iterate. To use it, you can do the following:
>
> - Start the SBT interactive console with sbt/sbt
> - Build your assembly by running the "assembly" target in the assembly
> project: assembly/assembly
> - Run all the tests in one module: core/test
> - Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this
> also supports tab completion)
>
> Running all the tests does take a while, and I usually just rely on
> Jenkins for that once I've run the tests for the things I believed my patch
> could break. But this is because some of them are integration tests (e.g.
> DistributedSuite, which creates multi-process mini-clusters). Many of the
> individual suites run fast without requiring this, however, so you can pick
> the ones you want. Perhaps we should find a way to tag them so people can
> do a "quick-test" that skips the integration ones.
>
> The assembly builds are annoying but they only take about a minute for me
> on a MacBook Pro with SBT warmed up. The assembly is actually only required
> for some of the "integration" tests (which launch new processes), but I'd
> recommend doing it all the time anyway since it would be very confusing to
> run those with an old assembly. The Scala compiler crash issue can also be
> a problem, but I don't see it very often with SBT. If it happens, I exit
> SBT and do sbt clean.
>
> Anyway, this is useful feedback and I think we should try to improve some
> of these suites, but hopefully you can also try the faster SBT process. At
> the end of the day, if we want integration tests, the whole test process
> will take an hour, but most of the developers I know leave that to Jenkins
> and only run individual tests locally before submitting a patch.
>
> Matei
>
>
> > On Nov 30, 2014, at 2:39 PM, Ryan Williams <
> ryan.blake.willi...@gmail.com> wrote:
> >
> > In the course of trying to make contributions to Spark, I have had a lot
> of
> > trouble running Spark's tests successfully. The main pain points I've
> > experienced are:
> >
> >1) frequent, spurious test failures
> >2) high latency of running tests
> >3) difficulty running specific tests in an iterative fashion
> >
> > Here is an example series of failures that I encountered this weekend
> > (along with footnote links to the console output from each and
> > approximately how long each took):
> >
> > - `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
> > before.
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
> > - `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]: BroadcastSuite
> > passed, but scala compiler crashed on the "catalyst" project.
> > - `mvn clean`: some attempts to run earlier commands (that previously
> > didn't crash the compiler) all result in the same compiler crash.
> Previous
> > discussion on this list implies this can only be solved by a `mvn clean`
> > [4].
> > - `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
> > BroadcastSuite can't run because assembly is not built.
> > - `./dev/run-tests` again [6]: pyspark tests fail, some messages about
> > version mismatches and python 2.6. The machine this ran on has python
> 2.7,
> > so I don't know what that's about.
> > - `./dev/run-tests` again [7]: "too many open files" errors in several
> > tests. `ulimit -a` shows a maximum of 4864 open files. Apparently this is
> > not enough, but only some of the time? I increased it to 8192 and tried
> > again.

Re: Spurious test failures, testing best practices

2014-11-30 Thread Matei Zaharia
Hi Ryan,

As a tip (and maybe this isn't documented well), I normally use SBT for 
development to avoid the slow build process, and use its interactive console to 
run only specific tests. The nice advantage is that SBT can keep the Scala 
compiler loaded and JITed across builds, making it faster to iterate. To use 
it, you can do the following:

- Start the SBT interactive console with sbt/sbt
- Build your assembly by running the "assembly" target in the assembly project: 
assembly/assembly
- Run all the tests in one module: core/test
- Run a specific suite: core/test-only org.apache.spark.rdd.RDDSuite (this also 
supports tab completion)

Running all the tests does take a while, and I usually just rely on Jenkins for 
that once I've run the tests for the things I believed my patch could break. 
But this is because some of them are integration tests (e.g. DistributedSuite, 
which creates multi-process mini-clusters). Many of the individual suites run 
fast without requiring this, however, so you can pick the ones you want. 
Perhaps we should find a way to tag them so people can do a "quick-test" that
skips the integration ones.
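Tagging along those lines might look like this in ScalaTest (the tag name
and sbt invocation are illustrative, not an existing convention):

    import org.scalatest.{FunSuite, Tag}

    // a marker for slow, multi-process suites (hypothetical tag name)
    object IntegrationTest extends Tag("org.apache.spark.tags.IntegrationTest")

    class ExampleSuite extends FunSuite {
      test("survives losing an executor", IntegrationTest) {
        // ...expensive multi-process assertions...
      }
    }

    // a "quick-test" from the sbt console would then exclude the tag:
    //   core/test-only * -- -l org.apache.spark.tags.IntegrationTest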

The assembly builds are annoying but they only take about a minute for me on a 
MacBook Pro with SBT warmed up. The assembly is actually only required for some 
of the "integration" tests (which launch new processes), but I'd recommend 
doing it all the time anyway since it would be very confusing to run those with 
an old assembly. The Scala compiler crash issue can also be a problem, but I 
don't see it very often with SBT. If it happens, I exit SBT and do sbt clean.

Anyway, this is useful feedback and I think we should try to improve some of 
these suites, but hopefully you can also try the faster SBT process. At the end 
of the day, if we want integration tests, the whole test process will take an 
hour, but most of the developers I know leave that to Jenkins and only run 
individual tests locally before submitting a patch.

Matei


> On Nov 30, 2014, at 2:39 PM, Ryan Williams  
> wrote:
> 
> In the course of trying to make contributions to Spark, I have had a lot of
> trouble running Spark's tests successfully. The main pain points I've
> experienced are:
> 
>1) frequent, spurious test failures
>2) high latency of running tests
>3) difficulty running specific tests in an iterative fashion
> 
> Here is an example series of failures that I encountered this weekend
> (along with footnote links to the console output from each and
> approximately how long each took):
> 
> - `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
> before.
> - `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
> - `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]: BroadcastSuite
> passed, but scala compiler crashed on the "catalyst" project.
> - `mvn clean`: some attempts to run earlier commands (that previously
> didn't crash the compiler) all result in the same compiler crash. Previous
> discussion on this list implies this can only be solved by a `mvn clean`
> [4].
> - `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
> BroadcastSuite can't run because assembly is not built.
> - `./dev/run-tests` again [6]: pyspark tests fail, some messages about
> version mismatches and python 2.6. The machine this ran on has python 2.7,
> so I don't know what that's about.
> - `./dev/run-tests` again [7]: "too many open files" errors in several
> tests. `ulimit -a` shows a maximum of 4864 open files. Apparently this is
> not enough, but only some of the time? I increased it to 8192 and tried
> again.
> - `./dev/run-tests` again [8]: same pyspark errors as before. This seems to
> be the issue from SPARK-3867 [9], which was supposedly fixed on October 14;
> not sure how I'm seeing it now. In any case, switched to Python 2.6 and
> installed unittest2, and python/run-tests seems to be unblocked.
> - `./dev/run-tests` again [10]: finally passes!
> 
> This was on a spark checkout at ceb6281 (ToT Friday), with a few trivial
> changes added on (that I wanted to test before sending out a PR), on a
> macbook running OSX Yosemite (10.10.1), java 1.8 and mvn 3.2.3 [11].
> 
> Meanwhile, on a linux 2.6.32 / CentOS 6.4 machine, I tried similar commands
> from the same repo state:
> 
> - `./dev/run-tests` [12]: YarnClusterSuite failure.
> - `./dev/run-tests` [13]: same YarnClusterSuite failure. I know I've seen
> this one before on this machine and am guessing it actually occurs every
> time.
> - `./dev/run-tests` [14]: to be sure, I reverted my changes, ran one more
> time from ceb6281, and saw the same failure.
> 
> This was with java 1.7 and maven 3.2.3 [15]. In one final attempt to narrow
> down the linux YarnClusterSuite failure, I ran `./dev/run-tests` on my mac,
> from ceb6281, with java 1.7 (instead of 1.8, which the previous runs used),
> and it passed [16], so the failure seems specific to my linux machine/arch.
> 
> At this point I believe that my changes don't break any tests (the
> YarnClusterSuite failure on my linux presumably not being... "real"), and
> I am ready to send out a PR. Whew!

Re: Spurious test failures, testing best practices

2014-11-30 Thread York, Brennon
+1, you aren't alone in this. I certainly would like some clarity in these
things as well, but, as it's been said on this listserv a few times (and you
noted), most developers use `sbt` for their day-to-day compilations to
greatly speed up the iterative testing process. I personally use `sbt` for
all builds until I'm ready to submit a PR and *then* run ./dev/run-tests
to ensure all the tests / code I've written still pass (i.e. nothing
breaks in the code I've changed or downstream). Sometimes, like you've
said, you still get errors with the ./dev/run-tests script, but, for me,
whether I submit the PR comes down to where the errors originate and
whether I'm confident the code I wrote caused them.

Again, not a great answer and hoping others can shed more light, but that's
my 2c on the problem.
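Spelled out, that workflow is roughly (the suite name is a placeholder):

    sbt/sbt                      # interactive console for day-to-day iteration
    > core/test-only *MySuite    # targeted runs while developing
    ./dev/run-tests              # one full sweep right before submitting the PR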

On 11/30/14, 5:39 PM, "Ryan Williams" 
wrote:

>In the course of trying to make contributions to Spark, I have had a lot
>of
>trouble running Spark's tests successfully. The main pain points I've
>experienced are:
>
>1) frequent, spurious test failures
>2) high latency of running tests
>3) difficulty running specific tests in an iterative fashion
>
>Here is an example series of failures that I encountered this weekend
>(along with footnote links to the console output from each and
>approximately how long each took):
>
>- `./dev/run-tests` [1]: failure in BroadcastSuite that I've not seen
>before.
>- `mvn '-Dsuites=*BroadcastSuite*' test` [2]: same failure.
>- `mvn '-Dsuites=*BroadcastSuite* Unpersisting' test` [3]: BroadcastSuite
>passed, but scala compiler crashed on the "catalyst" project.
>- `mvn clean`: some attempts to run earlier commands (that previously
>didn't crash the compiler) all result in the same compiler crash. Previous
>discussion on this list implies this can only be solved by a `mvn clean`
>[4].
>- `mvn '-Dsuites=*BroadcastSuite*' test` [5]: immediately post-clean,
>BroadcastSuite can't run because assembly is not built.
>- `./dev/run-tests` again [6]: pyspark tests fail, some messages about
>version mismatches and python 2.6. The machine this ran on has python 2.7,
>so I don't know what that's about.
>- `./dev/run-tests` again [7]: "too many open files" errors in several
>tests. `ulimit -a` shows a maximum of 4864 open files. Apparently this is
>not enough, but only some of the time? I increased it to 8192 and tried
>again.
>- `./dev/run-tests` again [8]: same pyspark errors as before. This seems to
>be the issue from SPARK-3867 [9], which was supposedly fixed on October 14;
>not sure how I'm seeing it now. In any case, switched to Python 2.6 and
>installed unittest2, and python/run-tests seems to be unblocked.
>- `./dev/run-tests` again [10]: finally passes!
>
>This was on a spark checkout at ceb6281 (ToT Friday), with a few trivial
>changes added on (that I wanted to test before sending out a PR), on a
>macbook running OSX Yosemite (10.10.1), java 1.8 and mvn 3.2.3 [11].
>
>Meanwhile, on a linux 2.6.32 / CentOS 6.4 machine, I tried similar
>commands
>from the same repo state:
>
>- `./dev/run-tests` [12]: YarnClusterSuite failure.
>- `./dev/run-tests` [13]: same YarnClusterSuite failure. I know I've seen
>this one before on this machine and am guessing it actually occurs every
>time.
>- `./dev/run-tests` [14]: to be sure, I reverted my changes, ran one more
>time from ceb6281, and saw the same failure.
>
>This was with java 1.7 and maven 3.2.3 [15]. In one final attempt to
>narrow
>down the linux YarnClusterSuite failure, I ran `./dev/run-tests` on my
>mac,
>from ceb6281, with java 1.7 (instead of 1.8, which the previous runs
>used),
>and it passed [16], so the failure seems specific to my linux
>machine/arch.
>
>At this point I believe that my changes don't break any tests (the
>YarnClusterSuite failure on my linux presumably not being... "real"), and
>I
>am ready to send out a PR. Whew!
>
>However, reflecting on the 5 or 6 distinct failure-modes represented
>above:
>
>- One of them ("too many open files") is something I can (and did,
>hopefully) fix once and for all. It cost me an ~hour this time (the
>approximate time of running ./dev/run-tests) and a few hours other times
>when I didn't fully understand/fix it. It doesn't happen deterministically
>(why?), but does happen somewhat frequently to people, having been
>discussed on the user list multiple times [17] and on SO [18]. Maybe some
>note in the documentation advising people to check their ulimit makes
>sense? (a sketch of such a note follows below)
>- One of them (unittest2 must be installed for python 2.6) was supposedly
>fixed upstream of the commits I tested here; I don't know why I'm still
>running into it. This cost me a few hours of running `./dev/run-tests`
>multiple times to see if it was transient, plus some time researching and
>working around it.
>- The original BroadcastSuite failure cost me a few hours and went away
>before I'd even run `mvn clean`.
>- A new incarnation of the sbt-compiler-crash phenomenon cost me a few
>hours
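A sketch of the ulimit note suggested above (8192 mirrors the value tried
earlier; the right limit is OS- and suite-dependent):

    ulimit -n         # check the current open-files limit for this shell
    ulimit -n 8192    # raise it before running ./dev/run-tests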