I dug around this a bit a while ago. I think if someone sat down and
profiled the tests, it's likely we could find some things to optimize.
In particular, there may be overhead in starting up a local Spark
context that could be minimized, which would speed up all the tests.
Also, there are some tests (especially in Streaming) that take really
long, like 60 seconds for a single test (see some of the new Flume
tests). These could almost certainly be optimized.
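
To make "profile the tests" concrete, here is a minimal sketch of
per-test timing. It is a language-neutral illustration in Python's
unittest, not Spark's actual ScalaTest setup. The names TimingResult,
slowest, and ExampleTests are hypothetical, and the sleep only stands
in for a test that waits on something to start up.

```python
import time
import unittest


class TimingResult(unittest.TestResult):
    """Test result that records wall-clock seconds for each test."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seconds_by_test = {}  # test id -> elapsed seconds
        self._started = {}         # test id -> start timestamp

    def startTest(self, test):
        self._started[test.id()] = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        super().stopTest(test)
        started = self._started.pop(test.id())
        self.seconds_by_test[test.id()] = time.perf_counter() - started


def slowest(suite, n=5):
    """Run the suite and return the n slowest (test id, seconds) pairs."""
    result = TimingResult()
    suite.run(result)
    ranked = sorted(result.seconds_by_test.items(),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


class ExampleTests(unittest.TestCase):
    """Hypothetical stand-in tests, one fast and one slow."""

    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    def test_slow(self):
        time.sleep(0.05)  # assumption: simulates waiting on startup
        self.assertTrue(True)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    for test_id, seconds in slowest(suite):
        print(f"{seconds:8.3f}s  {test_id}")
```

A ranking like this only shows where the time goes. Whether a given
slow test is waiting on startup, sleeping, or doing real work still
has to be checked by hand.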

I think 5 minutes might be out of reach, but something like a 2X
improvement might be possible and would be very valuable if
accomplished.

- Patrick

On Fri, Aug 8, 2014 at 11:24 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Just as a note, when you're developing stuff, you can use "test-only" in sbt,
> or the equivalent feature in Maven, to run just some of the tests. This is
> what I do; I don't wait for Jenkins to run things. 90% of the time, if it
> passes the tests that I know could break stuff, it will pass all of Jenkins.
>
> Jenkins should always be doing all the integration tests, so I don't think it 
> will become *that* much shorter in the long run, though it can certainly be 
> improved.
>
> Matei
>
> On August 8, 2014 at 10:20:35 AM, Nicolas Liochon (nkey...@gmail.com) wrote:
>
> fwiw, when we did this work in HBase, we categorized the tests. Then some
> tests can share a single jvm, while some others need to be isolated in
> their own jvm. Nevertheless surefire can still run them in parallel by
> starting/stopping several jvm.
>
> Nicolas
>
>
> On Fri, Aug 8, 2014 at 7:10 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> ScalaTest actually has support for parallelization built-in. We can use
>> that.
>>
>> The main challenge is to make sure all the test suites can work in parallel
>> when running alongside each other.
>>
>>
>> On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > How about using parallel execution feature of maven-surefire-plugin
>> > (assuming all the tests were made parallel friendly) ?
>> >
>> >
>> >
>> > http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
>> >
>> > Cheers
>> >
>> >
>> > On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <so...@cloudera.com> wrote:
>> >
>> > > A common approach is to separate unit tests from integration tests.
>> > > Maven has support for this distinction. I'm not sure it helps a lot
>> > > though, since it only helps you to not run integration tests all the
>> > > time. But lots of Spark tests are integration-test-like and are
>> > > important to run to know a change works.
>> > >
>> > > I haven't heard of a plugin to run different test suites remotely on
>> > > many machines, but I would not be surprised if it exists.
>> > >
>> > > The Jenkins servers aren't CPU-bound as far as I can tell. It's that
>> > > the tests spend a lot of time waiting for bits to start up or
>> > > complete. That implies the existing tests could be sped up by just
>> > > running in parallel locally. I recall someone recently proposed this?
>> > >
>> > > And I think the problem with that is simply that some of the tests
>> > > collide with each other, by opening up the same port at the same time
>> > > for example. I know that kind of problem is being attacked even right
>> > > now. But if all the tests were made parallel friendly, I imagine
>> > > parallelism could be enabled and speed up builds greatly without any
>> > > remote machines.
>> > >
>> > >
>> > > On Fri, Aug 8, 2014 at 5:01 PM, Nicholas Chammas
>> > > <nicholas.cham...@gmail.com> wrote:
>> > > > Howdy,
>> > > >
>> > > > Do we think it's both feasible and worthwhile to invest in getting our
>> > > > unit tests to finish in under 5 minutes (or something similarly brief)
>> > > > when run by Jenkins?
>> > > >
>> > > > Unit tests currently seem to take anywhere from 30 min to 2 hours. As
>> > > > people add more tests, I imagine this time will only grow. I think it
>> > > > would be better for both contributors and reviewers if they didn't have
>> > > > to wait so long for test results; PR reviews would be shorter, if
>> > > > nothing else.
>> > > >
>> > > > I don't know how this is normally done, but maybe it wouldn't be too
>> > > > much work to get a test cycle to feel lighter.
>> > > >
>> > > > Most unit tests are independent and can be run concurrently, right?
>> > > > Would it make sense to build a given patch on many servers at once and
>> > > > send disjoint sets of unit tests to each?
>> > > >
>> > > > I'd be interested in working on something like that if possible (and
>> > > > sensible).
>> > > >
>> > > > Nick
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> > > For additional commands, e-mail: dev-h...@spark.apache.org
>> > >
>> > >
>> >
>>

