Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Marcelo Vanzin
Hello y'all,

So I've been getting kinda annoyed with how many PR tests have been timing out. I took one of the logs from one of my PRs and started to do some crunching on the data from the output, and here's a list of the 5 slowest suites:

307.14s HiveSparkSubmitSuite
382.641s VersionsSuite
398s
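The crunching Marcelo describes can be approximated as follows. This is a minimal Python sketch that assumes the relevant log lines have already been reduced to the hypothetical form "<seconds>s <SuiteName>", matching how he lists them. Real test-log output will look different and would need its own parsing. Durations are summed per suite, in case a suite appears more than once, and the n largest totals are returned.

```python
import re
from collections import defaultdict

# Hypothetical line format, mirroring the list in the message: "<seconds>s <SuiteName>"
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(\S+)\s*$")


def slowest_suites(lines, n=5):
    """Sum durations per suite and return the n slowest as (suite, seconds), slowest first."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines that don't fit the assumed format are ignored
            totals[match.group(2)] += float(match.group(1))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Fed the two complete entries quoted in the message, slowest_suites(["307.14s HiveSparkSubmitSuite", "382.641s VersionsSuite"]) returns [("VersionsSuite", 382.641), ("HiveSparkSubmitSuite", 307.14)].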

Re: Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Michael Armbrust
I'd be okay skipping the HiveCompatibilitySuite for core-only changes. They do often catch bugs in changes to catalyst or sql though. Same for HashJoinCompatibilitySuite/VersionsSuite. HiveSparkSubmitSuite/CliSuite should probably stay, as they do test things like addJar that have been broken by

Re: Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Patrick Wendell
There is already code in place that restricts which tests run depending on which code is modified. However, changes inside of Spark's core currently require running all dependent tests. If you have some ideas about how to improve that heuristic, it would be great.

- Patrick
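The behavior Patrick describes (tests chosen from the modified code, with a core change pulling in every dependent suite) amounts to a reverse-dependency closure over a module graph. The sketch below is a minimal Python illustration under stated assumptions: the module names, the DEPENDS_ON graph, and the path prefixes are hypothetical simplifications, not Spark's actual build layout or test scripts. Mapping an unrecognized path to every module is a conservative choice made for this sketch.

```python
from collections import deque

# Hypothetical, simplified module graph: module -> modules it depends on.
DEPENDS_ON = {
    "core": [],
    "catalyst": ["core"],
    "sql": ["catalyst"],
    "hive": ["sql"],
    "streaming": ["core"],
}

# Hypothetical mapping from source-path prefixes to modules.
PATH_PREFIXES = {
    "core/": "core",
    "sql/catalyst/": "catalyst",
    "sql/core/": "sql",
    "sql/hive/": "hive",
    "streaming/": "streaming",
}


def changed_modules(paths):
    """Map changed file paths to modules; an unrecognized path selects every module."""
    modules = set()
    for path in paths:
        owners = [mod for prefix, mod in PATH_PREFIXES.items() if path.startswith(prefix)]
        if not owners:
            return set(DEPENDS_ON)
        modules.update(owners)
    return modules


def modules_to_test(paths):
    """Return the changed modules plus every module that transitively depends on them."""
    # Invert the graph: module -> modules that depend on it directly.
    dependents = {mod: set() for mod in DEPENDS_ON}
    for mod, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(mod)

    # Breadth-first walk over the inverted graph from the changed modules.
    selected = changed_modules(paths)
    queue = deque(selected)
    while queue:
        for dependent in dependents[queue.popleft()]:
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected
```

In this toy graph a change under sql/hive/ selects only {"hive"}, while any change under core/ selects all five modules, which is exactly the cost the thread is trying to reduce.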

Re: Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Marcelo Vanzin
I chatted with Patrick briefly offline. It would be interesting to know whether the scripts have some way of saying run a smaller version of certain tests (e.g. by setting a system property that the tests look at to decide what to run). That way, if there are no changes under sql/, we could still
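Marcelo's idea of a switch that the tests consult could look roughly like the following. It is a hedged Python sketch, not Spark's tooling: it uses an environment variable in place of the JVM system property he mentions, and the flag name, the "smoke" subset, and both functions are hypothetical. The runner requests the reduced variant only when no changed path falls under sql/; a suite then runs just its smoke cases when the flag is set.

```python
import os

# Hypothetical flag name; the proposal in the thread used a JVM system property instead.
REDUCED_FLAG = "HYPOTHETICAL_REDUCED_SQL_TESTS"


def runner_env(changed_paths, base_env=None):
    """Build the test environment, requesting the reduced suites when nothing under sql/ changed."""
    env = dict(os.environ if base_env is None else base_env)
    if any(path.startswith("sql/") for path in changed_paths):
        env.pop(REDUCED_FLAG, None)  # SQL code changed: make sure the full suites run
    else:
        env[REDUCED_FLAG] = "1"
    return env


def cases_to_run(all_cases, smoke_cases, env=None):
    """Inside a suite: keep only the smoke cases when the reduced flag is set."""
    env = os.environ if env is None else env
    if env.get(REDUCED_FLAG) == "1":
        return [case for case in all_cases if case in smoke_cases]
    return list(all_cases)
```

Keeping the decision in the runner and the case selection in the suite means a full run stays the default whenever the flag is absent, e.g. when a developer runs a suite by hand.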