Paring down / tagging tests (or some other way to avoid timeouts)?
Hello y'all,

So I've been getting kinda annoyed with how many PR tests have been timing out. I took the log from one of my PRs, crunched the numbers from the output, and here's a list of the 5 slowest suites:

307.14s HiveSparkSubmitSuite
382.641s VersionsSuite
398s CliSuite
410.52s HashJoinCompatibilitySuite
2508.61s HiveCompatibilitySuite

Looking at those, I'm not surprised at all that we see so many timeouts. Is there any ongoing effort to trim down those tests (especially HiveCompatibilitySuite) or somehow restrict when they're run?

Over 40 minutes to run a single test suite that covers a rather isolated part of the code base looks a little excessive to me.

--
Marcelo

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
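For anyone who wants to repeat this kind of crunching, here is a minimal sketch in Python. It assumes the timings have already been pulled out of the build log into `<seconds>s <SuiteName>` lines, as in the list above; the function names are made up for illustration, and extracting those lines from a real Jenkins log is not shown.

```python
# Timings copied from the message above, one "<seconds>s <SuiteName>" per line.
RAW = """\
307.14s HiveSparkSubmitSuite
382.641s VersionsSuite
398s CliSuite
410.52s HashJoinCompatibilitySuite
2508.61s HiveCompatibilitySuite
"""


def parse_timings(text):
    """Turn '<seconds>s <SuiteName>' lines into (seconds, suite) tuples."""
    timings = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        duration, suite = line.split(None, 1)
        timings.append((float(duration.rstrip("s")), suite))
    return timings


def slowest(timings, n=5):
    """Return the n slowest suites, slowest first."""
    return sorted(timings, reverse=True)[:n]


if __name__ == "__main__":
    for seconds, suite in slowest(parse_timings(RAW)):
        print(f"{seconds / 60:6.1f} min  {suite}")
```

Converting to minutes makes the outlier easier to see: HiveCompatibilitySuite alone accounts for well over half of the combined time of these five suites.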
Re: Paring down / tagging tests (or some other way to avoid timeouts)?
I'd be okay skipping the HiveCompatibilitySuite for core-only changes. They do often catch bugs in changes to catalyst or sql though. Same for HashJoinCompatibilitySuite/VersionsSuite. HiveSparkSubmitSuite/CliSuite should probably stay, as they do test things like addJar that have been broken by core in the past.

On Tue, Aug 25, 2015 at 1:40 PM, Patrick Wendell pwend...@gmail.com wrote:
> There is already code in place that restricts which tests run depending on which code is modified. However, changes inside of Spark's core currently require running all dependent tests. If you have some ideas about how to improve that heuristic, it would be great.
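The rule proposed in this reply could be sketched roughly as below. This is only an illustration in Python, not the project's actual test scripts: the suite names come from the thread, while the `sql/` path check and the helper names are assumptions made for the example.

```python
# Suites from the thread, in the order they were listed.
ALL_SUITES = [
    "HiveSparkSubmitSuite",
    "VersionsSuite",
    "CliSuite",
    "HashJoinCompatibilitySuite",
    "HiveCompatibilitySuite",
]

# Suites the reply suggests skipping for core-only changes.
SQL_ONLY_SUITES = {
    "HiveCompatibilitySuite",
    "HashJoinCompatibilitySuite",
    "VersionsSuite",
}


def touches_sql(changed_files):
    """Assumption: catalyst and sql changes all live under the sql/ directory."""
    return any(path.startswith("sql/") for path in changed_files)


def suites_to_run(changed_files, all_suites=ALL_SUITES):
    """Run everything for sql/ changes, otherwise drop the SQL-only suites."""
    if touches_sql(changed_files):
        return list(all_suites)
    return [suite for suite in all_suites if suite not in SQL_ONLY_SUITES]
```

With a change that only touches files under `core/`, this keeps HiveSparkSubmitSuite and CliSuite and drops the other three.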
Re: Paring down / tagging tests (or some other way to avoid timeouts)?
There is already code in place that restricts which tests run depending on which code is modified. However, changes inside of Spark's core currently require running all dependent tests. If you have some ideas about how to improve that heuristic, it would be great.

- Patrick

On Tue, Aug 25, 2015 at 1:33 PM, Marcelo Vanzin van...@cloudera.com wrote:
> Is there any ongoing effort to trim down those tests (especially HiveCompatibilitySuite) or somehow restrict when they're run?
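To make the "all dependent tests" behavior concrete, here is a small Python sketch of dependency-based test selection. It is not the code Patrick refers to: the module names, the dependency graph, and the path prefixes are hypothetical and only show why a change in core ends up selecting everything that depends on it.

```python
from collections import deque

# Hypothetical module graph: each module lists the modules it depends on.
MODULE_DEPENDENCIES = {
    "core": [],
    "sql": ["core"],
    "streaming": ["core"],
    "mllib": ["core", "sql"],
}

# Hypothetical layout: each module's sources live under "<name>/".
MODULE_PATHS = {name: name + "/" for name in MODULE_DEPENDENCIES}


def changed_modules(changed_files):
    """Modules whose source directory contains at least one changed file."""
    return {
        module
        for module, prefix in MODULE_PATHS.items()
        if any(path.startswith(prefix) for path in changed_files)
    }


def modules_to_test(changed_files):
    """Changed modules plus everything that transitively depends on them."""
    # Invert the graph so we can walk from a module to its dependents.
    dependents = {module: set() for module in MODULE_DEPENDENCIES}
    for module, deps in MODULE_DEPENDENCIES.items():
        for dep in deps:
            dependents[dep].add(module)

    selected = set(changed_modules(changed_files))
    queue = deque(selected)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected
```

In this toy graph a change under `streaming/` selects only `streaming`, while a change under `core/` selects every module, which is the situation described above.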
Re: Paring down / tagging tests (or some other way to avoid timeouts)?
I chatted with Patrick briefly offline. It would be interesting to know whether the scripts have some way of saying "run a smaller version of certain tests" (e.g. by setting a system property that the tests look at to decide what to run). That way, if there are no changes under sql/, we could still run a small part of HiveCompatibilitySuite, just not all of it.

The reasoning being that if a core change breaks something in Hive, it will probably break many tests, not a specific one.

On Tue, Aug 25, 2015 at 1:48 PM, Michael Armbrust mich...@databricks.com wrote:
> I'd be okay skipping the HiveCompatibilitySuite for core-only changes. They do often catch bugs in changes to catalyst or sql though.

--
Marcelo
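One way the "smaller version" idea could look is sketched below in Python. The suites themselves run on the JVM and would read a system property, as suggested above; this sketch stands in an environment variable for that, and the variable name `TEST_SAMPLE_FRACTION`, the helper names, and the even-spacing strategy are all assumptions made for illustration.

```python
import math
import os


def sample_fraction(env=os.environ):
    """Read the fraction of cases to run; defaults to 1.0 (run everything)."""
    fraction = float(env.get("TEST_SAMPLE_FRACTION", "1.0"))
    if not 0.0 < fraction <= 1.0:
        raise ValueError("TEST_SAMPLE_FRACTION must be in (0, 1]")
    return fraction


def select_cases(cases, fraction):
    """Pick an evenly spaced, deterministic subset of the test cases."""
    ordered = sorted(cases)
    if not ordered:
        return []
    keep = max(1, math.ceil(len(ordered) * fraction))
    # Evenly spaced indices so the subset spans the whole list of cases.
    return [ordered[i * len(ordered) // keep] for i in range(keep)]
```

A deterministic subset keeps reruns of the same PR comparable. The trade-off is that the same cases are always skipped, so the full suite would still need to run somewhere, for example whenever files under sql/ change.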