Re: Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Michael Armbrust
I'd be okay skipping the HiveCompatibilitySuite for core-only changes.
Its tests do often catch bugs in changes to catalyst or sql, though.  Same for
HashJoinCompatibilitySuite/VersionsSuite.

HiveSparkSubmitSuite/CliSuite should probably stay, as they do test things
like addJar that have been broken by core in the past.
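
For illustration, ScalaTest tags would be one way to make the expensive
suites skippable for core-only changes; the tag name and sbt invocation
below are only a sketch, not something that exists in the build today:

    import org.scalatest.{FunSuite, Tag}

    // Hypothetical tag for expensive Hive compatibility tests.
    object HiveCompatibilityTest
      extends Tag("org.apache.spark.tags.HiveCompatibilityTest")

    class ExampleSuite extends FunSuite {
      // Tagged tests can be excluded when only core changed, e.g. from sbt:
      //   testOnly *ExampleSuite -- -l org.apache.spark.tags.HiveCompatibilityTest
      test("expensive hive compatibility case", HiveCompatibilityTest) {
        // ... run the Hive comparison here ...
      }

      test("cheap unit test") {
        assert(1 + 1 === 2)
      }
    }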

On Tue, Aug 25, 2015 at 1:40 PM, Patrick Wendell pwend...@gmail.com wrote:

 There is already code in place that restricts which tests run
 depending on which code is modified. However, changes inside of
 Spark's core currently require running all dependent tests. If you
 have some ideas about how to improve that heuristic, it would be
 great.

 - Patrick

 On Tue, Aug 25, 2015 at 1:33 PM, Marcelo Vanzin van...@cloudera.com
 wrote:
  Hello y'all,
 
  So I've been getting kinda annoyed with how many PR tests have been
  timing out. I took one of the logs from one of my PRs and started to
  do some crunching on the data from the output, and here's a list of
  the 5 slowest suites:
 
  307.14s HiveSparkSubmitSuite
  382.641s VersionsSuite
  398s CliSuite
  410.52s HashJoinCompatibilitySuite
  2508.61s HiveCompatibilitySuite
 
  Looking at those, I'm not surprised at all that we see so many
  timeouts. Is there any ongoing effort to trim down those tests
  (especially HiveCompatibilitySuite) or somehow restrict when they're
  run?
 
  Almost 1 hour to run a single test suite that affects a rather
  isolated part of the code base looks a little excessive to me.
 
  --
  Marcelo
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Patrick Wendell
There is already code in place that restricts which tests run
depending on which code is modified. However, changes inside of
Spark's core currently require running all dependent tests. If you
have some ideas about how to improve that heuristic, it would be
great.
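
To give a rough idea of the kind of heuristic involved (this is only an
illustrative Scala sketch, not the actual logic in the dev/run-tests
scripts):

    // Toy version of "map changed files to the modules whose tests must run".
    object TestSelection {
      // Hypothetical source-prefix -> dependent-test-modules table; note that
      // a change under core/ fans out to everything downstream.
      private val dependents = Map(
        "sql/"  -> Set("sql", "hive"),
        "hive/" -> Set("hive"),
        "core/" -> Set("core", "sql", "hive", "streaming", "mllib"))

      private val allModules = dependents.values.flatten.toSet

      def modulesToTest(changedFiles: Seq[String]): Set[String] =
        changedFiles.flatMap { path =>
          dependents.collectFirst {
            case (prefix, mods) if path.startsWith(prefix) => mods
          }.getOrElse(allModules)  // unknown path: be conservative, run everything
        }.toSet
    }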

- Patrick

On Tue, Aug 25, 2015 at 1:33 PM, Marcelo Vanzin van...@cloudera.com wrote:
 Hello y'all,

 So I've been getting kinda annoyed with how many PR tests have been
 timing out. I took one of the logs from one of my PRs and started to
 do some crunching on the data from the output, and here's a list of
 the 5 slowest suites:

 307.14s HiveSparkSubmitSuite
 382.641s VersionsSuite
 398s CliSuite
 410.52s HashJoinCompatibilitySuite
 2508.61s HiveCompatibilitySuite

 Looking at those, I'm not surprised at all that we see so many
 timeouts. Is there any ongoing effort to trim down those tests
 (especially HiveCompatibilitySuite) or somehow restrict when they're
 run?

 Almost 1 hour to run a single test suite that affects a rather
 isolated part of the code base looks a little excessive to me.

 --
 Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Paring down / tagging tests (or some other way to avoid timeouts)?

2015-08-25 Thread Marcelo Vanzin
I chatted with Patrick briefly offline. It would be interesting to
know whether the scripts have some way of saying "run a smaller
version of certain tests" (e.g. by setting a system property that the
tests look at to decide what to run). That way, if there are no
changes under sql/, we could still run a small part of
HiveCompatibilitySuite, just not all of it. The reasoning is that if
a core change breaks something in Hive, it will probably break many
tests, not a specific one.
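
Something along those lines is what I have in mind. Just a sketch, with a
made-up property name, rather than anything the suites support today:

    import org.scalatest.FunSuite

    // Only register a sample of the test cases when a (hypothetical)
    // "quick" flag is set, e.g. -Dspark.test.hive.quick=true.
    abstract class ReducedRunSuite extends FunSuite {
      private val quickMode =
        sys.props.getOrElse("spark.test.hive.quick", "false").toBoolean

      def compatibilityTests(names: Seq[String])(body: String => Unit): Unit = {
        val selected =
          if (quickMode) names.zipWithIndex.collect { case (n, i) if i % 20 == 0 => n }
          else names
        selected.foreach(name => test(name)(body(name)))
      }
    }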

On Tue, Aug 25, 2015 at 1:48 PM, Michael Armbrust
mich...@databricks.com wrote:
 I'd be okay skipping the HiveCompatibilitySuite for core-only changes.  Its
 tests do often catch bugs in changes to catalyst or sql, though.  Same for
 HashJoinCompatibilitySuite/VersionsSuite.

 HiveSparkSubmitSuite/CliSuite should probably stay, as they do test things
 like addJar that have been broken by core in the past.

 On Tue, Aug 25, 2015 at 1:40 PM, Patrick Wendell pwend...@gmail.com wrote:

 There is already code in place that restricts which tests run
 depending on which code is modified. However, changes inside of
 Spark's core currently require running all dependent tests. If you
 have some ideas about how to improve that heuristic, it would be
 great.

 - Patrick

 On Tue, Aug 25, 2015 at 1:33 PM, Marcelo Vanzin van...@cloudera.com
 wrote:
  Hello y'all,
 
  So I've been getting kinda annoyed with how many PR tests have been
  timing out. I took one of the logs from one of my PRs and started to
  do some crunching on the data from the output, and here's a list of
  the 5 slowest suites:
 
  307.14s HiveSparkSubmitSuite
  382.641s VersionsSuite
  398s CliSuite
  410.52s HashJoinCompatibilitySuite
  2508.61s HiveCompatibilitySuite
 
  Looking at those, I'm not surprised at all that we see so many
  timeouts. Is there any ongoing effort to trim down those tests
  (especially HiveCompatibilitySuite) or somehow restrict when they're
  run?
 
  Almost 1 hour to run a single test suite that affects a rather
  isolated part of the code base looks a little excessive to me.
 
  --
  Marcelo
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org