[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240530#comment-14240530 ]

Nicholas Chammas commented on SPARK-3431:

For the record, the suite that I'm running is as follows:

{code}
sbt/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive -Phive-thriftserver 'testOnly org.apache.spark.sql.hive.execution.HiveQuerySuite'
{code}

I modified it to print the current working directory and confirmed that at least that is different when JVMs are forked vs. not forked (i.e. I just comment out [this line|https://github.com/apache/spark/pull/3564/files#diff-c3580fe26fb42eb3aac6e180ae11e947R440]).

> Parallelize execution of tests
>
> Key: SPARK-3431
> URL: https://issues.apache.org/jira/browse/SPARK-3431
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Attachments: SPARK-3431-srowen-attempt.patch
>
> Running all the tests in {{dev/run-tests}} takes up to 2 hours. A common
> strategy to cut test time down is to parallelize the execution of the tests.
> Doing that may in turn require some prerequisite changes to be made to how
> certain tests run.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240519#comment-14240519 ]

Nicholas Chammas commented on SPARK-3431:

OK, thanks for the updates, Sean and Nicolas.

On my side, I've gone back to testing with SBT to better understand what's going wrong there. Specifically, I want to understand why the [working directory appears to be different|https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14236540&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236540] depending on whether or not we fork the JVM.

I came across [this SBT issue|https://github.com/sbt/sbt/issues/1032], which seems to document a known behavior of SBT in multi-project builds. Forking vs. not forking does appear to change the working directory, which I can confirm is what broke the HiveQuerySuite test with {{java.io.IOException: Cannot run program "/usr/bin/hadoop"}}.
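The working-directory difference described above can be checked from inside a test JVM. The following is a minimal sketch, not code from the Spark build: the class name `WorkingDirCheck`, the helper `resolveAgainst`, and the sample paths are all hypothetical. It prints the JVM's `user.dir` property and shows how one relative path resolves to different absolute paths under two base directories, which is one way a forked JVM could miss a file that an in-process run finds.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class WorkingDirCheck {
    // Hypothetical helper: resolve a relative path against an explicit base directory.
    static Path resolveAgainst(String baseDir, String relative) {
        return Paths.get(baseDir).resolve(relative).normalize();
    }

    public static void main(String[] args) {
        // The directory the current JVM was started in.
        System.out.println("user.dir = " + System.getProperty("user.dir"));

        // Hypothetical base directories: a project root and one of its submodules.
        String relative = "bin/tool";
        System.out.println(resolveAgainst("/path/to/project", relative));
        System.out.println(resolveAgainst("/path/to/project/submodule", relative));
    }
}
```

Printing `user.dir` once with forking enabled and once without should make any difference between the two modes visible directly.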
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240497#comment-14240497 ]

Nicolas Liochon commented on SPARK-3431:

Yep, it seems OK from a Maven point of view. The safest approach at the beginning is to try forkCount=1/reuseForks=false; then you can increase the forkCount. The default is forkCount=1/reuseForks=true, but I doubt it's the issue, as Sean already reproduced it outside of Maven.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239988#comment-14239988 ]

Sean Owen commented on SPARK-3431:

Hm. Well, when I run {{BlockTransferMessagesSuite}} by itself in my IDE, it also fails. I'm wondering if we're simply discovering that lots of the Java tests don't actually succeed.

Step 1 may be SPARK-4159, getting Java tests running too. I think I can do that. Once that's cleared up, which will entail something a lot like my patch here, I think it's easier to move forward (possibly) with surefire for all, parallel tests.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238769#comment-14238769 ]

Nicholas Chammas commented on SPARK-3431:

[~nkeywal] - I took a quick look at HBase's {{pom.xml}} based on your comments in the Spark [dev list discussion|http://apache-spark-developers-list.1001551.n3.nabble.com/Unit-tests-in-lt-5-minutes-td7757.html] about speeding up unit tests. It looks a bit complex, but perhaps Spark's pom file will eventually end up looking similar for tests.

For Spark, I've taken an initial step by just having Maven use Surefire to run tests ([here's the patch|https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14238666&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14238666]), but tests aren't running successfully. Is there anything off the top of your head that I've obviously missed?
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238703#comment-14238703 ]

Nicholas Chammas commented on SPARK-3431:

Here are some of the errors:

{code}
Running org.apache.spark.network.shuffle.BlockTransferMessagesSuite
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec <<< FAILURE! - in org.apache.spark.network.shuffle.BlockTransferMessagesSuite
serializeOpenShuffleBlocks(org.apache.spark.network.shuffle.BlockTransferMessagesSuite) Time elapsed: 0.008 sec <<< FAILURE!
java.lang.AssertionError: Writable bytes remain: 28
	at org.apache.spark.network.shuffle.protocol.BlockTransferMessage.toByteArray(BlockTransferMessage.java:73)
	at org.apache.spark.network.shuffle.BlockTransferMessagesSuite.checkSerializeDeserialize(BlockTransferMessagesSuite.java:39)
	at org.apache.spark.network.shuffle.BlockTransferMessagesSuite.serializeOpenShuffleBlocks(BlockTransferMessagesSuite.java:30)

Running org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite
Tests run: 3, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 0.237 sec <<< FAILURE! - in org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite
testRegisterExecutor(org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite) Time elapsed: 0.023 sec <<< FAILURE!
java.lang.AssertionError: Writable bytes remain: 18
	at org.apache.spark.network.shuffle.protocol.BlockTransferMessage.toByteArray(BlockTransferMessage.java:73)
	at org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite.testRegisterExecutor(ExternalShuffleBlockHandlerSuite.java:63)

testOpenShuffleBlocks(org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite) Time elapsed: 0.017 sec <<< FAILURE!
java.lang.AssertionError: Writable bytes remain: 30
	at org.apache.spark.network.shuffle.protocol.BlockTransferMessage.toByteArray(BlockTransferMessage.java:73)
	at org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite.testOpenShuffleBlocks(ExternalShuffleBlockHandlerSuite.java:80)

testBadMessages(org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite) Time elapsed: 0.003 sec <<< FAILURE!
java.lang.AssertionError: Writable bytes remain: 37
	at org.apache.spark.network.shuffle.protocol.BlockTransferMessage.toByteArray(BlockTransferMessage.java:73)
	at org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite.testBadMessages(ExternalShuffleBlockHandlerSuite.java:113)
{code}

I'll remove Surefire from the dependency list. When you say "also refer to it under " where else exactly do I need to add references?
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238686#comment-14238686 ]

Sean Owen commented on SPARK-3431:

What are the errors? Problems with the tests or the test config?

I don't think you need to make the plugin a dependency since it isn't something the code uses. You declare and configure it in , and then also refer to it under so that all submodules activate surefire.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238666#comment-14238666 ]

Nicholas Chammas commented on SPARK-3431:

OK, here's a patch for {{pom.xml}} that represents my first attempt at having Maven use Surefire.

{code}
diff --git a/pom.xml b/pom.xml
index b7df53d..78a5b8a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -533,6 +533,12 @@
 ${scala.version}
+org.apache.maven.plugins
+maven-surefire-plugin
+2.17
+test
+
+
 org.scalatest
 scalatest_${scala.binary.version}
 2.2.1
@@ -946,15 +952,6 @@
 maven-surefire-plugin
 2.17
-
-true
-
-
-
- org.scalatest
- scalatest-maven-plugin
- 1.0
-
 ${project.build.directory}/surefire-reports
 .
 SparkTestSuite.txt
@@ -969,6 +966,12 @@
 ${test_classpath}
 true
+
+**/*Suite.java
+**/*Test.java
+**/*Suite.scala
+**/*Test.scala
+
{code}

I'm building and running tests as follows:

{code}
mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean package
mvn -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver test
{code}

Does that look sensible to you?

The build runs fine, but when I run tests in this way, I get errors in the {{org.apache.spark.network.shuffle.BlockTransferMessagesSuite}} and a few other {{network.shuffle}} suites.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238615#comment-14238615 ]

Sean Owen commented on SPARK-3431:

Surefire is definitely the main Maven testing plugin and has all the bells and whistles. scalatest is a fork of a quite old version.

Well, in theory I think these are all the things that need to happen:

- Make sure that the test-compile phase compiles all of the Scala-based tests as well as Java-based tests. I am pretty sure this happens correctly already from the Maven Scala plugin.
- Port the scalatest config to the surefire plugin. I bet it all Just Works given that scalatest is derived from surefire.
- Delete scalatest config
- Un-disable the surefire config
- Probably add config to make sure "" includes all of the names of all Java and Scala tests

Then you get a lot of parallelization options for sure. Off the top of my head it should work, but then again, maybe there was a good reason surefire was never used.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238596#comment-14238596 ]

Nicholas Chammas commented on SPARK-3431:

Thanks for assigning the issue to me, Josh.

[~srowen] - I'm starting to look at Maven + Surefire. Surefire seems to be the most mature and fully-featured test framework among the ones we've discussed here.

What would it take to have Maven run tests as they are (no parallelization) using Surefire instead of ScalaTest? I'm having trouble understanding how {{pom.xml}} needs to be updated, assuming that's all that needs to be updated.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238330#comment-14238330 ]

Nicholas Chammas commented on SPARK-3431:

I am currently (and have been) actively working on this issue. Can someone assign this issue to me? I don't appear to be able to do that myself.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236540#comment-14236540 ]

Nicholas Chammas commented on SPARK-3431:

Here's an example failure I don't understand.

I fire up {{sbt/sbt}} with {{SparkBuild.scala}} at [this version|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala]:

{code}
def groupBySuite(tests: Seq[TestDefinition], javaOptions: Seq[String]) = {
  tests groupBy (_.name.split('.').slice(0,4).mkString(".")) map {
    case (suite, tests) =>
      new Group(
        name = suite,
        tests = tests,
        // runPolicy = Tests.InProcess)
        runPolicy = SubProcess(javaOptions = javaOptions))
  } toSeq
}

testGrouping in Test <<= (definedTests in Test, javaOptions in Test) map groupBySuite,
{code}

Then I run this at the SBT prompt:

{code}
testOnly org.apache.spark.sql.hive.execution.HiveQuerySuite
{code}

I get a lot of errors, but this one stands out:

{code}
21:53:56.662 WARN org.apache.spark.sql.hive.execution.HiveQuerySuite: Running query 1/1 with hive.
java.io.IOException: Cannot run program "/usr/bin/hadoop" (in directory "/path/to/my/copy/of/spark"): error=2, No such file or directory
{code}

If I comment out [the {{testGrouping in Test}} line|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L429], the test runs fine.

So it smells like the forked JVMs are somehow not getting passed the [configured paths|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L403-L418] or something. There are some related posts about this [on Stack Overflow|http://stackoverflow.com/questions/18002205/sbt-test-only-not-picking-up-jvm-option-when-forking-a-jvm-for-tests] and [SBT's issue tracker|https://github.com/sbt/sbt/issues/975].

I'm not sure how to proceed with SBT, or whether I've identified a legitimate blocker or not. I may just move on to Maven unless I make some kind of breakthrough. Any pointers would be appreciated.
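The grouping key used by {{groupBySuite}} above can be illustrated outside of SBT. The following is a minimal sketch in plain Java rather than the SBT build DSL; the class `SuiteGrouping`, its methods, and the suite name `org.apache.spark.sql.hive.ExampleSuite` are hypothetical and used only for illustration. It groups fully qualified suite names by their first four dot-separated segments, which is what `split('.').slice(0,4).mkString(".")` computes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SuiteGrouping {
    // Keep only the first `depth` dot-separated segments of a suite name.
    static String groupKey(String suiteName, int depth) {
        return Arrays.stream(suiteName.split("\\."))
                .limit(depth)
                .collect(Collectors.joining("."));
    }

    // Bucket suite names by their package prefix; each bucket would map to one forked JVM.
    static Map<String, List<String>> groupByPrefix(List<String> suiteNames, int depth) {
        return suiteNames.stream()
                .collect(Collectors.groupingBy(
                        name -> groupKey(name, depth), TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> suites = Arrays.asList(
                "org.apache.spark.sql.hive.execution.HiveQuerySuite",
                "org.apache.spark.sql.hive.ExampleSuite", // hypothetical suite name
                "org.apache.spark.streaming.DurationSuite");
        groupByPrefix(suites, 4).forEach((prefix, members) ->
                System.out.println(prefix + " -> " + members.size() + " suite(s)"));
    }
}
```

With a depth of 4, both `sql.hive` suites land in the `org.apache.spark.sql` group, so this scheme produces roughly one group per top-level Spark module rather than one per suite.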
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234783#comment-14234783 ]

Nicholas Chammas commented on SPARK-3431:

As an aside, I expect there to be some work required to let certain tests play nicely with one another. But if we figure out how to specify test groupings and make sure the forked JVMs are configured correctly, refactoring tests where necessary should be very doable.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234702#comment-14234702 ]

Nicholas Chammas commented on SPARK-3431:

I think I'm on to something, but I need some help.

I think I understand how to tell SBT to fork JVMs for tests, and I also think I got how to specify how the tests should be grouped in the various forked JVMs.

It's not working because I think the forked JVMs are not getting passed all the options they need. Basically, I don't think that the reference to {{javaOptions}} [here in this line|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L429] actually has all the options [defined earlier|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L403-L418].

I don't know much Scala. If anyone could review what I have so far and give me some pointers, that would be great! You can see all the variations I've tried along with the associated output in [the open pull request|https://github.com/apache/spark/pull/3564].

I know we want to get this working with Maven, but I figured getting it to work first with SBT wouldn't be a bad thing.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233568#comment-14233568 ]

Nicholas Chammas commented on SPARK-3431:

[~joshrosen] I tried [that patch you posted earlier here|https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14168038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14168038]. It appears to fork a JVM for every individual test (e.g. {{org.apache.spark.streaming.DurationSuite}}). When I tried it out on Jenkins, the tests [timed out after 2 hours|https://github.com/apache/spark/pull/3564#issuecomment-65349149].
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232309#comment-14232309 ]

Apache Spark commented on SPARK-3431:

User 'nchammas' has created a pull request for this issue: https://github.com/apache/spark/pull/3564
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220065#comment-14220065 ]

Josh Rosen commented on SPARK-3431:

[~nchammas]: It's been a while since I tried that patch, so I don't remember offhand, but it seemed significantly faster. Tests completed in ~20 minutes or so on my laptop, maybe? That's just a guess though; I could be misremembering.

The port contention issues should have been solved; even without parallelization in Maven, we still run into the risk of multiple Jenkins builds running on the same box contending for ports. AFAIK we haven't seen any recent failures due to port contention, so I think it should be safe to increase the degree of parallelism.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1421#comment-1421 ]

Nicholas Chammas commented on SPARK-3431:

[~joshrosen] - Per [this comment|https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14168038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14168038], are you saying we have a way to parallelize tests and the only problem is the interleaved output? How quickly do tests run in that mode?

I thought another problem we had with parallelizing tests was that certain tests fought over the same ports or something. Is that not the case?
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173618#comment-14173618 ]

Sean Owen commented on SPARK-3431:

Yes, that should be what scalatest does. It is a fork of an old surefire, so it has only a very few options. This parallelization failed as above for a few reasons. I have not gotten surefire to run the Scala tests.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173300#comment-14173300 ]

Patrick Wendell commented on SPARK-3431:

[~srowen] - just wondering, is it trivial to parallelize the tests in maven at the granularity of test suites?
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169087#comment-14169087 ] Sean Owen commented on SPARK-3431: -- I just tried parallelizing scalatest and it failed fairly spectacularly. With forkMode=once, lots of errors pop out like: {code} akka.actor.InvalidActorNameException: actor name [LocalBackendActor] is not unique! org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 11.0 failed 1 times, most recent failure: Lost task 1.0 in stage 11.0 (TID 10, localhost): java.io.IOException: PARSING_ERROR(2) org.apache.spark.SparkException: Task not serializable at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166) {code} I think it may work with even more fine-grained separation of tests into JVMs, as evidenced by Josh's success with SBT, but scalatest doesn't support that. (surefire does.) With forkMode=never I see different errors: {code} java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries! {code} Maybe the several UIs start up at much more the same time when not forking JVMs, and some end up failing through 16 retries as so many compete to allocate ports from 4040 onwards. This too might be better if you could control the level of parallelism, and again surefire does that. So I will try to see if surefire can be used, but this probably also indicates some more work in the tests could make them more parallel-friendly too. > Parallelize execution of tests > -- > > Key: SPARK-3431 > URL: https://issues.apache.org/jira/browse/SPARK-3431 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Nicholas Chammas > > Running all the tests in {{dev/run-tests}} takes up to 2 hours. A common > strategy to cut test time down is to parallelize the execution of the tests. > Doing that may in turn require some prerequisite changes to be made to how > certain tests run. 
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168948#comment-14168948 ] Patrick Wendell commented on SPARK-3431:

If we can get the Maven build times down to be similar to or lower than SBT's, I'd prefer to use it to run the tests. So looking at parallel test execution in Maven would be great.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168038#comment-14168038 ] Josh Rosen commented on SPARK-3431:

I've been playing around with this via SBT configurations and I've come up with something that allows multiple test suites to execute in parallel with each suite in its own JVM (as far as I can tell):

{code}
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 01a5b20..4e84c94 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -21,6 +21,7 @@ import scala.collection.JavaConversions._
 import sbt._
 import sbt.Classpaths.publishTask
 import sbt.Keys._
+import sbt.Tests._
 import sbtunidoc.Plugin.genjavadocSettings
 import org.scalastyle.sbt.ScalastylePlugin.{Settings => ScalaStyleSettings}
 import com.typesafe.sbt.pom.{PomBuild, SbtPomKeys}
@@ -333,6 +334,17 @@ object Unidoc {
 object TestSettings {
   import BuildCommons._
 
+  // See http://stackoverflow.com/questions/15798341 for notes on how to fork
+  // a new JVM for each test in SBT:
+  def singleTests(tests: Seq[TestDefinition]) =
+    tests map { test =>
+      new Group(
+        name = test.name,
+        tests = Seq(test),
+        runPolicy = SubProcess(javaOptions = Seq.empty[String]))
+    }
+
+
   lazy val settings = Seq (
     // Fork new JVMs for tests and set Java options for those
     fork := true,
@@ -352,9 +364,9 @@ object TestSettings {
     testOptions += Tests.Argument(TestFrameworks.JUnit, "-v", "-a"),
     // Enable Junit testing.
     libraryDependencies += "com.novocode" % "junit-interface" % "0.9" % "test",
-    // Only allow one test at a time, even across projects, since they run in the same JVM
-    parallelExecution in Test := false,
-    concurrentRestrictions in Global += Tags.limit(Tags.Test, 1),
+    parallelExecution in Test := true,
+    testGrouping in Test <<= definedTests in Test map singleTests,
+    logBuffered in Test := true,
     // Remove certain packages from Scaladoc
     scalacOptions in (Compile, doc) := Seq(
       "-groups",
{code}

One snag that I ran into: it seems that running multiple test suites in parallel in separate JVMs leads to interleaved test output, making it hard to debug failures or hangs: https://groups.google.com/forum/#!topic/simple-build-tool/SOq8gl4zd6E. I think that we need to fix this issue before enabling parallel tests.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143835#comment-14143835 ] Sean Owen commented on SPARK-3431:

For your experiments, scalatest just copies an old subset of surefire's config:

http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin
vs
http://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html

You can see discussion of how forkMode works:

http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html

Bad news is that scalatest's support is much more limited, but parallel=true and forkMode=once might do the trick. Otherwise... I guess we can figure out if it's realistic to use standard surefire instead of scalatest.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143744#comment-14143744 ] Nicholas Chammas commented on SPARK-3431:

I see. I'll try to look into it then. I don't know much about Maven, frankly, but this sounds doable for the relative n00b. Since for starters we're just gonna try parallelizing the execution of entire test suites, we may not need to make many modifications to the tests upfront. We'll see.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143724#comment-14143724 ] Sean Owen commented on SPARK-3431:

It's trivial to configure Maven surefire/failsafe to execute tests in parallel. It can parallelize by class or method, fork or not, control the number of concurrent forks as a multiple of cores, etc. For example, it's no problem to make test classes use their own JVM, and not even reuse JVMs if you don't want to.

The harder part is making the tests play nice with each other on one machine when it comes to shared resources: files and ports, really. I think the tests have had several passes of improvements to reliably use their own temp space, and try to use an unused port, but this is one typical cause of test breakage. It's not yet clear that tests don't clobber each other by trying to use the same default Spark working dir or something. Finally, some tests that depend on a certain sequence of random numbers may need to be made more robust. But the parallelization is trivial in Maven, at least.
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143707#comment-14143707 ] Nicholas Chammas commented on SPARK-3431:

{quote}
Do you know how maven / sbt plugins handle this?
{quote}

Not really. What I can do for starters is just experiment with GNU parallel and see how it works.

{quote}
The GNU parallel approach ... has the nice advantage of only affecting Jenkins
{quote}

Well, if we are modifying {{dev/run-tests}} then developers should also be able to use it locally. The contributing guide recommends running tests using that script. If we do go the GNU parallel route, we can have it trigger only if it detects GNU parallel on the host.
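The "trigger only if it detects GNU parallel on the host" idea could look roughly like the sketch below. It is an illustration only, not code from {{dev/run-tests}}: the function names `choose_test_runner` and `run_test_commands` are hypothetical, and the check greps the `--version` output because another tool named `parallel` (for example the one from moreutils) may be installed instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "parallel" when GNU parallel is on PATH,
# otherwise "sequential". Not part of dev/run-tests.
choose_test_runner() {
  if command -v parallel >/dev/null 2>&1 &&
     parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    echo "parallel"
  else
    echo "sequential"
  fi
}

# Hypothetical helper: run every command line passed as an argument,
# concurrently when GNU parallel is available, one after another otherwise.
run_test_commands() {
  local cmd
  local status=0
  case "$(choose_test_runner)" in
    parallel)
      # ":::" hands each following argument to GNU parallel as one job.
      parallel ::: "$@"
      ;;
    sequential)
      for cmd in "$@"; do
        bash -c "$cmd" || status=1
      done
      return "$status"
      ;;
  esac
}
```

Called as `run_test_commands 'echo a' 'echo b'`, this runs both commands either way, so developers without GNU parallel would simply get the existing sequential behavior.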
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143684#comment-14143684 ] Josh Rosen commented on SPARK-3431:

[~nchammas] I'm not sure. The different test suites depend on the same build artifacts, but it looks like we call {{sbt assembly}} before running any tests. The GNU parallel approach would certainly be easy to implement and it has the nice advantage of only affecting Jenkins, but I have one concern about test reporting. How will output from tests be printed and will the test report XML files be generated at the same locations? It might be confusing to see the output of several test suites interleaved in an arbitrary way. Do you know how maven / sbt plugins handle this?
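One way to keep output from interleaving, sketched here with plain background jobs rather than GNU parallel, is to give each suite its own log file and print the logs only after the jobs finish. The function name `run_with_separate_logs` and the `suite-N.log` file naming are made up for this illustration, and the sketch says nothing about where the XML test reports end up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each command in the background with its own log
# file, then print the logs one at a time in the order the commands were given.
# Usage: run_with_separate_logs LOG_DIR 'command 1' 'command 2' ...
run_with_separate_logs() {
  local log_dir="$1"
  shift
  mkdir -p "$log_dir"

  local cmd pid
  local pids=""
  local i=0
  for cmd in "$@"; do
    # The header and the command's stdout/stderr all go to the same log.
    { echo "=== suite $i: $cmd ==="; bash -c "$cmd"; } \
      >"$log_dir/suite-$i.log" 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done

  local status=0
  i=0
  for pid in $pids; do
    # wait returns the job's exit status, so any failure is remembered.
    wait "$pid" || status=1
    cat "$log_dir/suite-$i.log"
    i=$((i + 1))
  done
  return "$status"
}
```

GNU parallel groups each job's output by default, so it should behave similarly without this manual bookkeeping.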
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143656#comment-14143656 ] Nicholas Chammas commented on SPARK-3431:

[~joshrosen] I can take a crack at this in the next week or so if it's a simple matter of breaking up [this line|https://github.com/apache/spark/blob/56dae30ca70489a62686cb245728b09b2179bb5a/dev/run-tests#L170] into several invocations of {{sbt}} and parallelizing them with [GNU parallel|http://www.gnu.org/software/parallel/]. Would that work? I remember on the dev list we were discussing using some plugin to Maven to parallelize tests, but I don't know much about that at this time.
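As a rough sketch of what "several invocations of sbt" might mean, the helper below prints one `sbt/sbt "<project>/test"` command per subproject so the lines can be piped to GNU parallel. The function name `build_test_commands` is hypothetical, the subproject names in the example are placeholders, and the sketch does not reproduce the actual line in {{dev/run-tests}}.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one sbt test command per subproject name.
# The caller supplies the names; none are hard-coded here.
build_test_commands() {
  local project
  for project in "$@"; do
    printf 'sbt/sbt "%s/test"\n' "$project"
  done
}

# Example (not executed here): run at most two sbt invocations at a time.
# GNU parallel runs each input line as a command when no command is given.
#   build_test_commands core sql streaming | parallel -j 2
```

Whether concurrent sbt processes in the same checkout coexist cleanly (shared caches, build output directories) is not settled by this sketch and would need to be checked.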
[jira] [Commented] (SPARK-3431) Parallelize execution of tests
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143559#comment-14143559 ] Josh Rosen commented on SPARK-3431:

It would be great to address this soon, since several open PRs plan to add expensive new test suites (Hive integration tests, Selenium tests for the web UI, etc.).

There are some thread-safety issues when running multiple SparkContexts in the same JVM, so for now we're restricted to running one test suite per JVM. However, I think we should be able to parallelize the execution of tests from different subprojects, e.g. by running Spark SQL tests in parallel with Spark Streaming tests (each using its own JVM). Our Jenkins cluster is pretty underutilized, so I don't think this will cause problems. We also recently increased the file descriptor ulimits, so this shouldn't cause any issues with port exhaustion, etc.