[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-09 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240530#comment-14240530
 ] 

Nicholas Chammas commented on SPARK-3431:
-

For the record, the suite that I'm running is as follows:

{code}
sbt/sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -Pkinesis-asl -Phive 
-Phive-thriftserver 'testOnly 
org.apache.spark.sql.hive.execution.HiveQuerySuite'
{code}

I modified the suite to print the current working directory and confirmed that 
it, at least, differs depending on whether JVMs are forked. (To disable 
forking, I just comment out [this 
line|https://github.com/apache/spark/pull/3564/files#diff-c3580fe26fb42eb3aac6e180ae11e947R440].)
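
For anyone who wants to repeat the check, here is a minimal sketch of how a suite could report the directory its JVM is running in. {{WorkingDirProbe}} is a hypothetical helper and not part of Spark; it relies only on the standard {{user.dir}} system property and on {{java.io.File}}.

```scala
import java.io.File

// Hypothetical helper, not part of Spark: reports where the current JVM is running.
object WorkingDirProbe {
  // The JVM records its launch directory in the "user.dir" system property.
  def userDir: String = sys.props("user.dir")

  // Relative paths such as "." are resolved against that same directory.
  def resolvedDot: String = new File(".").getCanonicalPath

  def main(args: Array[String]): Unit = {
    println(s"user.dir = $userDir")
    println(s"'.' resolves to $resolvedDot")
  }
}
```

Calling something like this from inside a test, once with forking enabled and once without, should make any difference in working directory visible in the test output.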

> Parallelize execution of tests
> --
>
> Key: SPARK-3431
> URL: https://issues.apache.org/jira/browse/SPARK-3431
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
> Attachments: SPARK-3431-srowen-attempt.patch
>
>
> Running all the tests in {{dev/run-tests}} takes up to 2 hours. A common 
> strategy to cut test time down is to parallelize the execution of the tests. 
> Doing that may in turn require some prerequisite changes to be made to how 
> certain tests run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-09 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240519#comment-14240519
 ] 

Nicholas Chammas commented on SPARK-3431:
-

OK, thanks for the updates, Sean and Nicolas.

On my side, I've gone back to testing with SBT to better understand what's 
going wrong there. Specifically, why the [working directory appears to be 
different|https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14236540&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236540]
 depending on whether or not we fork the JVM.

I came across [this SBT issue|https://github.com/sbt/sbt/issues/1032] which 
seems to document a known behavior of SBT in multiproject builds. Forking vs. 
not forking does appear to change the working directory, which I can confirm is 
what broke the HiveQuerySuite test with {{java.io.IOException: Cannot run 
program "/usr/bin/hadoop"}}.
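
As a small illustration of the underlying mechanism, a child process runs in whatever directory its launcher chooses, which need not be the parent's. The sketch below assumes a Unix-like system with a {{pwd}} command; {{ForkedCwdSketch}} and {{childWorkingDir}} are hypothetical names, and nothing here reflects how SBT itself forks test JVMs.

```scala
import java.io.File
import scala.sys.process.Process

// Hypothetical sketch: shows that a child process can start in a different directory
// from the JVM that launches it.
object ForkedCwdSketch {
  // Runs `pwd` in a child process started in `dir` and returns the directory it reports.
  def childWorkingDir(dir: File): File =
    new File(Process(Seq("pwd"), dir).!!.trim).getCanonicalFile

  def main(args: Array[String]): Unit = {
    val parent = new File(sys.props("user.dir")).getCanonicalFile
    val other = new File(sys.props("java.io.tmpdir")).getCanonicalFile
    println(s"parent JVM runs in:  $parent")
    println(s"child was started in: ${childWorkingDir(other)}")
  }
}
```

Any test that resolves relative paths, or shells out to a program, can therefore behave differently once it runs in a forked JVM.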




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-09 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240497#comment-14240497
 ] 

Nicolas Liochon commented on SPARK-3431:


Yep, it seems OK from a Maven point of view. The safest approach at the 
beginning is to try forkCount=1/reuseForks=false; then you can increase the 
forkCount. The default is forkCount=1/reuseForks=true, but I doubt that's the 
issue, as Sean already reproduced it outside of Maven.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-09 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239988#comment-14239988
 ] 

Sean Owen commented on SPARK-3431:
--

Hm. Well, when I run {{BlockTransferMessagesSuite}} by itself in my IDE, it 
also fails. I'm wondering if we're simply discovering that lots of the Java 
tests don't actually succeed. Step 1 may be SPARK-4159, getting Java tests 
running too. I think I can do that. Once that's cleared up, which will entail 
something a lot like my patch here, I think it will be easier to move forward 
(possibly) with surefire for all tests, run in parallel.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-08 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238769#comment-14238769
 ] 

Nicholas Chammas commented on SPARK-3431:
-

[~nkeywal] - I took a quick look at HBase's {{pom.xml}} based on your comments 
in the Spark [dev list 
discussion|http://apache-spark-developers-list.1001551.n3.nabble.com/Unit-tests-in-lt-5-minutes-td7757.html]
 about speeding up unit tests. It looks a bit complex, but perhaps Spark's pom 
file will eventually end up looking similar for tests.

For Spark, I've taken an initial step by just having Maven use Surefire to run 
tests ([here's the 
patch|https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14238666&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14238666]),
 but tests aren't running successfully.

Is there anything off the top of your head that I've obviously missed?




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-08 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238703#comment-14238703
 ] 

Nicholas Chammas commented on SPARK-3431:
-

Here are some of the errors:

{code}
Running org.apache.spark.network.shuffle.BlockTransferMessagesSuite
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec <<< FAILURE! - in org.apache.spark.network.shuffle.BlockTransferMessagesSuite
serializeOpenShuffleBlocks(org.apache.spark.network.shuffle.BlockTransferMessagesSuite)  Time elapsed: 0.008 sec  <<< FAILURE!
java.lang.AssertionError: Writable bytes remain: 28
    at org.apache.spark.network.shuffle.protocol.BlockTransferMessage.toByteArray(BlockTransferMessage.java:73)
    at org.apache.spark.network.shuffle.BlockTransferMessagesSuite.checkSerializeDeserialize(BlockTransferMessagesSuite.java:39)
    at org.apache.spark.network.shuffle.BlockTransferMessagesSuite.serializeOpenShuffleBlocks(BlockTransferMessagesSuite.java:30)

Running org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite
Tests run: 3, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 0.237 sec <<< FAILURE! - in org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite
testRegisterExecutor(org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite)  Time elapsed: 0.023 sec  <<< FAILURE!
java.lang.AssertionError: Writable bytes remain: 18
    at org.apache.spark.network.shuffle.protocol.BlockTransferMessage.toByteArray(BlockTransferMessage.java:73)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite.testRegisterExecutor(ExternalShuffleBlockHandlerSuite.java:63)

testOpenShuffleBlocks(org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite)  Time elapsed: 0.017 sec  <<< FAILURE!
java.lang.AssertionError: Writable bytes remain: 30
    at org.apache.spark.network.shuffle.protocol.BlockTransferMessage.toByteArray(BlockTransferMessage.java:73)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite.testOpenShuffleBlocks(ExternalShuffleBlockHandlerSuite.java:80)

testBadMessages(org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite)  Time elapsed: 0.003 sec  <<< FAILURE!
java.lang.AssertionError: Writable bytes remain: 37
    at org.apache.spark.network.shuffle.protocol.BlockTransferMessage.toByteArray(BlockTransferMessage.java:73)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockHandlerSuite.testBadMessages(ExternalShuffleBlockHandlerSuite.java:113)
{code}

I'll remove Surefire from the dependency list. When you say "also refer to it 
under " where else exactly do I need to add references?




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238686#comment-14238686
 ] 

Sean Owen commented on SPARK-3431:
--

What are the errors? Problems with the tests or the test config?

I don't think you need to make the plugin a dependency since it isn't something 
the code uses. You declare and configure it in , and then 
also refer to it under  so that all submodules activate surefire.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-08 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238666#comment-14238666
 ] 

Nicholas Chammas commented on SPARK-3431:
-

OK, here's a patch for {{pom.xml}} that represents my first attempt at having 
Maven use Surefire.

{code}
diff --git a/pom.xml b/pom.xml
index b7df53d..78a5b8a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -533,6 +533,12 @@
 ${scala.version}
   
   
+org.apache.maven.plugins
+maven-surefire-plugin
+2.17
+test
+  
+  
 org.scalatest
 scalatest_${scala.binary.version}
 2.2.1
@@ -946,15 +952,6 @@
   maven-surefire-plugin
   2.17
   
-
-true
-  
-
-
-  org.scalatest
-  scalatest-maven-plugin
-  1.0
-  
 
${project.build.directory}/surefire-reports
 .
 SparkTestSuite.txt
@@ -969,6 +966,12 @@
   
${test_classpath}
   
true
 
+
+**/*Suite.java
+**/*Test.java
+**/*Suite.scala
+**/*Test.scala
+
   
   
 
{code}

I'm building and running tests as follows:

{code}
mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-thriftserver clean package

mvn -Pyarn -Phadoop-2.3 -Phive -Phive-thriftserver test
{code}

Does that look sensible to you?

The build runs fine, but when I run tests in this way, I get errors in the 
{{org.apache.spark.network.shuffle.BlockTransferMessagesSuite}} and a few other 
{{network.shuffle}} suites.
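
As a rough illustration of what the four include patterns in the patch are meant to select, here is a minimal sketch that checks file names against the same suffixes. {{IncludeSketch}} and {{looksIncluded}} are hypothetical names, and the check is purely on file names; it does not model how Surefire itself resolves include patterns against compiled test classes.

```scala
// Hypothetical sketch: approximates the intent of the include patterns by file-name suffix only.
object IncludeSketch {
  // Suffixes taken from the patterns in the patch: **/*Suite.java, **/*Test.java,
  // **/*Suite.scala, **/*Test.scala.
  private val suffixes = Seq("Suite.java", "Test.java", "Suite.scala", "Test.scala")

  // True when the last path segment ends with one of the suffixes above.
  def looksIncluded(path: String): Boolean = {
    val fileName = path.substring(path.lastIndexOf('/') + 1)
    suffixes.exists(suffix => fileName.endsWith(suffix))
  }

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "network/shuffle/src/test/java/BlockTransferMessagesSuite.java",
      "sql/hive/src/test/scala/HiveQuerySuite.scala",
      "core/src/main/scala/SparkContext.scala")
    candidates.foreach(p => println(s"$p -> ${looksIncluded(p)}"))
  }
}
```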




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238615#comment-14238615
 ] 

Sean Owen commented on SPARK-3431:
--

Surefire is definitely the main Maven testing plugin and has all the bells and 
whistles. scalatest is a fork of a quite old version.

Well, in theory I think these are all the things that need to happen:

- Make sure that the test-compile phase compiles all of the Scala-based tests 
as well as Java-based tests. I am pretty sure this happens correctly already 
from the Maven Scala plugin.
- Port the scalatest config to the surefire plugin. I bet it all Just Works 
given that scalatest is derived from surefire.
- Delete scalatest config
- Un-disable the surefire config
- Probably add config to make sure "" includes all of the names of 
all Java and Scala tests

Then you get a lot of parallelization options for sure.

Off the top of my head it should work, but then again, maybe there was a good 
reason surefire was never used.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-08 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238596#comment-14238596
 ] 

Nicholas Chammas commented on SPARK-3431:
-

Thanks for assigning the issue to me, Josh.

[~srowen] - I'm starting to look at Maven + Surefire. Surefire seems to be the 
most mature and fully-featured test framework among the ones we've discussed 
here.

What would it take to have Maven run tests as they are (no parallelization) 
using Surefire instead of ScalaTest? I'm having trouble understanding how 
{{pom.xml}} needs to be updated, assuming that's all that needs to be updated.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-08 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238330#comment-14238330
 ] 

Nicholas Chammas commented on SPARK-3431:
-

I am currently (and have been) actively working on this issue.

Can someone assign this issue to me? I don't appear to be able to do that 
myself.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-05 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236540#comment-14236540
 ] 

Nicholas Chammas commented on SPARK-3431:
-

Here's an example failure I don't understand.

I fire up {{sbt/sbt}} with {{SparkBuild.scala}} at [this 
version|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala]:

{code}
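  // Group tests by the first four dot-separated segments of their fully
  // qualified name, and run each group in its own forked JVM.
  def groupBySuite(tests: Seq[TestDefinition], javaOptions: Seq[String]) = {
    tests groupBy (_.name.split('.').slice(0,4).mkString(".")) map {
      case (suite, tests) =>
        new Group(
          name = suite,
          tests = tests,
          // runPolicy = Tests.InProcess)
          runPolicy = SubProcess(javaOptions = javaOptions))
    } toSeq
  }



  // Pass the Test-scoped javaOptions through to each forked group.
  testGrouping in Test <<= (definedTests in Test, javaOptions in Test) map groupBySuite,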
{code}

Then I run this at the SBT prompt:

{code}
testOnly org.apache.spark.sql.hive.execution.HiveQuerySuite
{code}

I get a lot of errors, but this one stands out:

{code}
21:53:56.662 WARN org.apache.spark.sql.hive.execution.HiveQuerySuite: Running 
query 1/1 with hive.
java.io.IOException: Cannot run program "/usr/bin/hadoop" (in directory 
"/path/to/my/copy/of/spark"): error=2, No such file or directory
{code}

If I comment out [the {{testGrouping in Test}} 
line|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L429],
 the test runs fine.

So it smells like the forked JVMs are somehow not getting passed the 
[configured 
paths|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L403-L418]
 or something. There are some related posts about this [on Stack 
Overflow|http://stackoverflow.com/questions/18002205/sbt-test-only-not-picking-up-jvm-option-when-forking-a-jvm-for-tests]
 and [SBT's issue tracker|https://github.com/sbt/sbt/issues/975].

I'm not sure how to proceed with SBT, or whether I've identified a legitimate 
blocker or not. I may just move on to Maven unless I make some kind of 
breakthrough. Any pointers would be appreciated.
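
To make the grouping in {{groupBySuite}} concrete, here is a standalone sketch of just the key expression, with no SBT types involved. {{GroupKeySketch}} and {{groupKey}} are hypothetical names; the expression itself is copied from the build snippet above.

```scala
// Hypothetical sketch: isolates the grouping key used by groupBySuite.
object GroupKeySketch {
  // Keep only the first four dot-separated segments of the test's fully qualified name.
  def groupKey(testName: String): String =
    testName.split('.').slice(0, 4).mkString(".")

  def main(args: Array[String]): Unit = {
    val names = Seq(
      "org.apache.spark.sql.hive.execution.HiveQuerySuite",
      "org.apache.spark.streaming.DurationSuite")
    // Each distinct key would correspond to one test group.
    names.groupBy(groupKey).foreach { case (key, members) =>
      println(s"$key -> ${members.mkString(", ")}")
    }
  }
}
```

With this key, {{HiveQuerySuite}} lands in the {{org.apache.spark.sql}} group, so every suite under {{org.apache.spark.sql}} would share a single forked JVM.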




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-04 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234783#comment-14234783
 ] 

Nicholas Chammas commented on SPARK-3431:
-

As an aside, I expect there to be some work required to let certain tests play 
nicely with one another. But if we figure out how to specify test groupings and 
make sure the forked JVMs are configured correctly, refactoring tests where 
necessary should be very doable.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-04 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234702#comment-14234702
 ] 

Nicholas Chammas commented on SPARK-3431:
-

I think I'm on to something, but I need some help. I think I understand how to 
tell SBT to fork JVMs for tests, and I also think I got how to specify how the 
tests should be grouped in the various forked JVMs.

It's not working because I think the forked JVMs are not getting passed all the 
options they need. Basically, I don't think that the reference to 
{{javaOptions}} [here in this 
line|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L429]
 actually has all the options [defined 
earlier|https://github.com/nchammas/spark/blob/ab127b798dbfa9399833d546e627f9651b060918/project/SparkBuild.scala#L403-L418].

I don't know much Scala. If anyone could review what I have so far and give me 
some pointers, that would be great!

You can see all the variations I've tried along with the associated output in 
[the open pull request|https://github.com/apache/spark/pull/3564]. I know we 
want to get this working with Maven, but I figured getting it to work first 
with SBT wouldn't be a bad thing.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-03 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233568#comment-14233568
 ] 

Nicholas Chammas commented on SPARK-3431:
-

[~joshrosen] I tried [that patch you posted earlier 
here|https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14168038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14168038].
 It appears to fork a JVM for every individual test (e.g. 
{{org.apache.spark.streaming.DurationSuite}}). When I tried it out on Jenkins, 
the tests [timed out after 2 
hours|https://github.com/apache/spark/pull/3564#issuecomment-65349149].




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-12-02 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232309#comment-14232309
 ] 

Apache Spark commented on SPARK-3431:
-

User 'nchammas' has created a pull request for this issue:
https://github.com/apache/spark/pull/3564




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-11-20 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220065#comment-14220065
 ] 

Josh Rosen commented on SPARK-3431:
---

[~nchammas]: It's been a while since I tried that patch, so I don't remember 
offhand, but it seemed significantly faster.  Tests completed in ~20 minutes 
on my laptop, maybe?  That's just a guess though; I could be misremembering.

The port contention issues should have been solved; even without 
parallelization in Maven, we still run into the risk of multiple Jenkins builds 
running on the same box contending for ports.  AFAIK we haven't seen any recent 
failures due to port contention, so I think it should be safe to increase the 
degree of parallelism.
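
For reference, one common way to avoid port contention between concurrently running tests is to bind to port 0 and let the OS pick a free port. The sketch below only illustrates that general technique and is not a description of how Spark's tests handle ports; {{EphemeralPortSketch}} and {{withFreePort}} are hypothetical names.

```scala
import java.net.ServerSocket

// Hypothetical sketch of the "bind to port 0" technique; not taken from Spark.
object EphemeralPortSketch {
  // Binding to port 0 asks the OS for any currently free port. The socket is
  // closed after `body` runs, so the port is only reserved while `body` executes.
  def withFreePort[A](body: Int => A): A = {
    val socket = new ServerSocket(0)
    try body(socket.getLocalPort) finally socket.close()
  }

  def main(args: Array[String]): Unit =
    withFreePort(port => println(s"OS assigned port $port"))
}
```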




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-11-20 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1421#comment-1421
 ] 

Nicholas Chammas commented on SPARK-3431:
-

[~joshrosen] - Per [this comment | 
https://issues.apache.org/jira/browse/SPARK-3431?focusedCommentId=14168038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14168038],
 are you saying we have a way to parallelize tests and the only problem is the 
interleaved output? How quickly do tests run in that mode?

I thought another problem we had with parallelizing tests was that certain 
tests fought over the same ports or something. Is that not the case?




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-10-16 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173618#comment-14173618
 ] 

Sean Owen commented on SPARK-3431:
--

Yes, that should be what scalatest does. It is a fork of an old surefire, so it 
has only a few options. This parallelization failed as above for a few reasons. 
I have not gotten surefire to run the Scala tests.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-10-15 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173300#comment-14173300
 ] 

Patrick Wendell commented on SPARK-3431:


[~srowen] - just wondering, is it trivial to parallelize the tests in maven at 
the granularity of test suites?




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-10-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169087#comment-14169087
 ] 

Sean Owen commented on SPARK-3431:
--

I just tried parallelizing scalatest and it failed fairly spectacularly. With 
forkMode=once, lots of errors pop out like:

{code}
  akka.actor.InvalidActorNameException: actor name [LocalBackendActor] is not unique!

  org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 11.0 failed 1 times, most recent failure: Lost task 1.0 in stage 11.0 (TID 10, localhost): java.io.IOException: PARSING_ERROR(2)

  org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
{code}

I think it may work with even more fine-grained separation of tests into JVMs, 
as evidenced by Josh's success with SBT, but scalatest doesn't support that. 
(surefire does.)

With forkMode=never I see different errors:

{code}
java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!
{code}

Maybe several UIs start up at much closer to the same time when not forking 
JVMs, and some end up failing through 16 retries as so many compete to allocate 
ports from 4040 onwards.

This too might be better if you could control the level of parallelism, and 
again surefire does that.

So I will try to see if surefire can be used, but this probably also indicates 
some more work in the tests could make them more parallel-friendly too.
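
One way tests can avoid competing for a fixed range of ports is to let the OS hand out an unused one by binding to port 0. The following is only a minimal sketch of that idea using the plain JDK socket API; {{FreePort}} and {{findFreePort}} are hypothetical names, and this is not how the SparkUI chooses its port. Note that the port is released before the caller reuses it, so a small race window remains.

```scala
import java.net.ServerSocket

object FreePort {
  // Hypothetical helper: bind to port 0 so the OS picks an unused ephemeral
  // port, instead of probing upward from a fixed default such as 4040.
  def findFreePort(): Int = {
    val socket = new ServerSocket(0)
    try socket.getLocalPort
    finally socket.close() // the port is free again once the socket is closed
  }

  def main(args: Array[String]): Unit = {
    val port = findFreePort()
    println(s"OS-assigned port: $port")
  }
}
```

A service that can bind to port 0 itself and then report the port it actually got avoids the race entirely.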




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-10-12 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168948#comment-14168948
 ] 

Patrick Wendell commented on SPARK-3431:


If we can get the Maven build times down to be similar to or less than those of 
SBT, I'd prefer to use it to run the tests. So looking at parallel test 
execution in Maven would be great.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-10-11 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168038#comment-14168038
 ] 

Josh Rosen commented on SPARK-3431:
---

I've been playing around with this via SBT configurations and I've come up with 
something that allows multiple test suites to execute in parallel with each 
suite in its own JVM (as far as I can tell):

{code}
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 01a5b20..4e84c94 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -21,6 +21,7 @@ import scala.collection.JavaConversions._
 import sbt._
 import sbt.Classpaths.publishTask
 import sbt.Keys._
+import sbt.Tests._
 import sbtunidoc.Plugin.genjavadocSettings
 import org.scalastyle.sbt.ScalastylePlugin.{Settings => ScalaStyleSettings}
 import com.typesafe.sbt.pom.{PomBuild, SbtPomKeys}
@@ -333,6 +334,17 @@ object Unidoc {
 object TestSettings {
   import BuildCommons._

+  // See http://stackoverflow.com/questions/15798341 for notes on how to fork
+  // a new JVM for each test in SBT:
+  def singleTests(tests: Seq[TestDefinition]) =
+    tests map { test =>
+      new Group(
+        name = test.name,
+        tests = Seq(test),
+        runPolicy = SubProcess(javaOptions = Seq.empty[String]))
+    }
+
+
   lazy val settings = Seq (
 // Fork new JVMs for tests and set Java options for those
 fork := true,
@@ -352,9 +364,9 @@ object TestSettings {
 testOptions += Tests.Argument(TestFrameworks.JUnit, "-v", "-a"),
 // Enable Junit testing.
 libraryDependencies += "com.novocode" % "junit-interface" % "0.9" % "test",
-// Only allow one test at a time, even across projects, since they run in the same JVM
-parallelExecution in Test := false,
-concurrentRestrictions in Global += Tags.limit(Tags.Test, 1),
+parallelExecution in Test := true,
+testGrouping in Test <<= definedTests in Test map singleTests,
+logBuffered in Test := true,
 // Remove certain packages from Scaladoc
 scalacOptions in (Compile, doc) := Seq(
   "-groups",
{code}

One snag that I ran into: it seems that running multiple test suites in 
parallel in separate JVMs leads to interleaved test output, making it hard to 
debug failures or hangs: 
https://groups.google.com/forum/#!topic/simple-build-tool/SOq8gl4zd6E.  I think 
that we need to fix this issue before enabling parallel tests.
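
To illustrate what non-interleaved output could look like, here is a rough sketch in plain Scala, independent of SBT: each suite writes to its own buffer, and the buffer is flushed to the console as one block under a lock. The names {{BufferedSuiteOutput}} and {{runBuffered}} are made up for this example, and threads stand in for the forked JVMs, so this shows the buffering idea only, not a fix for the SBT issue linked above.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BufferedSuiteOutput {
  // Guards the shared sink so one suite's lines are emitted contiguously.
  private val consoleLock = new Object

  // Runs `body` with a private line buffer, then flushes all of its lines
  // to `sink` in one uninterrupted block.
  def runBuffered(suite: String, sink: String => Unit)(body: (String => Unit) => Unit): Unit = {
    val buffer = ArrayBuffer.empty[String]
    body { line => buffer += s"[$suite] $line"; () }
    consoleLock.synchronized { buffer.foreach(sink) }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical suite names; each "suite" just logs two lines.
    val runs = Seq("SuiteA", "SuiteB", "SuiteC").map { name =>
      Future {
        runBuffered(name, s => println(s)) { log =>
          log("starting")
          Thread.sleep(10)
          log("finished")
        }
      }
    }
    Await.result(Future.sequence(runs), 30.seconds)
  }
}
```

The trade-off is that nothing is printed until a suite finishes, which makes a hanging suite harder to spot.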




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143835#comment-14143835
 ] 

Sean Owen commented on SPARK-3431:
--

For your experiments, scalatest just copies an old subset of surefire's config:

http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin
vs
http://maven.apache.org/surefire/maven-surefire-plugin/test-mojo.html

You can see discussion of how forkMode works:

http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html

Bad news is that scalatest's support is much more limited, but parallel=true 
and forkMode=once might do the trick.
Otherwise... I guess we can figure out if it's realistic to use standard 
surefire instead of scalatest.





[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143744#comment-14143744
 ] 

Nicholas Chammas commented on SPARK-3431:
-

I see. I'll try to look into it then. I don't know much about Maven, frankly, 
but this sounds doable for the relative n00b.

Since for starters we're just gonna try parallelizing the execution of entire 
test suites, we may not need to make many modifications to the tests upfront. 
We'll see.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143724#comment-14143724
 ] 

Sean Owen commented on SPARK-3431:
--

It's trivial to configure Maven surefire/failsafe to execute tests in parallel. 
It can parallelize by class or method, fork or not, control number of 
concurrent forks as a multiple of cores, etc. For example, it's no problem to 
make test classes use their own JVM, and not even reuse JVMs if you don't want.

The harder part is making the tests play nice with each other on one machine 
when it comes to shared resources: files and ports, really. I think the tests 
have had several passes of improvements to reliably use their own temp space, 
and try to use an unused port, but this is one typical cause of test breakage. 
It's not yet clear that tests don't clobber each other by trying to use the 
same default Spark working dir or something.
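
As a small illustration of tests using their own temp space, the sketch below creates a uniquely named directory per suite with the JDK's {{Files.createTempDirectory}}. {{SuiteTempDir}} and the {{spark-test-}} prefix are assumptions made for this example and are not taken from the Spark test code; cleanup is left out.

```scala
import java.nio.file.{Files, Path}

object SuiteTempDir {
  // Hypothetical helper: the JDK appends a random suffix to the prefix, so two
  // suites running at the same time never receive the same path.
  def create(suiteName: String): Path =
    Files.createTempDirectory(s"spark-test-$suiteName-")

  def main(args: Array[String]): Unit = {
    val first = create("SuiteA")
    val second = create("SuiteA")
    println(first)
    println(second)
  }
}
```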

Finally, some tests that depend on a certain sequence of random numbers may 
need to be made more robust.
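
One possible way to make such tests more robust, sketched here under the assumption that the fragility comes from sharing a generator, is for each test to build its own {{scala.util.Random}} from an explicit seed. {{SeededData}} and {{sample}} are hypothetical names.

```scala
import scala.util.Random

object SeededData {
  // Each caller constructs its own generator from an explicit seed, so the
  // sequence does not depend on what other tests drew from a shared one.
  def sample(seed: Long, n: Int): Seq[Int] = {
    val rng = new Random(seed)
    Seq.fill(n)(rng.nextInt(100))
  }

  def main(args: Array[String]): Unit = {
    // The same seed yields the same sequence regardless of execution order.
    println(sample(42L, 5) == sample(42L, 5))
  }
}
```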

But the parallelization is trivial in Maven, at least.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143707#comment-14143707
 ] 

Nicholas Chammas commented on SPARK-3431:
-

{quote}
Do you know how maven / sbt plugins handle this?
{quote}

Not really. What I can do for starters is just experiment with GNU parallel and 
see how it works.

{quote}
The GNU parallel approach ... has the nice advantage of only affecting Jenkins
{quote}

Well, if we are modifying {{dev/run-tests}} then developers should also be able 
to use it locally. The contributing guide recommends running tests using that 
script. If we do go the GNU parallel route, we can have it trigger only if it 
detects GNU parallel on the host.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143684#comment-14143684
 ] 

Josh Rosen commented on SPARK-3431:
---

[~nchammas] I'm not sure.

The different test suites depend on the same build artifacts, but it looks like 
we call {{sbt assembly}} before running any tests.  The GNU parallel approach 
would certainly be easy to implement and it has the nice advantage of only 
affecting Jenkins, but I have one concern about test reporting.  How will 
output from tests be printed and will the test report XML files be generated at 
the same locations?  It might be confusing to see the output of several test 
suites interleaved in an arbitrary way.  Do you know how maven / sbt plugins 
handle this?




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143656#comment-14143656
 ] 

Nicholas Chammas commented on SPARK-3431:
-

[~joshrosen] I can take a crack at this in the next week or so if it's a simple 
matter of breaking up [this 
line|https://github.com/apache/spark/blob/56dae30ca70489a62686cb245728b09b2179bb5a/dev/run-tests#L170]
 into several invocations of {{sbt}} and parallelizing them with [GNU 
parallel|http://www.gnu.org/software/parallel/].

Would that work?

I remember on the dev list we were discussing using a Maven plugin to 
parallelize tests, but I don't know much about that at this time.




[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Josh Rosen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143559#comment-14143559
 ] 

Josh Rosen commented on SPARK-3431:
---

It would be great to address this soon, since several open PRs plan to add 
expensive new test suites (Hive integration tests, Selenium tests for the web 
UI, etc.).

There are some thread-safety issues when running multiple SparkContexts in the 
same JVM, so for now we're restricted to running one test suite per JVM.  
However, I think we should be able to parallelize the execution of tests from 
different subprojects, e.g. by running Spark SQL tests in parallel with Spark 
Streaming tests (each using its own JVM).
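
A rough sketch of that idea: launch one build-tool process per subproject so that each gets its own JVM, and wait for all of them. The project IDs {{sql}} and {{streaming}} and the {{sbt/sbt <project>/test}} command line are assumptions for illustration; whether several sbt processes can safely share one checkout is not addressed here.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.sys.process.Process

object ParallelSubprojects {
  // Hypothetical command line: one sbt invocation per subproject.
  def testCommand(project: String): Seq[String] =
    Seq("sbt/sbt", s"$project/test")

  // Starts every command concurrently, each in its own OS process, and
  // returns the exit code of each project once all have finished.
  def runAll(projects: Seq[String], command: String => Seq[String]): Map[String, Int] = {
    val runs = projects.map(p => Future(p -> Process(command(p)).!))
    Await.result(Future.sequence(runs), Duration.Inf).toMap
  }

  def main(args: Array[String]): Unit = {
    val results = runAll(Seq("sql", "streaming"), testCommand)
    results.foreach { case (project, code) => println(s"$project exited with $code") }
  }
}
```

A non-zero exit code for any project would mark the whole run as failed.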

Our Jenkins cluster is pretty underutilized, so I don't think this will cause 
problems.  We also recently increased the file descriptor ulimits, so this 
shouldn't cause any issues with port exhaustion, etc.
