[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025878#comment-16025878 ]
Shivaram Venkataraman commented on SPARK-20877:
-----------------------------------------------

I've been investigating the breakdown of time taken by each test case by using the list reporter in testthat (https://github.com/hadley/testthat/blob/master/R/reporter-list.R#L7). The relevant code change in run-all.R was:

{code}
res <- test_package("SparkR", reporter = "list")
sink(stderr(), type = "output")
write.table(res, sep = ",")
sink(NULL, type = "output")
{code}

The results from running `./R/check-cran.sh` on my Mac are at https://gist.github.com/shivaram/2923bc8535b3d71e710aa760935a2c0e -- the table is sorted by time taken, with the longest-running tests first. I am trying to get a similar table from a Windows VM to see how similar or different it looks.

A couple of takeaways:
- The gapply tests and a few of the MLlib tests dominate the overall time taken.
- My current guess is that the Windows runs are slower because we don't use daemons on Windows; coupled with the fact that we have 200 reducers by default for some of the group-by tests, this would explain the slower runs. Getting better timings would help verify this.

> Investigate if tests will time out on CRAN
> ------------------------------------------
>
> Key: SPARK-20877
> URL: https://issues.apache.org/jira/browse/SPARK-20877
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Affects Versions: 2.2.0
> Reporter: Felix Cheung
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
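
For reference, a minimal sketch of how the per-test timings captured above could be sorted slowest-first before writing them out. This is an assumption about the reporter's output shape: testthat versions of that era summarize the list reporter's results as a data frame with per-test `user`, `system`, and `real` timing columns, and the column names here reflect that assumption rather than anything stated in the comment.

{code}
# Sketch (hedged): rank test cases by elapsed time using testthat's list reporter.
# Assumes test_package(..., reporter = "list") yields a summary coercible to a
# data frame with a numeric `real` (elapsed seconds) column.
library(testthat)

res <- as.data.frame(test_package("SparkR", reporter = "list"))

# Sort slowest-first so the dominant tests (e.g. gapply, MLlib) surface at the top.
res <- res[order(-res$real), ]

# Emit as CSV on stderr so it does not interleave with test output on stdout.
sink(stderr(), type = "output")
write.table(res, sep = ",", row.names = FALSE)
sink(NULL, type = "output")
{code}

The sort step is the only change relative to the snippet in the comment; everything else mirrors the original run-all.R modification.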