[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025878#comment-16025878 ]

Shivaram Venkataraman commented on SPARK-20877:
-----------------------------------------------

I've been investigating the breakdown of time taken by each test case using 
the ListReporter in testthat 
(https://github.com/hadley/testthat/blob/master/R/reporter-list.R#L7). The 
relevant code change in run-all.R was:
{code}
library(testthat)
# Run the SparkR tests with the ListReporter to capture per-test timings
res <- test_package("SparkR", reporter = "list")
# Redirect stdout to stderr so the timing table ends up in the test logs
sink(stderr(), type = "output")
write.table(res, sep = ",")
sink(NULL, type = "output")
{code}
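
To pull out the slowest tests from that output, something like the following 
should work (a rough sketch; the column names assume the summary data frame 
that testthat's ListReporter produces, with user/system/real timings):
{code}
# Coerce the results to a data frame and sort by wall-clock time, slowest first
df <- as.data.frame(res)
df <- df[order(df$real, decreasing = TRUE), ]
# Show the ten slowest test cases
head(df[, c("file", "test", "real")], 10)
{code}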

The results from running `./R/check-cran.sh` on my Mac are at 
https://gist.github.com/shivaram/2923bc8535b3d71e710aa760935a2c0e -- the table 
is sorted by time taken, with the longest-running tests first. I am trying to 
get a similar table from a Windows VM to see how similar or different it looks. 
A couple of takeaways:

- The gapply tests and a few of the MLlib tests dominate the overall time taken.
- I think the Windows runs might be slower because we don't use daemons on 
Windows. My current guess is that this, coupled with the fact that we default 
to 200 reducers for some of the group-by tests, leads to the slower runs on 
Windows. Getting better timings would help verify this; one way to test the 
reducer theory is sketched after this list.

> Investigate if tests will time out on CRAN
> ------------------------------------------
>
>                 Key: SPARK-20877
>                 URL: https://issues.apache.org/jira/browse/SPARK-20877
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: Felix Cheung
>



