[jira] [Commented] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-28 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027870#comment-16027870 ]

Shivaram Venkataraman commented on SPARK-20877:
---

I managed to get the tests to pass on a Windows VM (note that the VM has only 1 
core and 4 GB of memory). The timing breakdown is at 
https://gist.github.com/shivaram/dc235c50b6369cbc60d859c25b13670d and the 
overall run time was close to 1 hour. I think AppVeyor might have a beefier 
machine?

Anyway, the most expensive tests are the same on Linux and Windows, so I think 
we can disable them when running on CRAN / Windows? Are there other options to 
make these tests run faster?

> Investigate if tests will time out on CRAN
> --
>
> Key: SPARK-20877
> URL: https://issues.apache.org/jira/browse/SPARK-20877
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-27 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027720#comment-16027720 ]

Felix Cheung commented on SPARK-20877:
--

Here is a run on AppVeyor:
https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1360-master

It ran for 31 min (vs. <7 min on Jenkins).




[jira] [Commented] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-26 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026760#comment-16026760 ]

Felix Cheung commented on SPARK-20877:
--

OK, I will track down the test run time on Windows.




[jira] [Commented] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-25 Thread Shivaram Venkataraman (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025878#comment-16025878 ]

Shivaram Venkataraman commented on SPARK-20877:
---

I've been investigating how long each test case takes by using the list 
reporter in testthat 
(https://github.com/hadley/testthat/blob/master/R/reporter-list.R#L7). The 
relevant code change in run-all.R was:
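{code}
# Use testthat's list reporter so per-test timings are recorded
res <- test_package("SparkR", reporter = "list")
# Temporarily divert standard output to stderr while writing the timing table
sink(stderr(), type = "output")
write.table(res, sep = ",")
# Restore normal standard output
sink(NULL, type = "output")
{code}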

The results from running `./R/check-cran.sh` on my Mac are at 
https://gist.github.com/shivaram/2923bc8535b3d71e710aa760935a2c0e. The table 
is sorted by run time, longest tests first. I am trying to get a similar table 
from a Windows VM to see how it compares.
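As a rough sketch of how a table sorted like that can be produced from the list reporter output, the R snippet below orders per-test timings by elapsed time, longest first. It assumes the results were coerced with `as.data.frame(res)` and that elapsed wall-clock seconds sit in a column named `real`; that column name is an assumption to check against the installed testthat version. `sort_by_runtime` is a hypothetical helper, and the sample timings are made up for illustration.

```r
# Hypothetical helper: order per-test timings so the slowest tests come first.
# Assumes a data frame with a `test` column and a numeric `real` column
# (elapsed seconds), as from as.data.frame() on list-reporter results.
sort_by_runtime <- function(timings) {
  ordered <- timings[order(timings$real, decreasing = TRUE), , drop = FALSE]
  rownames(ordered) <- NULL
  ordered
}

# Made-up example timings, not measurements from the runs discussed here
timings <- data.frame(
  test = c("select", "gapply", "glm", "collect"),
  real = c(1.2, 95.0, 40.5, 0.3),
  stringsAsFactors = FALSE
)
slowest <- sort_by_runtime(timings)
write.table(slowest, sep = ",", row.names = FALSE)
```

Sorting on elapsed (`real`) rather than user or system time is a deliberate choice here, since wall-clock time is what counts toward a CRAN check limit.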
A couple of takeaways:

- The gapply tests and a few of the MLlib tests dominate the overall run time.
- My current guess is that the Windows runs are slower because we don't use 
daemons on Windows. Combined with the 200 reducers we use by default for some 
of the group-by tests, this would make the Windows runs slower. Getting better 
timings would help verify this.




[jira] [Commented] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-25 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025861#comment-16025861 ]

Felix Cheung commented on SPARK-20877:
--

One run with skip_on_cran took 27 min in total, with <7 min spent on tests.

Given this, I think it should be short enough?

[~shivaram]




[jira] [Commented] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-25 Thread Felix Cheung (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025800#comment-16025800 ]

Felix Cheung commented on SPARK-20877:
--

According to one run on Jenkins, the build/run took 34 min, broken down as:

13 min - lintr (<3 min), Scala doc (9-10 min)
15-16 min - tests
5 min - R CMD check (without tests)

So the last two parts add up to about 21 min. I tried to force NOT_CRAN to 
false, but it didn't seem to skip any tests, so if tests are skipped on CRAN 
it should be even shorter.
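For context on the NOT_CRAN experiment: testthat's `skip_on_cran()` skips a test unless the `NOT_CRAN` environment variable is set to `"true"`, and it only affects tests that call it explicitly. The base-R sketch below approximates that rule with a hypothetical `should_skip_on_cran()` helper instead of calling testthat itself. It is a simplification: newer testthat releases also never skip in interactive sessions, which is not modeled here. `run_expensive_test()` is likewise a hypothetical stand-in for a slow gapply / MLlib test.

```r
# Hypothetical helper approximating testthat's skip_on_cran() rule:
# a test is skipped unless NOT_CRAN is set to "true".
# (Newer testthat releases also never skip interactively; not modeled here.)
should_skip_on_cran <- function() {
  !identical(Sys.getenv("NOT_CRAN"), "true")
}

# Hypothetical expensive test that opts in to the CRAN check
run_expensive_test <- function() {
  if (should_skip_on_cran()) {
    return("skipped")
  }
  # ... the expensive assertions would go here ...
  "ran"
}

Sys.setenv(NOT_CRAN = "false")
result_cran <- run_expensive_test()      # skipped: NOT_CRAN is not "true"

Sys.setenv(NOT_CRAN = "true")
result_not_cran <- run_expensive_test()  # runs: NOT_CRAN is "true"
```

A test that never performs this check runs regardless of `NOT_CRAN`. That would be consistent with forcing NOT_CRAN to false not skipping anything when the test files do not call `skip_on_cran()`.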




[jira] [Commented] (SPARK-20877) Investigate if tests will time out on CRAN

2017-05-24 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-20877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024159#comment-16024159 ]

Apache Spark commented on SPARK-20877:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/18104
