Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19816#discussion_r153000843
  
    --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
    @@ -3021,41 +3021,54 @@ test_that("dapplyCollect() on DataFrame with a binary column", {
     })
     
     test_that("repartition by columns on DataFrame", {
    -  df <- createDataFrame(
    --- End diff --
    
    Actual diff is:
    
    ```R
      # The tasks here launch R workers with shuffles. So, we decrease the number of
      # shuffle partitions to reduce the number of tasks and speed up the test. This is
      # particularly slow on Windows because the R workers are unable to be forked.
      # See also SPARK-21693.
      conf <- callJMethod(sparkSession, "conf")
      value <- callJMethod(conf, "get", "spark.sql.shuffle.partitions")
      callJMethod(conf, "set", "spark.sql.shuffle.partitions", "5")
      tryCatch({
        ...
      },
      finally = {
        # Reset the conf back to its original value
        callJMethod(conf, "set", "spark.sql.shuffle.partitions", value)
      })
    ```
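    
    For reference, this save/override/restore pattern could be factored into a small helper so other tests can reuse it. The sketch below is hypothetical, not part of SparkR's API: the `withSQLConf` name and signature are made up here, and it assumes `sparkSession` and `callJMethod` are in scope as they are in the test above.
    
    ```R
    # Hypothetical helper (not in SparkR): temporarily override a SQL conf for the
    # duration of a test body, then restore the previous value even if the body errors.
    withSQLConf <- function(key, value, body) {
      conf <- callJMethod(sparkSession, "conf")
      old <- callJMethod(conf, "get", key)
      callJMethod(conf, "set", key, value)
      tryCatch(
        body(),
        finally = {
          # Restore whatever value was set before the override
          callJMethod(conf, "set", key, old)
        })
    }
    
    # Illustrative usage inside a test (names shown for example only):
    # withSQLConf("spark.sql.shuffle.partitions", "5", function() {
    #   df <- createDataFrame(mtcars)
    #   expect_equal(getNumPartitions(repartition(df, col = df$cyl)), 5)
    # })
    ```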

