[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

clarkfitzg Wed, 24 Aug 2016 00:59:40 -0700

Github user clarkfitzg commented on the issue:

    https://github.com/apache/spark/pull/14783
  
    This change doesn't appear to make any difference in speed: in a single
    run each, `dapplyCollect` took 502 seconds before the patch and 508
    seconds after.
    
    ```
    # Wed Aug 24 14:12:12 KST 2016
    # Benchmarking performance before and after dapplyCollect patch
    
    # Downloaded data here:
    # https://s3-us-west-2.amazonaws.com/sparkr-data/nycflights13.csv
    
    # SparkR must be attached when running outside the sparkR shell
    library(SparkR)
    library(microbenchmark)
    
    sparkR.session()
    
    df <- read.csv("~/data/nycflights13.csv")
    
    sdf <- createDataFrame(df)
    
    # BEFORE: 7.27 seconds
    # AFTER: 7.20 seconds
    # The patch shouldn't change this at all
    microbenchmark({sdf <- createDataFrame(df)}, times=1)
    
    # BEFORE: 502 seconds
    # AFTER: 508 seconds
    microbenchmark({
        df2 <- dapplyCollect(sdf, function(x) x)
    }, times=1)
    ```
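
    Each timing above comes from a single run (`times=1`), so a 6 second
    gap on roughly 500 seconds could simply be run-to-run noise. A minimal
    sketch of how repeated runs might be collected and summarized, assuming
    two hypothetical helpers, `time_runs` and `spread`, that are not part of
    SparkR or microbenchmark:

    ```
    # Hypothetical helper (not part of SparkR or microbenchmark):
    # call f() `times` times and return the elapsed wall-clock
    # seconds of each call as a numeric vector.
    time_runs <- function(f, times = 5) {
        vapply(seq_len(times), function(i) {
            system.time(f())[["elapsed"]]
        }, numeric(1))
    }

    # Hypothetical helper: summarize the spread of the timings so a
    # small before/after gap can be judged against run-to-run variation.
    spread <- function(secs) {
        c(min = min(secs), median = median(secs), max = max(secs))
    }
    ```

    With the `sdf` from the script above, this could be used as
    `spread(time_runs(function() dapplyCollect(sdf, function(x) x)))`,
    once on each build. At about 500 seconds per call, a small `times`
    value is probably all that is practical.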



