GitHub user titicaca opened a pull request:

    https://github.com/apache/spark/pull/16689

    SPARK-19342 bug fixed in collect method for collecting timestamp column

    ## What changes were proposed in this pull request?
    
    Fix a bug in the `collect` method when collecting a timestamp column. The bug
    can be reproduced with the following code:
    
    ```
    library(SparkR)
    sparkR.session(master = "local")
    df <- data.frame(col1 = c(0, 1, 2),
                     col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA,
                              as.POSIXct("2017-01-01 12:00:01")))
    
    sdf1 <- createDataFrame(df)
    print(dtypes(sdf1))
    df1 <- collect(sdf1)
    print(lapply(df1, class))
    
    sdf2 <- filter(sdf1, "col1 > 0")
    print(dtypes(sdf2))
    df2 <- collect(sdf2)
    print(lapply(df2, class))
    ```
    
    As the printed output shows, col2 in df2 is unexpectedly converted to
    numeric when the first value of the column is NA.
    
    This is caused by the call `do.call(c, list)`. `c` chooses its method from
    the class of its first argument, so when the list starts with NA, e.g.
    `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))`, the class of the
    result is numeric instead of POSIXct.
    
    Therefore, we need to cast the data type of the vector explicitly. 
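    
    The behaviour described above can be checked in plain R, without a Spark
    session. The snippet below is only a minimal sketch and not necessarily the
    code used in this patch: the variable names (`ts`, `na_first`, `restored`)
    and the `tz = "UTC"` setting are assumptions made for illustration, and
    `as.POSIXct` with an explicit `origin` stands in for whatever cast the patch
    applies.
    
    ```r
    # A single timestamp; the UTC time zone is assumed here for reproducibility.
    ts <- as.POSIXct("2017-01-01 12:00:01", tz = "UTC")
    
    # NA comes first, so c() dispatches on a plain logical value and the
    # POSIXct class of the second element is dropped.
    na_first <- do.call(c, list(NA, ts))
    print(class(na_first))
    
    # Explicit cast back to POSIXct (illustrative; seconds since the epoch).
    restored <- as.POSIXct(na_first, origin = "1970-01-01", tz = "UTC")
    print(class(restored))
    print(restored)
    ```
    
    After the cast, the leading NA is still NA and the second element carries
    the same instant as `ts`, so no information is lost by combining first and
    casting afterwards.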
    
    
    
    ## How was this patch tested?
    
    The patch can be tested manually with the same code above.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/titicaca/spark sparkr-dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16689.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16689
    
----
commit a51c2eb54ca672ad63495d0709bd3ae7b254bd14
Author: titicaca <fangzhou.y...@hotmail.com>
Date:   2017-01-24T06:24:47Z

    SPARK-19342 bug fixed in collect method for collecting timestamp column

----


