[ 
https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566457#comment-15566457
 ] 

Shivaram Venkataraman commented on SPARK-17781:
-----------------------------------------------

[~falaki] I looked at this a bit more today, and it looks like this problem is
specific to dapplyCollect. What happens here is that we don't have the schema
for the output table, so we serialize it as a byteArray [1] and rely on the
driver to do the conversion / deserialization while running collect. I couldn't
trace this part to the end, but it looks like this gets deserialized in [2], and
the call to unserialize there interprets the bytes as double instead of date.
I'm also not sure what a good fix for this would be.
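
To make the double-vs-date point concrete, here is a minimal base R sketch
(no Spark involved). It only shows that a Date is a double carrying a "Date"
class attribute, that a plain serialize/unserialize round trip keeps that
attribute, and that a flattening step such as unlist() drops it. Whether the
SparkR driver loses the class in exactly this way is an assumption, since the
code path was not traced to the end; the variable names are purely illustrative.

```r
# A Date is stored as a double (days since 1970-01-01) plus a class attribute.
d <- as.Date("2016-10-11")
typeof(d)    # "double"
unclass(d)   # 17085

# A plain serialize()/unserialize() round trip keeps the class attribute.
roundtrip <- unserialize(serialize(d, connection = NULL))
inherits(roundtrip, "Date")   # TRUE

# Flattening a list of Dates drops the attribute, leaving bare doubles.
# (Assumption: something comparable happens on the driver side.)
flattened <- unlist(list(d, d))
class(flattened)   # "numeric"

# The doubles can be turned back into Dates if the caller knows the column type.
restored <- as.Date(flattened, origin = "1970-01-01")
```

If the class is indeed dropped on the driver, a caller could restore the dates
after the fact with as.Date(..., origin = "1970-01-01"), as in the last line,
but that is a workaround rather than a fix.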


[1] https://github.com/apache/spark/blob/master/R/pkg/inst/worker/worker.R#L75
[2] https://github.com/apache/spark/blob/master/R/pkg/R/DataFrame.R#L1431

> datetime is serialized as double inside dapply()
> ------------------------------------------------
>
>                 Key: SPARK-17781
>                 URL: https://issues.apache.org/jira/browse/SPARK-17781
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> When we ship a SparkDataFrame to workers for the dapply family of functions,
> DateTime objects are serialized as double inside the worker.
> To reproduce:
> {code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> dapplyCollect(df, function(x) { return(x$date) })
> {code}



