[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566457#comment-15566457 ]
Shivaram Venkataraman commented on SPARK-17781:
-----------------------------------------------

[~falaki] I looked at this a bit more today, and it looks like this problem is specific to dapplyCollect -- because we don't have the schema for the output table, the worker serializes it as a byte array [1] and relies on the driver to do the conversion / deserialization while running collect. I couldn't trace this part to the end, but it looks like the result gets deserialized in [2], and the call to unserialize there interprets the bytes as double instead of date. I'm not sure what a good fix for this would be, either.

[1] https://github.com/apache/spark/blob/master/R/pkg/inst/worker/worker.R#L75
[2] https://github.com/apache/spark/blob/master/R/pkg/R/DataFrame.R#L1431

> datetime is serialized as double inside dapply()
> -------------------------------------------------
>
>                 Key: SPARK-17781
>                 URL: https://issues.apache.org/jira/browse/SPARK-17781
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> When we ship a SparkDataFrame to workers for dapply family functions, DateTime objects are serialized as double inside the worker.
> To reproduce:
> {code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> dapplyCollect(df, function(x) { return(x$date) })
> {code}
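
Since the comment above points at the schemaless byte-array path used by dapplyCollect, a minimal comparison sketch is to push the same identity UDF through dapply() with an explicit output schema and then collect. Whether this path actually preserves the date column is an assumption to verify, not something confirmed in this issue:

{code}
# Comparison sketch, assuming the df from the reproduction above.
# dapply() takes an explicit output schema, so the date column is declared
# up front instead of being round-tripped through the schemaless byte array.
df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
result <- dapply(df, function(x) { x }, schema(df))
collect(result)
{code}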
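As a hedged workaround sketch for the dapplyCollect reproduction (not a fix for the underlying SerDe behavior), the UDF could hand the date column back as character and the driver could restore the Date class after collecting; the conversion steps below are assumptions for illustration, not part of the reported behavior:

{code}
# Workaround sketch, assuming the reproduction above: ship the Date column
# as character so the bytes are not reinterpreted as double, then restore
# the Date class on the driver after dapplyCollect.
df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
out <- dapplyCollect(df, function(x) {
  x$date <- as.character(x$date)  # send dates as strings
  x                               # return a data.frame, as the UDF should
})
out$date <- as.Date(out$date)     # back to Date on the driver side
{code}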