[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566541#comment-15566541 ]
Hossein Falaki commented on SPARK-17781:
----------------------------------------

[~shivaram] Thanks for looking into it. I think the problem applies to {{dapply}} as well. For example, this fails:

{code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> collect(dapply(df, function(x) { data.frame(res = x$date) }, schema = structType(structField("res", "date"))))
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 52.0 failed 4 times, most recent failure: Lost task 0.3 in stage 52.0 (TID 10114, 10.0.229.211): java.lang.RuntimeException: java.lang.Double is not a valid external type for schema of date
{code}

I spent a few hours getting to the root of it. We have the correct type all the way until {{readList}} in {{deserialize.R}}. I instrumented that function: we get the correct type back from {{readObject()}}, but once the value is placed in the list it loses its type.

> datetime is serialized as double inside dapply()
> ------------------------------------------------
>
>                 Key: SPARK-17781
>                 URL: https://issues.apache.org/jira/browse/SPARK-17781
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> When we ship a SparkDataFrame to workers for the dapply family of functions, DateTime objects are serialized as doubles inside the worker.
> To reproduce:
> {code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> dapplyCollect(df, function(x) { return(x$date) })
> {code}
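For anyone else tracing this, here is a minimal standalone R sketch (no Spark involved) of how a {{Date}} value can silently degrade into a plain double once it passes through an atomic container. It illustrates the failure mode in plain R; it is not a confirmed trace of the exact code path in {{deserialize.R}}:

{code}
# A Date is just a double with a "Date" class attribute.
d <- Sys.Date()
class(d)        # "Date"

# Element-wise list assignment preserves the class attribute:
l <- vector("list", 1)
l[[1]] <- d
class(l[[1]])   # "Date"

# But flattening the list with unlist() drops all attributes,
# leaving the underlying days-since-1970 double:
u <- unlist(l)
class(u)        # "numeric"

# Assigning into an atomic vector coerces and drops the class too:
v <- numeric(1)
v[1] <- d
class(v[1])     # "numeric"
{code}

As an untested stopgap, converting back inside the applied function with {{as.Date(x$date, origin = "1970-01-01")}} should recover the dates on the worker, though the real fix belongs in the deserializer.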