[ https://issues.apache.org/jira/browse/SPARK-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566541#comment-15566541 ]

Hossein Falaki commented on SPARK-17781:
----------------------------------------

[~shivaram] Thanks for looking into it. I think the problem applies to 
{{dapply}} as well. For example, this fails:
{code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> collect(dapply(df, function(x) { data.frame(res = x$date) },
+                schema = structType(structField("res", "date"))))
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 52.0 failed 4 times, most recent failure: Lost task 0.3 in stage 52.0 (TID 10114, 10.0.229.211): java.lang.RuntimeException: java.lang.Double is not a valid external type for schema of date
{code}
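
As a quick way to see what the worker actually receives, here is a diagnostic sketch (not part of any fix): it reports the class and type of the column as strings, so it sidesteps the failing {{date}} schema entirely. Per the stack trace above, I'd expect it to report "numeric"/"double" rather than "Date".
{code}
# Report the class/type of the date column from inside the worker as
# plain strings, so the "date" schema is never involved.
probe <- dapply(df,
                function(x) {
                  data.frame(cls = paste(class(x$date), collapse = ","),
                             typ = typeof(x$date),
                             stringsAsFactors = FALSE)
                },
                schema = structType(structField("cls", "string"),
                                    structField("typ", "string")))
head(probe)
{code}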

I spent a few hours getting to the root of it. We have the correct type all the 
way until {{readList}} in {{deserialize.R}}. I instrumented that function: we 
get the correct type back from {{readObject()}}, but once the value is placed 
in the list it loses its type.
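
For context, this is consistent with how base R represents dates: a {{Date}} is just a double (days since 1970-01-01) carrying a {{"Date"}} class attribute, so a single attribute-dropping list operation is enough to degrade it to a plain number. A minimal base-R illustration (not SparkR code):
{code}
d <- Sys.Date()
typeof(d)        # "double" -- the underlying storage
class(d)         # "Date"   -- only a class attribute distinguishes it
unclass(d)       # the raw day count since 1970-01-01
unlist(list(d))  # unlist() drops attributes, leaving a bare double
{code}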

> datetime is serialized as double inside dapply()
> ------------------------------------------------
>
>                 Key: SPARK-17781
>                 URL: https://issues.apache.org/jira/browse/SPARK-17781
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.0
>            Reporter: Hossein Falaki
>
> When we ship a SparkDataFrame to workers for the {{dapply}} family of 
> functions, DateTime objects are serialized as doubles inside the worker.
> To reproduce:
> {code}
> df <- createDataFrame(data.frame(id = 1:10, date = Sys.Date()))
> dapplyCollect(df, function(x) { return(x$date) })
> {code}


