[ https://issues.apache.org/jira/browse/SPARK-42005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vivek Atal updated SPARK-42005:
-------------------------------
    Description: 
This issue seems to be related to https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by [https://github.com/apache/spark/pull/15421].

If a column of data type `date` is entirely NA and another column has data type `timestamp`, SparkR cannot collect that Spark DataFrame into an R data.frame. A reproducible code snippet is below.
{code:java}
df <- data.frame(x = as.Date(NA), y = as.POSIXct("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#> Error in handleErrors(returnStatus, conn): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 25) (ip-10-172-210-194.us-west-2.compute.internal executor driver): java.lang.IllegalArgumentException: Invalid type N
#>   at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:94)
#>   at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:68)
#>   at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1(SQLUtils.scala:129)
#>   at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1$adapted(SQLUtils.scala:128)
#>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
#>   at scala.collection.immutable.Range.foreach(Range.scala:158)
#> ...
{code}
The issue does not appear if the `date` column is {_}not missing{_}, or if there is _no_ other column of data type `timestamp`.
{code:java}
df <- data.frame(x = as.Date("2022-01-01"), y = as.POSIXct("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#>            x          y
#> 1 2022-01-01 2022-01-01
{code}
or
{code:java}
df <- data.frame(x = as.Date(NA), y = as.character("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#>      x          y
#> 1 <NA> 2022-01-01
{code}

> SparkR cannot collect dataframe with NA in a date column along with another
> timestamp column
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42005
>                 URL: https://issues.apache.org/jira/browse/SPARK-42005
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 3.3.0
>            Reporter: Vivek Atal
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)