[ https://issues.apache.org/jira/browse/SPARK-42005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vivek Atal updated SPARK-42005:
-------------------------------
    Description: 
This issue seems to be related to https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by [https://github.com/apache/spark/pull/15421].

If a column of data type `date` is entirely NA and another column has data type `timestamp`, SparkR cannot collect that Spark DataFrame into an R data.frame. A reproducible code snippet is below.
{code:java}
df <- data.frame(x = as.Date(NA), y = as.POSIXct("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#> Error in handleErrors(returnStatus, conn): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 25) (ip-10-172-210-194.us-west-2.compute.internal executor driver): java.lang.IllegalArgumentException: Invalid type N
#>   at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:94)
#>   at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:68)
#>   at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1(SQLUtils.scala:129)
#>   at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1$adapted(SQLUtils.scala:128)
#>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
#>   at scala.collection.immutable.Range.foreach(Range.scala:158)
#> ...
{code}
The issue does not appear if the `date` column is {_}not missing{_}, or if there is _no_ other column of data type `timestamp`.
{code:java}
df <- data.frame(x = as.Date("2022-01-01"), y = as.POSIXct("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#>            x          y
#> 1 2022-01-01 2022-01-01
{code}
or
{code:java}
df <- data.frame(x = as.Date(NA), y = as.character("2022-01-01"))
SparkR::collect(SparkR::createDataFrame(df))
#>      x          y
#> 1 <NA> 2022-01-01
{code}

> SparkR cannot collect dataframe with NA in a date column along with another
> timestamp column
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-42005
>                 URL: https://issues.apache.org/jira/browse/SPARK-42005
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 3.3.0
>            Reporter: Vivek Atal
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)