[
https://issues.apache.org/jira/browse/SPARK-47385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-47385.
---------------------------------
> Tuple encoder produces wrong results with Option inputs
> -------------------------------------------------------
>
> Key: SPARK-47385
> URL: https://issues.apache.org/jira/browse/SPARK-47385
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.4
> Reporter: Chenhao Li
> Assignee: Chenhao Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.2, 3.4.3, 4.0.0
>
> The behavior of tupled encoders on the Option type was changed by
> https://github.com/apache/spark/pull/40755.
> {code:java}
> import org.apache.spark.sql.{Encoders, Encoder}
> case class Required(name: String)
> case class Optional(name: String)
> implicit val enc: Encoder[(Required, Option[Optional])] =
>   Encoders.tuple(Encoders.product[Required],
>     Encoders.product[Option[Optional]])
>
> spark.createDataFrame(Seq(
>   (Required("1"), Some(Optional("1"))),
>   (Required("2"), None)
> )).as[(Required, Option[Optional])].collect()
> {code}
> Before the PR, the result is:
> {code:java}
> Array((Required(1),Some(Optional(1))), (Required(2),None)){code}
> After the PR, the result is:
> {code:java}
> Array((Required(1),Some(Optional(1))), (Required(2),null)) {code}
> which is incorrect because the original input is None rather than null.
>
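Editorial sketch (not part of the original report): the plain-Scala snippet below illustrates why a null in an Option slot matters to downstream code. It does not use Spark. The names describe, normalize, correct, and buggy are hypothetical, and normalize is only one possible defensive guard for callers on an affected version, not the fix that was applied in Spark.

```scala
import scala.util.Try

// Hypothetical consumer of a decoded Option field.
def describe(opt: Option[String]): String = opt match {
  case Some(v) => s"present: $v"
  case None    => "absent"
}

// Hypothetical guard: Option(null) is None, so a null reference
// collapses to None, while Some(x) and None pass through unchanged.
def normalize[A](opt: Option[A]): Option[A] = Option(opt).flatten

val correct: Option[String] = None
val buggy: Option[String]   = null // stands in for the value reported after the PR

// A null matches neither Some(_) nor None, so the match fails at runtime.
val unguarded = Try(describe(buggy))

// With the guard, both values are handled the same way.
val guarded = describe(normalize(buggy))
```

Here unguarded is a failure wrapping a scala.MatchError, while guarded evaluates to the same result as describe(correct). Upgrading to one of the releases listed under Fix For removes the need for such a guard.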