[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Boulter updated SPARK-46251:
---------------------------------
    Summary: Spark 3.3.3 tuple encoders do not correctly cast null into None for Option values  (was: Spark 3.3.3 tuple encoders do not correctly casting null into None for Option values)

> Spark 3.3.3 tuple encoders do not correctly cast null into None for Option values
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-46251
>                 URL: https://issues.apache.org/jira/browse/SPARK-46251
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>            Reporter: Will Boulter
>            Priority: Major
>
> In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, ..)`
> correctly convert `null` into `None` when the target type is an `Option`.
>
> In Spark `3.3.3`, this behaviour has changed: the Option value comes through
> as `null`, which is likely to cause a `NullPointerException` in most Scala
> code that operates on the Option. The change seems to be related to the
> following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
>
> I have made a reproduction with a couple of examples in a public GitHub repo
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug]
>
> This is most likely to be encountered in joins that can return null, e.g.
> left or outer joins. When casting the result of a left join, it is sensible
> to wrap the right-hand side in an Option to handle the case where there is no
> match. Since 3.3.3 this can fail if the encoder is built manually using
> `Encoders.tuple(leftEncoder, rightEncoder)`. If the entire tuple encoder
> `Encoder[(Left, Option[Right])]` is derived at once, the encoder works as
> expected. The bug appears to be in the following function inside
> `ExpressionEncoder.scala`:
> ```
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...
> ```
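
The failure mode described in the report (an `Option` slot that holds `null` instead of `None`) can be illustrated without Spark. The sketch below is plain Scala and is not taken from the linked reproduction repo or from Spark itself: `LeftRow`, `RightRow`, `NullOptionSketch` and `normalize` are hypothetical names, and the `null` is assigned by hand to stand in for what the affected tuple encoder is reported to produce. It only shows why downstream code throws and how a caller might defensively treat a `null` Option as `None`.

```scala
// Hypothetical row types standing in for the two sides of a left join.
final case class LeftRow(id: Int)
final case class RightRow(id: Int, value: String)

object NullOptionSketch {
  // Treats a null Option reference the same as None.
  // Option(x) is None when x is null and Some(x) otherwise, so wrapping the
  // possibly-null Option and flattening gives back a safe Option[A].
  def normalize[A](opt: Option[A]): Option[A] = Option(opt).flatten

  def main(args: Array[String]): Unit = {
    // Assumption: simulates the reported symptom, where the Option itself is null.
    val unmatched: (LeftRow, Option[RightRow]) = (LeftRow(1), null)
    val matched: (LeftRow, Option[RightRow]) = (LeftRow(2), Some(RightRow(2, "b")))

    // Calling unmatched._2.map(_.value) directly would throw a NullPointerException.
    println(normalize(unmatched._2).map(_.value)) // prints "None"
    println(normalize(matched._2).map(_.value))   // prints "Some(b)"
  }
}
```

This kind of guard only masks the symptom at the call site. According to the report, deriving the whole `Encoder[(Left, Option[Right])]` at once avoids the problem in the first place.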