[ https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Boulter updated SPARK-46251:
---------------------------------
    Summary: Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast null into None for Option values  (was: Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values)

> Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast
> null into None for Option values
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-46251
>                 URL: https://issues.apache.org/jira/browse/SPARK-46251
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>            Reporter: Will Boulter
>            Priority: Major
>
> In Spark {{3.3.2}}, encoders created using {{Encoders.tuple(encoder1,
> encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the
> target type is an Option.
> In Spark {{3.3.3}}, this behaviour has changed and the Option value comes
> through as {{null}}, which is likely to cause a {{NullPointerException}} in
> most Scala code that operates on the Option. The change seems to be related
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
> I have made a reproduction with a couple of examples in a public GitHub repo
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug]
> The common case where this is likely to be encountered is any join that can
> return null, e.g. a left or outer join. When casting the result of a left
> join it is sensible to wrap the right-hand side in an Option to handle the
> case where there is no match. Since 3.3.3 this fails if the encoder is built
> manually using {{Encoders.tuple(leftEncoder, rightEncoder)}}.
> If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at
> once using reflection, the encoder works as expected.
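The left-join scenario described above can be sketched roughly as follows. This is a minimal illustration and not the code from the linked reproduction repo: the case classes {{LeftRow}} and {{RightRow}}, the object name, and the local {{SparkSession}} settings are all assumptions made for the example. It builds the same tuple encoder two ways, once composed with {{Encoders.tuple}} and once derived in a single step, and applies each to the result of a left {{joinWith}}.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical row types standing in for the two sides of the join.
case class LeftRow(id: Int)
case class RightRow(id: Int, value: String)

object TupleEncoderSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-46251-sketch")
      .getOrCreate()
    import spark.implicits._

    val left  = Seq(LeftRow(1), LeftRow(2)).toDS()
    val right = Seq(RightRow(1, "a")).toDS()

    // Left join: LeftRow(2) has no match, so its right-hand side is null.
    val joined = left.joinWith(right, left("id") === right("id"), "left")

    // Encoder composed by hand from two component encoders.
    val composed: Encoder[(LeftRow, Option[RightRow])] =
      Encoders.tuple(Encoders.product[LeftRow], Encoders.product[Option[RightRow]])

    // Encoder for the whole tuple derived in one step by reflection.
    val derived: Encoder[(LeftRow, Option[RightRow])] =
      Encoders.product[(LeftRow, Option[RightRow])]

    // Per the report, the derived encoder yields None for the unmatched row.
    joined.as[(LeftRow, Option[RightRow])](derived).collect().foreach(println)

    // Per the report, on affected versions the composed encoder yields null
    // instead of None for the unmatched row's second element.
    joined.as[(LeftRow, Option[RightRow])](composed).collect().foreach(println)

    spark.stop()
  }
}
```

Running the same program against 3.3.2 and against an affected version should show whether the second element of the unmatched row differs between the two encoders. No expected output is stated here because it depends on the Spark version in use.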
> The bug appears to be in the following function inside
> {{ExpressionEncoder.scala}}:
> {code:java}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] =
>   ...{code}