[ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826945#comment-17826945
 ] 

Josh Rosen edited comment on SPARK-46251 at 3/14/24 5:34 AM:
-------------------------------------------------------------

FYI, this looks like it was duplicated by 
https://issues.apache.org/jira/browse/SPARK-47385 which now has a PR open to 
fix it.


was (Author: joshrosen):
FYI, this looks like a duplicate of 
https://issues.apache.org/jira/browse/SPARK-47385 which now has a PR open to 
fix it.

> Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast 
> null into None for Option values
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-46251
>                 URL: https://issues.apache.org/jira/browse/SPARK-46251
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
>            Reporter: Will Boulter
>            Priority: Major
>
> In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, 
> encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the 
> target type is an Option. 
> In Spark {{3.3.3}}, this behaviour has changed and the Option value comes 
> through as {{null}}, which is likely to cause a {{NullPointerException}} for 
> most Scala code that operates on the Option. The change seems to be related 
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
> I have made a reproduction with a couple of examples in a public GitHub repo 
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug] 
> The common use case where this is likely to be encountered is while doing any 
> joins that can return null, e.g. left or outer joins. When casting the result 
> of a left join it is sensible to wrap the right-hand side in an Option to 
> handle the case where there is no match. Since 3.3.3 this would fail if the 
> encoder is derived manually using {{Encoders.tuple(leftEncoder, 
> rightEncoder)}}.
> If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at 
> once using reflection, the encoder works as expected. The bug appears to be 
> in the following function inside {{ExpressionEncoder.scala}}
> {code:scala}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = 
> ...{code}
>  
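
The left-join scenario described in the report can be sketched roughly as follows. This is a minimal illustration, not the reporter's reproduction: the row types LeftRow and RightRow, the object name and the helper leftJoinAs are hypothetical, and the sketch assumes spark-sql is on the classpath. It builds the tuple encoder both ways (by hand with Encoders.tuple, and in one step by reflection) and casts the same left outer join with each.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Hypothetical row types for illustration; they are not taken from the report.
final case class LeftRow(id: Int)
final case class RightRow(id: Int, value: String)

object TupleEncoderSketch {
  type Joined = (LeftRow, Option[RightRow])

  // Encoder assembled by hand from two component encoders. Per the report,
  // this is the path that can yield null instead of None on affected versions.
  val manualEncoder: Encoder[Joined] =
    Encoders.tuple(Encoders.product[LeftRow], ExpressionEncoder[Option[RightRow]]())

  // Encoder for the whole tuple derived in one step by reflection. Per the
  // report, this path behaves as expected.
  val derivedEncoder: Encoder[Joined] =
    ExpressionEncoder[(LeftRow, Option[RightRow])]()

  // Left-outer-joins two small datasets and casts the result with `encoder`.
  def leftJoinAs(spark: SparkSession, encoder: Encoder[Joined]): Array[Joined] = {
    import spark.implicits._
    val left: Dataset[LeftRow] = Seq(LeftRow(1), LeftRow(2)).toDS()
    val right: Dataset[RightRow] = Seq(RightRow(1, "a")).toDS()
    left
      .joinWith(right, left("id") === right("id"), "left_outer")
      .as[Joined](encoder)
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("tuple-encoder-sketch")
      .getOrCreate()
    try {
      // LeftRow(2) has no match, so its second element should be None.
      // A null Option prints as "null", which makes any difference visible.
      for ((name, enc) <- Seq("derived" -> derivedEncoder, "manual" -> manualEncoder)) {
        leftJoinAs(spark, enc).foreach { case (l, r) => println(s"$name: $l -> $r") }
      }
    } finally spark.stop()
  }
}
```

Running main against different Spark versions and comparing the "manual" lines with the "derived" lines is one way to check whether a given build is affected.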


