[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}}, encoders created using {{Encoders.tuple(encoder1, encoder2,
..)}} correctly handle casting {{null}} into {{None}} when the target type is
an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes
through as {{null}}, which is likely to cause a {{NullPointerException}} for
most Scala code that operates on the Option. The change seems to be related to
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
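
To illustrate why a {{null}} in place of an Option is dangerous, here is a minimal sketch in plain Scala (no Spark required). The names {{decoded}} and {{normalise}} are hypothetical and do not come from Spark or from the reproduction repo; {{decoded}} only stands in for a value that an affected encoder might hand back.

```scala
object NullOptionDemo {
  // Hypothetical stand-in for a decoded value: typed as Option[String],
  // but actually a null reference at runtime.
  val decoded: Option[String] = null

  // Defensive helper (an assumption, not part of Spark): Option(...) maps
  // null to None, and flatten collapses Option[Option[A]] to Option[A].
  def normalise[A](maybeNull: Option[A]): Option[A] = Option(maybeNull).flatten

  def main(args: Array[String]): Unit = {
    // Calling any method on the null reference throws a NullPointerException.
    val attempt = scala.util.Try(decoded.map(_.length))
    println(attempt.isFailure)       // prints "true"

    println(normalise(decoded))      // prints "None"
    println(normalise(Some("x")))    // prints "Some(x)"
  }
}
```

A helper like this only masks the symptom at the call site; it does not change what the encoder produces.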

I have made a reproduction with a couple of examples in a public GitHub repo
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug]

The most common place to encounter this is in joins that can return null, e.g.
left or outer joins. When casting the result of a left join, it is sensible to
wrap the right-hand side in an Option to handle the case where there is no
match. Since 3.3.3 this would fail if the encoder is derived manually using
{{Encoders.tuple(leftEncoder, rightEncoder)}}.

If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at
once using reflection, the encoder works as expected. The bug appears to be in
the following function inside {{ExpressionEncoder.scala}}:
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
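
The left-join scenario described above might be sketched roughly as follows. This is an illustration under assumptions, not the code from the reproduction repo: {{LeftRow}}, {{RightRow}} and {{TupleEncoderSketch}} are hypothetical names, the use of {{joinWith}} followed by {{as}} is one plausible way to hit the problem, and the sketch needs the {{spark-sql}} artifact on the classpath. It builds one encoder from parts with {{Encoders.tuple}} and one for the whole tuple with {{Encoders.product}}, then prints what each yields for the unmatched row.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Hypothetical row types, not taken from the report or its repository.
case class LeftRow(id: Int, name: String)
case class RightRow(id: Int, value: String)

object TupleEncoderSketch {
  // Encoder assembled from parts: the path the report describes as affected.
  val manual: Encoder[(LeftRow, Option[RightRow])] =
    Encoders.tuple(Encoders.product[LeftRow], ExpressionEncoder[Option[RightRow]]())

  // Encoder derived for the whole tuple at once via reflection.
  val derived: Encoder[(LeftRow, Option[RightRow])] =
    Encoders.product[(LeftRow, Option[RightRow])]

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("tuple-encoder-sketch")
      .getOrCreate()
    import spark.implicits._

    val left = Seq(LeftRow(1, "a"), LeftRow(2, "b")).toDS()
    val right = Seq(RightRow(1, "x")).toDS()

    // LeftRow(2, "b") has no match, so its right-hand side is null after the join.
    val joined = left.joinWith(right, left("id") === right("id"), "left_outer")

    for ((label, enc) <- Seq("manual" -> manual, "derived" -> derived)) {
      val rows = joined.as[(LeftRow, Option[RightRow])](enc).collect()
      // Comparing against null distinguishes a raw null from a proper None.
      rows.foreach { case (l, r) =>
        println(s"$label: ${l.id} -> ${if (r == null) "null" else r.toString}")
      }
    }
    spark.stop()
  }
}
```

Going by the description, the "manual" lines would be expected to show {{null}} for the unmatched row on the affected versions, while the "derived" lines show {{None}}; the output is deliberately not reproduced here since it depends on the Spark version in use.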
 

  was:
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting \{{null}} into \{{None }} when the target type 
is \{{{}an Option. 

In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 


> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast 
> null into None for Option values
> --
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
> Reporter: Will Boulter
> Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

--

[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}}, encoders created using {{Encoders.tuple(encoder1, encoder2,
..)}} correctly handle casting {{null}} into {{None}} when the target type is
an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes
through as {{null}}, which is likely to cause a {{NullPointerException}} for
most Scala code that operates on the Option. The change seems to be related to
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public GitHub repo
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug]

The most common place to encounter this is in joins that can return null, e.g.
left or outer joins. When casting the result of a left join, it is sensible to
wrap the right-hand side in an Option to handle the case where there is no
match. Since 3.3.3 this would fail if the encoder is derived manually using
{{Encoders.tuple(leftEncoder, rightEncoder)}}.

If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at
once using reflection, the encoder works as expected. The bug appears to be in
the following function inside {{ExpressionEncoder.scala}}:
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting null into None when the target 
type is {{{}an Option{}}}. 

In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 






[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Summary: Spark 3.3.3 tuple encoders built using Encoders.tuple do not 
correctly cast null into None for Option values  (was: Spark 3.3.3 tuple 
encoders built using `Encoders.tuple` do not correctly cast null into None for 
Option values)

> Spark 3.3.3 tuple encoders built using Encoders.tuple do not correctly cast 
> null into None for Option values
> 
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
> Reporter: Will Boulter
> Priority: Major
>
> In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, 
> encoder2, ..)}} correctly handle casting {{null}} into {{None}} when the 
> target type is an Option. 
> In Spark {{3.3.3}}, this behaviour has changed and the Option value comes
> through as {{null}}, which is likely to cause a {{NullPointerException}} for
> most Scala code that operates on the Option. The change seems to be related
> to the following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
> I have made a reproduction with a couple of examples in a public GitHub repo
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug]
> The most common place to encounter this is in joins that can return null,
> e.g. left or outer joins. When casting the result of a left join, it is
> sensible to wrap the right-hand side in an Option to handle the case where
> there is no match. Since 3.3.3 this would fail if the encoder is derived
> manually using {{Encoders.tuple(leftEncoder, rightEncoder)}}.
> If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at
> once using reflection, the encoder works as expected. The bug appears to be
> in the following function inside {{ExpressionEncoder.scala}}:
> {code:java}
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = 
> ...{code}
>  




-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None}} when the target type is 
an Option. 

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes
through as {{null}}, which is likely to cause a {{NullPointerException}} for
most Scala code that operates on the Option. The change seems to be related to
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public GitHub repo
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug]

The most common place to encounter this is in joins that can return null, e.g.
left or outer joins. When casting the result of a left join, it is sensible to
wrap the right-hand side in an Option to handle the case where there is no
match. Since 3.3.3 this would fail if the encoder is derived manually using
{{Encoders.tuple(leftEncoder, rightEncoder)}}.

If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at
once using reflection, the encoder works as expected. The bug appears to be in
the following function inside {{ExpressionEncoder.scala}}:
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into \{{None}} when the target type is 
\{{{}an Option. 

In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 







[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}}, encoders created using {{Encoders.tuple(encoder1, encoder2,
..)}} correctly handle casting {{null}} into {{None}} when the target type is
an Option.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes
through as {{null}}, which is likely to cause a {{NullPointerException}} for
most Scala code that operates on the Option. The change seems to be related to
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

I have made a reproduction with a couple of examples in a public GitHub repo
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug]

The most common place to encounter this is in joins that can return null, e.g.
left or outer joins. When casting the result of a left join, it is sensible to
wrap the right-hand side in an Option to handle the case where there is no
match. Since 3.3.3 this would fail if the encoder is derived manually using
{{Encoders.tuple(leftEncoder, rightEncoder)}}.

If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at
once using reflection, the encoder works as expected. The bug appears to be in
the following function inside {{ExpressionEncoder.scala}}:
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark {{3.3.2}} encoders created using {{Encoders.tuple(encoder1, encoder2, 
..)}} correctly handle casting {{null}} into {{None }}when the target type is 
an {{{}Option{}}}. 

 

In Spark {{{}3.3.3{}}}, this behaviour has changed and the Option value comes 
through as {{null}} which is likely to cause a {{NullPointerException}} for 
most Scala code that operates on the Option. The change seems to be related to 
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

 

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match. Since 3.3.3 this would fail if the 
encoder is derived manually using {{Encoders.tuple(leftEncoder, rightEncoder).}}

If the entire tuple encoder {{Encoder[(Left, Option[Right]])}} is derived at 
once using reflection, the encoder works as expected. The bug appears to be in 
the following function inside {{ExpressionEncoder.scala}}

 
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 






[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Description: 
In Spark {{3.3.2}}, encoders created using {{Encoders.tuple(encoder1, encoder2,
..)}} correctly handle casting {{null}} into {{None}} when the target type is
an {{Option}}.

In Spark {{3.3.3}}, this behaviour has changed and the Option value comes
through as {{null}}, which is likely to cause a {{NullPointerException}} for
most Scala code that operates on the Option. The change seems to be related to
the following commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

 

I have made a reproduction with a couple of examples in a public GitHub repo
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug]

The most common place to encounter this is in joins that can return null, e.g.
left or outer joins. When casting the result of a left join, it is sensible to
wrap the right-hand side in an Option to handle the case where there is no
match. Since 3.3.3 this would fail if the encoder is derived manually using
{{Encoders.tuple(leftEncoder, rightEncoder)}}.

If the entire tuple encoder {{Encoder[(Left, Option[Right])]}} is derived at
once using reflection, the encoder works as expected. The bug appears to be in
the following function inside {{ExpressionEncoder.scala}}:

 
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
 

  was:
In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, 
..)` correctly handle casting `null` into `None` when the target type is an 
`Option`. 

 

In Spark `3.3.3`, this behaviour has changed and the Option value comes through 
as `null` which is likely to cause a `NullPointerException` for most Scala code 
that operates on the Option. The change seems to be related to the following 
commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]

 

I have made a reproduction with a couple of examples in a public Github repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

 

The common use case where this is likely to be encountered is while doing any 
joins that can return null, e.g. left or outer joins. When casting the result 
of a left join it is sensible to wrap the right-hand side in an Option to 
handle the case where there is no match - since 3.3.3 this could fail if the 
encoder is derived manually using `Encoders.tuple(leftEncoder, rightEncoder)`. 
If the entire tuple encoder `Encoder[(Left, Option[Right]])` is derived at 
once, the encoder works as expected - the bug appears to be in the following 
function inside `ExpressionEncoder.scala`

```
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...
```







[jira] [Updated] (SPARK-46251) Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast null into None for Option values

2023-12-04 Thread Will Boulter (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Boulter updated SPARK-46251:
-
Summary: Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not 
correctly cast null into None for Option values  (was: Spark 3.3.3 tuple 
encoders do not correctly cast null into None for Option values)

> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast 
> null into None for Option values
> --
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
> Reporter: Will Boulter
> Priority: Major
>
> In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, 
> ..)` correctly handle casting `null` into `None` when the target type is an 
> `Option`. 
>  
> In Spark `3.3.3`, this behaviour has changed and the Option value comes
> through as `null`, which is likely to cause a `NullPointerException` for most
> Scala code that operates on the Option. The change seems to be related to the
> following commit:
> [https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
>  
> I have made a reproduction with a couple of examples in a public GitHub repo
> here:
> [https://github.com/q-willboulter/spark-tuple-encoders-bug]
>  
> The most common place to encounter this is in joins that can return null,
> e.g. left or outer joins. When casting the result of a left join, it is
> sensible to wrap the right-hand side in an Option to handle the case where
> there is no match. Since 3.3.3 this could fail if the encoder is derived
> manually using `Encoders.tuple(leftEncoder, rightEncoder)`. If the entire
> tuple encoder `Encoder[(Left, Option[Right])]` is derived at once, the
> encoder works as expected. The bug appears to be in the following function
> inside `ExpressionEncoder.scala`:
> ```
> def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...
> ```


