[ https://issues.apache.org/jira/browse/CALCITE-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ran Tao updated CALCITE-5893:
-----------------------------
Description:

The following array-function implementations in Calcite ({*}Spark Library{*}) are inconsistent with actual Spark behavior.

The reason is that *null* and *cast(null as xxx)* are treated equally: *NullPolicy* applies to both situations at once and returns null directly. In Spark, however, {*}the former must throw an exception; only the latter is the correct form{*}. (In fact, Apache Flink also throws an exception.) We should use Resource.nullIllegal to raise "Illegal use of 'NULL'" in such cases to match Spark behavior.

*calcite spark:*
{code:java}
// returns null
select array_contains(array[1, 2], null);
// returns null
select array_except(array[1, 2, 3], null);
// returns null
select array_intersect(array[1, 2, 3], null);
{code}

*actual spark:*
{code:java}
// Cannot resolve "array_contains(array(1, 2), NULL)" due to data type mismatch:
// Null typed values cannot be used as arguments of `array_contains`
spark-sql (default)> select array_contains(array(1, 2), null);
{code}
{code:java}
// data type mismatch: Input to function `array_except` should have been two "ARRAY" with same element type,
// but it's ["ARRAY<INT>", "VOID"]
spark-sql (default)> select array_except(array(1, 2, 3), null);
{code}
{code:java}
// data type mismatch: Input to function `array_intersect` should have been two "ARRAY" with same element type,
// but it's ["ARRAY<INT>", "VOID"]
spark-sql (default)> select array_intersect(array(1, 2, 3), null);
{code}

was:

The following array-function implementations in Calcite (*Spark Library*) are inconsistent with actual Spark behavior. The reason is that *null* and *cast(null as xxx)* are treated equally: *NullPolicy* applies to both situations at once and returns null directly. In Spark, however, the former must throw an exception; only the latter is the correct form. (In fact, Apache Flink also throws an exception.)
We should use Resource.nullIllegal to raise "Illegal use of 'NULL'" in such cases to match Spark behavior.

*calcite spark:*
{code:java}
// returns null
select array_contains(array[1, 2], null);
// returns null
select array_except(array[1, 2, 3], null);
// returns null
select array_intersect(array[1, 2, 3], null);
{code}

*actual spark:*
{code:java}
// Cannot resolve "array_contains(array(1, 2), NULL)" due to data type mismatch:
// Null typed values cannot be used as arguments of `array_contains`
spark-sql (default)> select array_contains(array(1, 2), null);
{code}
{code:java}
// data type mismatch: Input to function `array_except` should have been two "ARRAY" with same element type,
// but it's ["ARRAY<INT>", "VOID"]
spark-sql (default)> select array_except(array(1, 2, 3), null);
{code}
{code:java}
// data type mismatch: Input to function `array_intersect` should have been two "ARRAY" with same element type,
// but it's ["ARRAY<INT>", "VOID"]
spark-sql (default)> select array_intersect(array(1, 2, 3), null);
{code}

> Wrong NULL literal behavior of ARRAY_CONTAINS/ARRAY_EXCEPT/ARRAY_INTERSECT In Spark Library
> -------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5893
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5893
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.35.0
>            Reporter: Ran Tao
>            Assignee: Ran Tao
>            Priority: Major
>             Fix For: 1.36.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
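For illustration, the distinction the issue asks for — rejecting a bare NULL literal while letting a typed null (cast(null as xxx)) propagate as null — can be sketched in standalone Java. The helper names below are hypothetical; this is not Calcite's actual NullPolicy or Resource code, only a minimal model of the intended semantics for array_contains:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the semantics this issue proposes:
 * a bare NULL literal is illegal and raises an error, while a
 * typed null (e.g. cast(null as integer)) makes the result null.
 */
public class NullLiteralCheck {

  /** Models an argument that is the bare, untyped NULL literal. */
  static boolean isUntypedNullLiteral(Object arg, String declaredType) {
    return arg == null && declaredType == null;
  }

  /** array_contains with Spark-style null handling, per this issue. */
  static Boolean arrayContains(List<Integer> array, Object needle,
      String needleType) {
    if (isUntypedNullLiteral(needle, needleType)) {
      // Corresponds to the error Resource.nullIllegal would raise.
      throw new IllegalArgumentException("Illegal use of 'NULL'");
    }
    if (needle == null) {
      // Typed null, e.g. cast(null as integer): result is null.
      return null;
    }
    return array.contains(needle);
  }

  public static void main(String[] args) {
    System.out.println(arrayContains(Arrays.asList(1, 2), 2, "INTEGER"));
    // prints: true
    System.out.println(arrayContains(Arrays.asList(1, 2), null, "INTEGER"));
    // prints: null (typed null is allowed)
    try {
      arrayContains(Arrays.asList(1, 2), null, null); // bare NULL literal
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
      // prints: Illegal use of 'NULL'
    }
  }
}
```

This mirrors the Spark sessions quoted above: the bare NULL literal fails, whereas only the explicitly typed null produces a null result.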