[ 
https://issues.apache.org/jira/browse/SPARK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bogdan Raducanu updated SPARK-21228:
------------------------------------
    Description: 
In InSet it's possible that hset contains GenericInternalRows while the child 
expression returns UnsafeRows (or vice versa). InSet.doCodeGen uses 
hset.contains, which will always return false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2")
// the default is 10, which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L), named_struct('a', 2L, 'b', 2L), named_struct('a', 3L, 'b', 3L))").show
+----+
|minA|
+----+
+----+
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3")
// now it will not use InSet

sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L), named_struct('a', 2L, 'b', 2L), named_struct('a', 3L, 'b', 3L))").show

+-----+
| minA|
+-----+
|[1,1]|
+-----+
{code}

A solution could be either to do a safe<->unsafe conversion in InSet.doCodeGen 
or to not trigger the InSet optimization at all in this case.
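The equality mismatch can be modeled outside Spark in plain Scala. This is a minimal sketch, not Spark's actual classes: GenericRow and UnsafeLikeRow below are hypothetical stand-ins for GenericInternalRow and UnsafeRow, each of which only compares equal to rows of its own representation, so a hash set built from one kind never "contains" a logically equal row of the other kind:

```scala
// Hypothetical stand-in for an InternalRow-like interface.
trait Row { def values: Seq[Long] }

// "Safe" representation: equals only matches other GenericRows.
final class GenericRow(val values: Seq[Long]) extends Row {
  override def equals(o: Any): Boolean = o match {
    case g: GenericRow => g.values == values
    case _             => false // a row in the other representation never matches
  }
  override def hashCode: Int = values.hashCode
}

// "Unsafe" (binary-layout-style) representation of the same logical data.
final class UnsafeLikeRow(val values: Seq[Long]) extends Row {
  override def equals(o: Any): Boolean = o match {
    case u: UnsafeLikeRow => u.values == values
    case _                => false
  }
  override def hashCode: Int = values.hashCode
}

// hset built from "safe" rows, probed with an "unsafe" row of the same value:
val hset: Set[Row] = Set(new GenericRow(Seq(1L, 1L)))
val probe: Row     = new UnsafeLikeRow(Seq(1L, 1L))
println(hset.contains(probe)) // prints false despite equal logical values
```

This mirrors why InSet's hset.contains filters out every row when the representations diverge, and why converting both sides to one representation would fix it.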

  was:
In InSet it's possible that hset contains GenericInternalRows while the child 
expression returns UnsafeRows (or vice versa). InSet.doCodeGen uses 
hset.contains, which will always return false in this case.

The following code reproduces the problem:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2")
// the default is 10, which requires a longer query text to repro

spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as a").createOrReplaceTempView("A")

sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L), named_struct('a', 2L, 'b', 2L), named_struct('a', 3L, 'b', 3L))").show
+----+
|minA|
+----+
+----+
{code}
In.doCodeGen appears to be correct:
{code}
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3")
// now it will not use InSet
+-----+
| minA|
+-----+
|[1,1]|
+-----+
{code}

A solution could be either to do a safe<->unsafe conversion in InSet.doCodeGen 
or to not trigger the InSet optimization at all in this case.


> InSet.doCodeGen incorrect handling of structs
> ---------------------------------------------
>
>                 Key: SPARK-21228
>                 URL: https://issues.apache.org/jira/browse/SPARK-21228
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Bogdan Raducanu
>
> In InSet it's possible that hset contains GenericInternalRows while the child 
> expression returns UnsafeRows (or vice versa). InSet.doCodeGen uses 
> hset.contains, which will always return false in this case.
> The following code reproduces the problem:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "2")
> // the default is 10, which requires a longer query text to repro
> spark.range(1, 10).selectExpr("named_struct('a', id, 'b', id) as a").createOrReplaceTempView("A")
> sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L), named_struct('a', 2L, 'b', 2L), named_struct('a', 3L, 'b', 3L))").show
> +----+
> |minA|
> +----+
> +----+
> {code}
> In.doCodeGen appears to be correct:
> {code}
> spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", "3")
> // now it will not use InSet
> sql("select * from (select min(a) as minA from A) A where minA in (named_struct('a', 1L, 'b', 1L), named_struct('a', 2L, 'b', 2L), named_struct('a', 3L, 'b', 3L))").show
> +-----+
> | minA|
> +-----+
> |[1,1]|
> +-----+
> {code}
> A solution could be either to do a safe<->unsafe conversion in InSet.doCodeGen 
> or to not trigger the InSet optimization at all in this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
