Maxim Gekk created SPARK-31563:
----------------------------------

             Summary: Failure of InSet.sql for UTF8String collection
                 Key: SPARK-31563
                 URL: https://issues.apache.org/jira/browse/SPARK-31563
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.5, 3.0.0, 3.1.0
            Reporter: Maxim Gekk


The InSet expression operates on collections of Catalyst's internal types. This 
is visible in the optimizer rule that replaces In with InSet, where In's 
collection is evaluated to Catalyst's internal values: 
[https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L253-L254]
{code:scala}
        if (newList.length > SQLConf.get.optimizerInSetConversionThreshold) {
          val hSet = newList.map(e => e.eval(EmptyRow))
          InSet(v, HashSet() ++ hSet)
        }
{code}
The optimizer code above predates the optimization in 
https://github.com/apache/spark/pull/25754, which made another wrong assumption 
about collection types.
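
As a small illustration of why the set ends up holding internal values, the 
sketch below evaluates a string literal the same way the optimizer rule does. 
It assumes spark-catalyst is on the classpath (for example in spark-shell); the 
name `evaluated` is only illustrative.
{code:scala}
import org.apache.spark.sql.catalyst.expressions.{EmptyRow, Literal}
import org.apache.spark.unsafe.types.UTF8String

// Literal("a") stores the external String as Catalyst's internal UTF8String,
// so evaluating it yields a UTF8String rather than a java.lang.String.
val evaluated: Any = Literal("a").eval(EmptyRow)

assert(evaluated.isInstanceOf[UTF8String])
assert(evaluated == UTF8String.fromString("a"))
{code}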

If InSet accepts only Catalyst's internal types, the following code shouldn't 
fail:
{code:scala}
InSet(Literal("a"), Set("a", "b").map(UTF8String.fromString)).sql
{code}
but it fails with the exception:
{code}
Unsupported literal type class org.apache.spark.unsafe.types.UTF8String a
java.lang.RuntimeException: Unsupported literal type class org.apache.spark.unsafe.types.UTF8String a
        at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:88)
        at org.apache.spark.sql.catalyst.expressions.InSet.$anonfun$sql$2(predicates.scala:522)
{code}
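One possible direction, sketched here as an assumption and not as the committed 
fix, is to convert each internal element back to its external Scala form before 
building a Literal. The object `InSetSqlSketch` and the helper `elementSql` are 
hypothetical names; `CatalystTypeConverters.convertToScala` and `Literal.apply` 
are existing Catalyst APIs.
{code:scala}
import org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.{DataType, StringType}
import org.apache.spark.unsafe.types.UTF8String

object InSetSqlSketch {
  // Hypothetical helper: renders one internal set element as SQL text.
  // convertToScala maps the internal value (e.g. UTF8String) to its external
  // form (e.g. String), which Literal.apply knows how to wrap.
  def elementSql(elem: Any, dataType: DataType): String =
    Literal(convertToScala(elem, dataType)).sql
}

// Usage: the same set as in the failing example, rendered element by element.
val hset = Set("a", "b").map(UTF8String.fromString)
val listSQL = hset.toSeq.map(InSetSqlSketch.elementSql(_, StringType)).mkString(", ")
{code}
Passing the element's data type explicitly keeps the conversion independent of 
how the set was built, whether by the optimizer or by hand.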
 


