[ https://issues.apache.org/jira/browse/SPARK-31563 ]
Dongjoon Hyun resolved SPARK-31563.
-----------------------------------
    Fix Version/s: 3.0.0
                   2.4.6
       Resolution: Fixed

Issue resolved by pull request 28343
[https://github.com/apache/spark/pull/28343]

> Failure of InSet.sql for UTF8String collection
> ----------------------------------------------
>
>                 Key: SPARK-31563
>                 URL: https://issues.apache.org/jira/browse/SPARK-31563
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.5, 3.0.0, 3.1.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 2.4.6, 3.0.0
>
> The InSet expression works on collections of Catalyst's internal types. This can be seen in the optimization that replaces In by InSet, where In's collection is evaluated down to internal Catalyst values:
> [https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L253-L254]
> {code:scala}
> if (newList.length > SQLConf.get.optimizerInSetConversionThreshold) {
>   val hSet = newList.map(e => e.eval(EmptyRow))
>   InSet(v, HashSet() ++ hSet)
> }
> {code}
> This code predates the optimization https://github.com/apache/spark/pull/25754, which made another wrong assumption about the collection's element types.
> Since InSet accepts only Catalyst's internal types, the following code shouldn't fail:
> {code:scala}
> InSet(Literal("a"), Set("a", "b").map(UTF8String.fromString)).sql
> {code}
> but it fails with this exception:
> {code}
> Unsupported literal type class org.apache.spark.unsafe.types.UTF8String a
> java.lang.RuntimeException: Unsupported literal type class org.apache.spark.unsafe.types.UTF8String a
>   at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:88)
>   at org.apache.spark.sql.catalyst.expressions.InSet.$anonfun$sql$2(predicates.scala:522)
> {code}
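For context, here is a minimal sketch (not the exact patch from pull request 28343) of why the call fails and of one way to render the internal values as SQL: convert each element back to its external Scala representation before building a Literal. It relies on Spark's org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala, which maps a UTF8String back to a java.lang.String; the object name InSetSqlSketch and the hardcoded StringType are illustrative assumptions, not taken from the merged fix:

{code:scala}
import org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.StringType
import org.apache.spark.unsafe.types.UTF8String

object InSetSqlSketch {
  def main(args: Array[String]): Unit = {
    // The hset of an InSet holds Catalyst internal values, e.g. UTF8String
    // rather than java.lang.String.
    val hset = Set("a", "b").map(UTF8String.fromString)

    // Literal.apply only understands external Scala types, so calling
    // Literal(elem).sql on a raw UTF8String throws
    // "Unsupported literal type" (the failure reported above).
    // Converting each internal value back to its Scala counterpart first
    // yields a Literal that can render itself as SQL.
    val listSQL = hset.toSeq
      .map(elem => Literal(convertToScala(elem, StringType)).sql)
      .sorted
      .mkString(", ")

    // Prints: ('a' IN ('a', 'b'))
    println(s"(${Literal("a").sql} IN ($listSQL))")
  }
}
{code}

In the real InSet.sql the element type would come from the child expression's dataType rather than being hardcoded to StringType as in this sketch.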