[ https://issues.apache.org/jira/browse/SPARK-33098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-33098:
------------------------------------

    Assignee: Apache Spark

> Exception when using 'in' to compare a partition column to a literal with the wrong type
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-33098
>                 URL: https://issues.apache.org/jira/browse/SPARK-33098
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Bruce Robbins
>            Assignee: Apache Spark
>            Priority: Major
>
> Comparing a partition column against a literal of the wrong type works if you use equality ('='). However, if you use 'in', you get:
> {noformat}
> MetaException(message:Filtering is supported only on partition keys of type string)
> {noformat}
> For example:
> {noformat}
> spark-sql> create table test (a int) partitioned by (b int) stored as parquet;
> Time taken: 0.323 seconds
> spark-sql> insert into test values (1, 1), (1, 2), (2, 2);
> 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test
> 20/10/08 19:57:14 WARN log: Updating partition stats fast for: test
> 20/10/08 19:57:14 WARN log: Updated size to 418
> 20/10/08 19:57:14 WARN log: Updated size to 836
> Time taken: 2.124 seconds
> spark-sql> -- this works, of course
> spark-sql> select * from test where b in (2);
> 1	2
> 2	2
> Time taken: 0.13 seconds, Fetched 2 row(s)
> spark-sql> -- this also works (equals with wrong type)
> spark-sql> select * from test where b = '2';
> 1	2
> 2	2
> Time taken: 0.132 seconds, Fetched 2 row(s)
> spark-sql> -- this does not work ('in' with wrong type)
> spark-sql> select * from test where b in ('2');
> 20/10/08 19:58:30 ERROR SparkSQLDriver: Failed in [select * from test where b in ('2')]
> java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive.
> You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
> 	at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
> -
> -
> -
> Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
> {noformat}
> There are also interesting variations of this using the dataframe API:
> {noformat}
> scala> sql("select cast(b as string) as b from test where b in (2)").show(false)
> +---+
> |b  |
> +---+
> |2  |
> |2  |
> +---+
> scala> sql("select cast(b as string) as b from test").filter("b in (2)").show(false)
> java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
> 	at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:828)
> -
> -
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
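
[Editor's note] Until the fix lands, the report itself points at two workarounds. This is an untested sketch against the `test` table from the repro above: either write the 'in' literal with the partition column's actual type (an explicit cast may also work, though the report does not confirm it), or disable metastore-backed partition management via the static conf named in the error message. The latter must be set at session startup (it is a static configuration, so a runtime SET will be rejected) and degrades performance because partition pruning in the metastore is lost.

{noformat}
-- workaround 1: match the partition column's type in the literal
spark-sql> select * from test where b in (2);
-- an explicit cast to int may also avoid the bad pushdown (unconfirmed)
spark-sql> select * from test where b in (cast('2' as int));
{noformat}

{noformat}
# workaround 2: disable metastore partition management at launch
# (static conf; slower, as the error message warns)
$ spark-sql --conf spark.sql.hive.manageFilesourcePartitions=false
{noformat}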