[ https://issues.apache.org/jira/browse/SPARK-39241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540601#comment-17540601 ]
Yuming Wang commented on SPARK-39241:
-------------------------------------

I can't reproduce this issue:
{code:scala}
spark.sql(
  """
    | CREATE EXTERNAL TABLE tmp(
    |   f1 STRING)
    | PARTITIONED BY (dt STRING)
    | ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    | LINES TERMINATED BY '\n'
    | STORED AS TEXTFILE
    | LOCATION 'file://tmp/tmp/'
  """.stripMargin)

spark.sql("""insert into table tmp partition(dt="2022051000") values("1")""")

spark.sql("select * from tmp where dt like '202205100%'").show()
spark.sql("select * from tmp where dt like any ('202205100%')").show()
{code}

> Spark SQL 'Like' operator behaves wrongly while filtering on partitioned
> column after Spark 3.1
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39241
>                 URL: https://issues.apache.org/jira/browse/SPARK-39241
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>         Environment: *Environment: EMR*
> Release label:emr-6.5.0
> Hadoop distribution:Amazon 3.2.1
> Applications:{*}Spark 3.1.2{*}, Hive 3.1.2, Livy 0.7.1
>            Reporter: Dmitry Gorbatsevich
>            Priority: Major
>
> It seems like introduction of "like any" in spark 3.1 breaks "like" behaviour
> when filtering on partitioned column. Here is the example:
> 1. Create test table:
> {code:java}
> scala> spark.sql(
>      |   """
>      |     CREATE EXTERNAL TABLE tmp(
>      |       f1 STRING
>      |     )
>      |     PARTITIONED BY (dt STRING)
>      |     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>      |     LINES TERMINATED BY '\n'
>      |     STORED AS TEXTFILE
>      |     LOCATION 's3://vlg-data-us-east-1/tmp/tmp/';
>      |   """)
> res2: org.apache.spark.sql.DataFrame = []
> {code}
> 2. insert something there:
> {code:java}
> scala> spark.sql(
>      |   """
>      |     insert into table tmp partition(dt="2022051000") values("1")
>      |   """
>      | )
> res3: org.apache.spark.sql.DataFrame = []
> {code}
> 3. Do select using 'like':
> {code:java}
> scala> spark.sql(
>      |   """
>      |     select * from tmp
>      |     where dt like '202205100%'
>      |   """
>      | ).show()
> +---+---+
> | f1| dt|
> +---+---+
> +---+---+
> {code}
> 4. Do select using 'like any':
> {code:java}
> scala> spark.sql(
>      |   """
>      |     select * from tmp
>      |     where dt like any ('202205100%')
>      |   """
>      | ).show()
> 22/05/20 14:50:26 WARN HiveConf: HiveConf of name hive.server2.thrift.url
> does not exist
> +---+----------+
> | f1|        dt|
> +---+----------+
> |  1|2022051000|
> +---+----------+
> {code}
> Expectation is that results 3 and 4 are identical, however this is not the
> case and result #3 is obviously wrong.
>
> *Environment: EMR*
> Release label:emr-6.5.0
> Hadoop distribution:Amazon 3.2.1
> Applications:{*}Spark 3.1.2{*}, Hive 3.1.2, Livy 0.7.1
>

--
This message was sent by Atlassian Jira
(v8.20.7#820007)