attilapiros opened a new pull request, #43064: URL: https://github.com/apache/spark/pull/43064
### What changes were proposed in this pull request?

Support the Hive 4.0 metastore, where partition filters can be pushed down even for CHAR and VARCHAR types.

**Hive 4.0 is still in beta, which is why this is a work-in-progress PR.**

### Why are the changes needed?

Supporting more Hive versions (with an extra performance improvement) is good for our users.

### Does this PR introduce _any_ user-facing change?

Yes. The documentation is updated to reflect support for the Hive 4.0 metastore.

### How was this patch tested?

#### Manually

I used the docker image of apache/hive:4.0.0-beta-1 to start a metastore and a hiveserver2 (along with a hadoop3 docker image).

Created a table:

```
CREATE EXTERNAL TABLE testTable1 (
  column1 String
)
PARTITIONED BY (partColumn1 CHAR(30), partColumn2 VARCHAR(30))
LOCATION 'hdfs://hadoop3:8020/tmp/hive_external/';
```

Inserted some values in beeline:

```
insert into table testtable1 values
  ("column1_v1", "partcolumn1_v1", "partcolumn2_v1"),
  ("column1_v2", "partcolumn1_v2", "partcolumn2_v2");
```

Started Spark in the hiveserver2 container:

```
./bin/spark-shell --conf spark.sql.hive.metastore.version=4.0.0 --conf spark.sql.hive.metastore.jars="/opt/hive/lib/*"
```

Ran the query:

```
scala> sql("select * from testtable1 where partcolumn1 = 'partcolumn1_v1' and partcolumn2 = 'partcolumn2_v1'").show
Hive Session ID = 6846fe0e-968a-474d-afec-4f67b3a2a274
+----------+--------------------+--------------+
|   column1|         partcolumn1|   partcolumn2|
+----------+--------------------+--------------+
|column1_v1|partcolumn1_v1   ...|partcolumn2_v1|
+----------+--------------------+--------------+
```

Then checked the HMS calls in the metastore container, in the file `/tmp/hive/hive.log`:

```
...
2023-09-22T21:06:34,293 INFO [Metastore-Handler-Pool: Thread-1356] HiveMetaStore.audit: ugi=hive ip=172.30.0.5 cmd=source:172.30.0.5 get_partitions_by_filter : tbl=hive.default.testtable1
...
```

The log contains the expected `get_partitions_by_filter` call.
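The log check at the end of the manual test could also be scripted. The following is a minimal sketch in Python and is not part of this PR. It assumes the audit-line layout matches the single sample line from `/tmp/hive/hive.log` quoted in this description; `AUDIT_CMD` and `audit_commands` are hypothetical names introduced only for illustration.

```python
import re

# Assumption: audit lines look like the sample in this PR description, i.e.
# "... HiveMetaStore.audit: ugi=... ip=... cmd=source:<ip> <command> : tbl=<table>".
# The "source:<ip> " prefix is treated as optional.
AUDIT_CMD = re.compile(r"HiveMetaStore\.audit:.*?cmd=(?:source:\S+\s+)?(\w+)")


def audit_commands(lines, table=None):
    """Return HMS command names found in audit lines.

    If `table` is given, only lines mentioning `tbl=<table>` are considered.
    """
    commands = []
    for line in lines:
        match = AUDIT_CMD.search(line)
        if match and (table is None or f"tbl={table}" in line):
            commands.append(match.group(1))
    return commands


# The sample line is copied from the log excerpt in this description.
sample = [
    "2023-09-22T21:06:34,293 INFO [Metastore-Handler-Pool: Thread-1356] "
    "HiveMetaStore.audit: ugi=hive ip=172.30.0.5 cmd=source:172.30.0.5 "
    "get_partitions_by_filter : tbl=hive.default.testtable1",
]

commands = audit_commands(sample, table="hive.default.testtable1")
print("get_partitions_by_filter" in commands)
```

To run it against the real log, pass an open file handle for `/tmp/hive/hive.log` as `lines` instead of `sample`.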
### Was this patch authored or co-authored using generative AI tooling?

No.