wangyum opened a new pull request, #40294:
URL: https://github.com/apache/spark/pull/40294

   ### What changes were proposed in this pull request?
   
   This PR enhances `UnwrapCastInBinaryComparison` to support unwrapping date 
type to string type.
   
   ### Why are the changes needed?
   
   Without this change, Spark always fetches all partitions, because the 
partition filters cannot be pushed down to the Hive metastore. For example:
   ```sql
   CREATE TABLE t1(id int, dt string) using parquet PARTITIONED BY (dt);
   EXPLAIN SELECT * FROM t1 WHERE dt > date_add(current_date(), -7);
   ```
   
   Before SPARK-27638, partition filters are pushed to the Hive metastore:
   ```
   == Physical Plan ==
   *(1) FileScan parquet default.t1[id#2,dt#3] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(dt#3), (dt#3 > 2023-02-27)], PushedFilters: [], ReadSchema: struct<id:int>
   ```
   
   After SPARK-27638, the partition filters are no longer [converted](https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L794-L798) to Hive metastore filters, so they are not pushed to the Hive metastore. As a result, Spark always fetches all the partitions:
   ```
   == Physical Plan ==
   *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#5,dt#6] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [isnotnull(dt#6), (cast(dt#6 as date) > 2023-02-27)], PushedFilters: [], ReadSchema: struct<id:int>
   ```
   
   After this PR, the date type is unwrapped to string type, and the partition 
filters are pushed to the Hive metastore again:
   ```
   == Physical Plan ==
   *(1) ColumnarToRow
   +- FileScan parquet spark_catalog.default.t1[id#0,dt#1] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [isnotnull(dt#1), (dt#1 > 2023-02-26)], PushedFilters: [], ReadSchema: struct<id:int>
   ```
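
As a rough illustration of why a filter on the raw string column can select the same partitions as the cast-based filter, here is a small Python sketch. It is not Spark code and not the rule's implementation; the helper names `cast_compare` and `unwrapped_compare` are hypothetical. It assumes every `dt` value is a zero-padded `yyyy-MM-dd` string, in which case string order and date order coincide. Other string forms are outside what this sketch covers.

```python
from datetime import date, timedelta


def cast_compare(dt: str, bound: date) -> bool:
    """Model of `cast(dt as date) > bound`: parse the string, compare as dates."""
    return date.fromisoformat(dt) > bound


def unwrapped_compare(dt: str, bound: date) -> bool:
    """Model of `dt > '<bound>'`: compare the raw string to the formatted bound."""
    return dt > bound.isoformat()


# Assumption: partition values are zero-padded yyyy-MM-dd strings.
bound = date(2023, 2, 26)
partitions = [(bound + timedelta(days=d)).isoformat() for d in range(-400, 400)]

# Both predicates should keep exactly the same partition values.
agree = all(
    cast_compare(p, bound) == unwrapped_compare(p, bound) for p in partitions
)
print(agree)
```

The second form never parses `dt`, which is the shape of the `PartitionFilters` entry in the last plan above.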
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

