fuwhu edited a comment on issue #27232: [SPARK-30525][SQL]HiveTableScanExec do not need to prune partitions again after pushing down to hive metastore
URL: https://github.com/apache/spark/pull/27232#issuecomment-575660407

> The reason for pruning partitions again is that `listPartitionsByFilter` cannot convert every Spark filter expression into a Hive metastore filter condition, so it may return more partitions than exactly wanted. Pruning again with Spark's own method then keeps the partition set minimal.
>
> If we cannot promise that all these cases are handled in `listPartitionsByFilter`, we may still need this, and its cost is negligible.

`HiveExternalCatalog.listPartitionsByFilter` calls `HiveClient.getPartitionsByFilter` to push partition pruning down to the Hive metastore, which may not be able to convert all Spark filters to Hive filters. However, `HiveExternalCatalog.listPartitionsByFilter` already calls `ExternalCatalogUtils.prunePartitionsByFilter` to prune the results returned by `HiveClient.getPartitionsByFilter` a second time. So it is no longer necessary to prune again in `HiveTableScanExec`. You can check the code here: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L1254
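To illustrate why the second, client-side prune makes the result exact, here is a minimal self-contained sketch (in Python, NOT Spark's real Scala API) of the two-stage pruning described above. The partition values, predicate names, and helper functions are all hypothetical; the point is only that the metastore stage applies just the convertible predicates and so may return a superset, while the client-side stage applies every predicate.

```python
# Toy model of two-stage partition pruning (not Spark's actual code).
partitions = [
    {"dt": "2020-01-01", "country": "us"},
    {"dt": "2020-01-01", "country": "cn"},
    {"dt": "2020-01-02", "country": "us"},
]

# A predicate that CAN be converted into a Hive metastore filter string.
pushable = lambda p: p["dt"] == "2020-01-01"
# A predicate that CANNOT be converted, so the metastore ignores it.
unconvertible = lambda p: p["country"] == "us"

def get_partitions_by_filter(parts):
    """Stage 1 (models HiveClient.getPartitionsByFilter): only the
    convertible predicates are applied, so this may return a superset."""
    return [p for p in parts if pushable(p)]

def prune_partitions_by_filter(parts):
    """Stage 2 (models ExternalCatalogUtils.prunePartitionsByFilter):
    apply ALL predicates client-side to get the exact partition set."""
    return [p for p in parts if pushable(p) and unconvertible(p)]

coarse = get_partitions_by_filter(partitions)   # superset: 2 partitions
exact = prune_partitions_by_filter(coarse)      # exact set: 1 partition
```

Since stage 2 already runs inside `HiveExternalCatalog.listPartitionsByFilter`, a third prune in `HiveTableScanExec` filters an already-exact set and removes nothing.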
---
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org