boneanxs commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1606013767
> if you could attach the query plan for before and after this change, it would be helpful.

The query plan is the same before and after this change: in both cases all filters are pushed down to Hudi, but before this PR some of them did not take effect.

I tested a table with 50,000 (5w) partitions (`region`, `date`, `hour`) and printed the time cost in `org.apache.hudi.SparkHoodieTableFileIndex#tryListByPartitionPathPrefix`:

```scala
private def tryListByPartitionPathPrefix(partitionColumnNames: Seq[String], partitionColumnPredicates: Seq[Expression]) = {
  // Static partition-path prefix is defined as a prefix of the full partition-path where only
  // first N partition columns (in-order) have proper (static) values bound in equality predicates,
  // allowing in turn to build such prefix to be used in subsequent filtering
  val startTime = System.currentTimeMillis()
  //...
  log.info(s"Time cost to listing files: ${System.currentTimeMillis() - startTime}ms")
  result
}
```

I queried with the filter `date=date"2023-06-20"` and ran it 3 times in `local[10]` mode. The listing time drops noticeably with this PR.

### Before the PR

```
23/06/25 18:09:11 INFO HoodieFileIndex: Time cost to listing files: 42745ms
23/06/25 18:12:04 INFO HoodieFileIndex: Time cost to listing files: 37495ms
23/06/25 18:15:14 INFO HoodieFileIndex: Time cost to listing files: 43496ms
```

### After the PR

```
23/06/25 18:19:35 INFO HoodieFileIndex: Time cost to listing files: 10928ms
23/06/25 18:20:29 INFO HoodieFileIndex: Time cost to listing files: 10015ms
23/06/25 18:21:25 INFO HoodieFileIndex: Time cost to listing files: 12032ms
```

Since my backend storage is `HDFS`, I think the savings could be even larger on an object store.
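To put a number on the log lines above, here is a minimal sketch that averages the three runs on each side and computes their ratio. The names `ListingTimeSummary`, `mean`, and `speedup` are hypothetical and not part of Hudi; the timings are copied from the logs, and three runs per side is a small sample.

```scala
// Sketch only: the object and method names are hypothetical, not part of Hudi.
object ListingTimeSummary {
  // Timings copied from the log lines above, in milliseconds.
  val beforeMs: Seq[Long] = Seq(42745L, 37495L, 43496L)
  val afterMs: Seq[Long] = Seq(10928L, 10015L, 12032L)

  // Arithmetic mean of a non-empty sequence of timings.
  def mean(xs: Seq[Long]): Double = {
    require(xs.nonEmpty, "need at least one measurement")
    xs.sum.toDouble / xs.size
  }

  // Ratio of the mean "before" time to the mean "after" time.
  def speedup(before: Seq[Long], after: Seq[Long]): Double =
    mean(before) / mean(after)

  def main(args: Array[String]): Unit = {
    println(f"mean before: ${mean(beforeMs)}%.0f ms")
    println(f"mean after:  ${mean(afterMs)}%.0f ms")
    println(f"speedup:     ${speedup(beforeMs, afterMs)}%.2fx")
  }
}
```

On these six measurements the mean listing time goes from roughly 41.2 s to roughly 11.0 s, about a 3.75x reduction.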