bvaradar commented on issue #2269: URL: https://github.com/apache/hudi/issues/2269#issuecomment-732367698
1. Only the files that contain records to be updated or added will be "touched" by Hudi.
2. Hudi write operations load only the partitions needed for the write, that is, only the partitions being affected. Background processes such as the cleaner and compaction scheduling do need to look at the entire dataset.
3. Apart from the listing issue, if you are using HMS (Hive Metastore) for your queries, you would need to look at how it performs during the query planning phase, when your queries hit it to prune the directories to be queried.
4. Spark bucketing is not supported currently. You can add an explicit partition column, which would bound the search space but would require users to include it in their queries.
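To illustrate point 4, here is a minimal sketch in plain Python (it does not use any Hudi or Spark API) of how an explicit, key-derived partition column can bound the search space. The names `NUM_BUCKETS`, `bucket_for`, `partition_path`, `affected_partitions`, and `prune` are hypothetical, and the CRC32-modulo hashing is an assumption chosen for illustration, not what Hudi or Spark bucketing actually does.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for illustration


def bucket_for(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Derive a stable bucket id from the record key (assumed hashing scheme)."""
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


def partition_path(record_key: str) -> str:
    """Value of the explicit partition column, written like a directory name."""
    return f"bucket={bucket_for(record_key)}"


def affected_partitions(batch_keys):
    """Partitions that a write of these keys would need to load."""
    return {partition_path(key) for key in batch_keys}


def prune(all_partitions, bucket=None):
    """Partitions a query has to scan.

    Pruning only happens when the caller supplies the bucket filter;
    without it, every partition must be listed.
    """
    if bucket is None:
        return sorted(all_partitions)
    wanted = f"bucket={bucket}"
    return [p for p in sorted(all_partitions) if p == wanted]


if __name__ == "__main__":
    table = {f"bucket={b}" for b in range(NUM_BUCKETS)}
    batch = ["user-1", "user-2", "user-3"]

    # A write only needs the partitions its keys map to.
    print(sorted(affected_partitions(batch)))

    # A query that includes the partition column scans a single directory.
    print(prune(table, bucket=bucket_for("user-1")))

    # A query that omits it falls back to scanning every partition.
    print(len(prune(table)))
```

The trade-off is the one noted in point 4: the lookup is bounded only when the query passes the bucket value, so users (or a wrapper around their queries) have to compute it from the key and include it as a filter.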