[GitHub] [hudi] bvaradar commented on issue #2269: [SUPPORT] - HUDI Table Bulk Insert for 5 gb parquet file progressively taking longer time to insert.

GitBox Mon, 23 Nov 2020 11:13:11 -0800


bvaradar commented on issue #2269:
URL: https://github.com/apache/hudi/issues/2269#issuecomment-732367698



   1. Hudi only "touches" the files that contain records to be updated or 
added.
   2. Hudi write operations load only the partitions needed for the write 
(that is, only the partitions being affected). Background processes such as 
the cleaner and compaction scheduling, however, do need to look at the entire 
dataset.
   3. Apart from the listing issue, if you are using HMS (Hive Metastore) for 
your queries, you would also need to look at how it performs during the query 
planning phase, when your queries hit it to prune the directories to be 
queried.
   4. Spark bucketing is not currently supported. You can add an explicit 
partition column, which would bound the search space, but users would then 
need to include that column in their queries.
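
As a rough illustration of point 4, here is a minimal sketch of deriving an 
explicit partition column and pointing Hudi at it. The helper names 
(`bucket_of`, `hudi_write_options`), the table name `events`, and the column 
names `event_id`, `bucket`, and `ts` are all hypothetical; only the `hoodie.*` 
option keys are Hudi datasource write options. The Spark write itself is left 
as a comment, since it assumes a DataFrame `df` and a `base_path` that are not 
part of this thread.

```python
import zlib


def bucket_of(record_key: str, num_buckets: int) -> str:
    """Hypothetical helper: map a record key to a stable bucket label.

    crc32 is used (rather than Python's built-in hash) so that the same key
    lands in the same bucket across runs and processes.
    """
    bucket_id = zlib.crc32(record_key.encode("utf-8")) % num_buckets
    return f"bucket_{bucket_id:03d}"


def hudi_write_options(table_name, record_key_field, partition_field,
                       precombine_field):
    """Hypothetical helper: collect the Hudi write options for a bulk insert."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.datasource.write.recordkey.field": record_key_field,
        # The explicit partition column that bounds the search space.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


# Assumed column names; adjust to the actual schema.
options = hudi_write_options("events", "event_id", "bucket", "ts")

# With a Spark DataFrame `df` that already carries a `bucket` column, the
# write would look roughly like this (not executed here):
#   df.write.format("hudi").options(**options).mode("append").save(base_path)
```

The trade-off is the one noted in point 4: readers only benefit if they filter 
on the partition column, for example by computing the same bucket label for 
the key they are looking up.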


