davehagman edited a comment on issue #3733:
URL: https://github.com/apache/hudi/issues/3733#issuecomment-933845183
> Just partitioning on year, month and day did not work out for you, and hence you have to go with hour as well?

We tested multiple partitioning schemes, and this one gave us a good tradeoff between read and write performance. That was especially true under multi-hour processing delays, when we need to ingest large amounts of more recent data to catch up to real time. Removing the hour partition _could_ be feasible now, though. I'm not sure how much testing we originally did with and without the hour specifically.

> Are you seeing spikes only in those batches where records are spread across older partitions? If you have regular traffic which updates only the last few partitions, is the perf back to normal?

Yes, exactly. We ended up splitting these very old, sparse events out of the ingestion process, and this allowed performance to return to normal.
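The split described above can be sketched in plain Python, assuming each event carries a timezone-aware `event_time` and the table uses a hive-style `year=/month=/day=/hour=` partition layout. The helper names (`partition_path`, `split_by_age`, `touched_partitions`) and the age cutoff are hypothetical illustrations, not Hudi APIs and not the commenter's actual pipeline. The sketch only shows the intuition from the thread: recent traffic touches a few hour partitions, while old, sparse events can fan out across many.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Set, Tuple

Event = Dict[str, object]


def partition_path(ts: datetime) -> str:
    """Hypothetical hive-style year/month/day/hour path for an event time (UTC)."""
    ts = ts.astimezone(timezone.utc)
    return f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"


def split_by_age(
    events: Iterable[Event], now: datetime, max_age: timedelta
) -> Tuple[List[Event], List[Event]]:
    """Split events into (recent, late) using a cutoff of now - max_age.

    The cutoff is an assumed tuning knob; recent events stay in the main
    ingestion batch and late ones would go to a separate job.
    """
    cutoff = now - max_age
    recent: List[Event] = []
    late: List[Event] = []
    for event in events:
        (recent if event["event_time"] >= cutoff else late).append(event)
    return recent, late


def touched_partitions(events: Iterable[Event]) -> Set[str]:
    """Distinct partition paths a batch of events would write to."""
    return {partition_path(e["event_time"]) for e in events}


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2021, 10, 4, 12, tzinfo=utc)
    batch = [
        {"id": 1, "event_time": datetime(2021, 10, 4, 11, 30, tzinfo=utc)},
        {"id": 2, "event_time": datetime(2021, 10, 4, 11, 45, tzinfo=utc)},
        {"id": 3, "event_time": datetime(2021, 6, 1, 3, tzinfo=utc)},
        {"id": 4, "event_time": datetime(2021, 7, 9, 17, tzinfo=utc)},
    ]
    recent, late = split_by_age(batch, now, timedelta(days=2))
    print(sorted(touched_partitions(recent)))  # one recent hour partition
    print(sorted(touched_partitions(late)))    # two scattered old partitions
```

In this toy batch, the two recent events land in a single hour partition while the two late events land in two unrelated old partitions, which mirrors why routing sparse late data to a separate job can keep the main batch's write footprint small.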