Re: [PR] fix: [HUDI-CLUSTERING] Optimize binary copy performance with lazy loading, bulk reads, and double buffering [hudi]

via GitHub Wed, 25 Feb 2026 02:38:06 -0800


zhangyue19921010 commented on PR #18241:
URL: https://github.com/apache/hudi/pull/18241#issuecomment-3958324548


   Hi @gudladona, thanks for this contribution! I have two questions that I 
hope you can elaborate on:
   
   1. Whole-file in-memory processing: the PR implements a "read whole file" 
strategy for files smaller than 2GB. Do we need to cache the entire file here, 
or is IO at the fg granularity sufficient? My main concern is memory 
pressure.
   2. Double buffer: is the double buffer really necessary? For binary copy, 
CPU pressure is relatively low, and the overall bottleneck is the IO 
interaction with remote storage, so double buffering seems to offer little 
practical benefit here.
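
   To make the memory concern in question 1 concrete, here is a minimal 
sketch of the bounded-memory alternative to reading a whole file: copying 
through a single reusable buffer, so peak memory is capped by the buffer size 
rather than the file size. This is not the PR's code and it is in Python 
rather than Hudi's Java; the function name `copy_in_chunks` and the 8 MiB 
default are hypothetical choices for illustration only.

```python
import io


def copy_in_chunks(src, dst, chunk_size=8 * 1024 * 1024):
    """Copy a binary stream through one reusable buffer.

    Peak memory held by this function is about chunk_size bytes,
    no matter how large the source is. Returns the bytes copied.
    (Hypothetical helper; not part of Hudi or the PR.)
    """
    buf = bytearray(chunk_size)      # allocated once, reused every iteration
    view = memoryview(buf)           # lets us slice without copying
    total = 0
    while True:
        n = src.readinto(view)       # fills at most chunk_size bytes
        if not n:                    # 0 (or None) means nothing more to read
            break
        dst.write(view[:n])          # write only the bytes actually read
        total += n
    return total


# Small in-memory demonstration with a deliberately tiny buffer.
source = io.BytesIO(b"x" * 1000)
sink = io.BytesIO()
copied = copy_in_chunks(source, sink, chunk_size=64)
```

   With a whole-file strategy, the equivalent of `buf` would be sized to the 
file (up to 2GB per file being copied), which is the pressure the question is 
about. Whether the extra remote round trips of a smaller granularity cost more 
than the memory saved is exactly what would need measuring.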


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
