zhangyue19921010 commented on PR #18241: URL: https://github.com/apache/hudi/pull/18241#issuecomment-3958324548
Hi @gudladona, thanks for this contribution! I have two general questions that I'd like you to elaborate on:

1. Whole-file in-memory processing: the PR implements a "Read Whole File" strategy for files smaller than 2GB. Do we need to cache the entire file here, or is IO at the fg granularity sufficient? My main concern is memory pressure.
2. Double buffer: do we really need the double buffer? For binary copy, CPU pressure is relatively low, and the overall bottleneck is the IO interaction with remote storage, so using a double buffer for caching seems to offer little practical benefit here.
