mukund-thakur commented on PR #15341:
URL: https://github.com/apache/iceberg/pull/15341#issuecomment-4138941641

   Those numbers are quite good.
   Since you mentioned the S3 IO overhead, I think another approach would be
to go async only rather than running in parallel. What I mean is that while a
file is being processed, we make the next REST call to S3 in the background.
We can check whether this matches the perf improvements. In that case we
won't have to deal with managing multiple threads.
   
   PS: We do something similar in the S3A connector for async file listings.
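
   To make the idea concrete, here is a minimal sketch of the "async only"
pattern, assuming a single background thread that fetches item N+1 while the
caller processes item N. AsyncPrefetch, processWithPrefetch, fetch and process
are hypothetical names for illustration; this is not the S3A or Iceberg code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper (not part of Iceberg or S3A): overlaps the fetch of the
// next item with the processing of the current one, using one background thread.
final class AsyncPrefetch {
  private AsyncPrefetch() {}

  // fetch stands in for the remote call (e.g. a REST call to S3);
  // process stands in for the per-file work done on the calling thread.
  static <K, V, R> List<R> processWithPrefetch(
      List<K> keys, Function<K, V> fetch, Function<V, R> process) {
    List<R> results = new ArrayList<>();
    if (keys.isEmpty()) {
      return results;
    }
    // A single IO thread: at most one fetch is in flight at any time.
    ExecutorService io = Executors.newSingleThreadExecutor();
    try {
      Iterator<K> it = keys.iterator();
      K first = it.next();
      CompletableFuture<V> pending =
          CompletableFuture.supplyAsync(() -> fetch.apply(first), io);
      while (pending != null) {
        // Wait for the item whose fetch was started earlier.
        V current = pending.join();
        // Start fetching the next item before processing the current one.
        if (it.hasNext()) {
          K next = it.next();
          pending = CompletableFuture.supplyAsync(() -> fetch.apply(next), io);
        } else {
          pending = null;
        }
        // Processing overlaps with the background fetch started above.
        results.add(process.apply(current));
      }
      return results;
    } finally {
      io.shutdownNow();
    }
  }
}
```

   Results come back in input order, and the only concurrency is one
outstanding fetch, so there is no thread pool sizing or result merging to
manage. Whether this recovers most of the parallel speedup would need to be
measured.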


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]