mukund-thakur commented on PR #15341: URL: https://github.com/apache/iceberg/pull/15341#issuecomment-4138941641
Those numbers are quite good. Since you mentioned the S3 IO overhead, I think another approach would be to make the work async only, rather than parallel. What I mean is: while one file is being processed, issue the next REST call to S3 in the background. We can check whether this matches the perf improvements. If it does, we won't have to deal with managing multiple threads. PS: We do something similar in the S3A connector for async file listings.
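A minimal sketch of the "async, not parallel" idea, assuming Java and a simple list of keys. `AsyncPrefetchProcessor`, `processWithPrefetch`, `fetch` (standing in for the S3 request) and `process` (standing in for the per-file work) are hypothetical names for illustration. This is not Iceberg's or S3A's actual code. Only one fetch is ever in flight, and it overlaps with processing of the current item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper: processes items sequentially while prefetching the next one in the background. */
final class AsyncPrefetchProcessor {

  static <K, V> void processWithPrefetch(List<K> keys, Function<K, V> fetch, Consumer<V> process) {
    if (keys.isEmpty()) {
      return;
    }
    // A single background thread: at most one fetch is in flight, so there is no pool to size or tune.
    ExecutorService background = Executors.newSingleThreadExecutor();
    try {
      K firstKey = keys.get(0);
      CompletableFuture<V> next = CompletableFuture.supplyAsync(() -> fetch.apply(firstKey), background);
      for (int i = 0; i < keys.size(); i++) {
        // Wait for the prefetched result; ideally it finished while the previous item was processed.
        V current = next.join();
        if (i + 1 < keys.size()) {
          K nextKey = keys.get(i + 1);
          // Kick off the fetch for the following item before processing the current one.
          next = CompletableFuture.supplyAsync(() -> fetch.apply(nextKey), background);
        }
        // Runs on the caller's thread, overlapping with the background fetch.
        process.accept(current);
      }
    } finally {
      background.shutdownNow();
    }
  }
}
```

Processing stays on the caller's thread and in order, so the per-file logic needs no synchronization. The trade-off versus a thread pool is that at best it hides one fetch latency behind one processing step, rather than issuing many requests concurrently.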
