lidavidm commented on pull request #9656: URL: https://github.com/apache/arrow/pull/9656#issuecomment-812526608
So we have two I/O operations per batch because we read the message header, get the body size from it, and then read the message body. But the IPC footer already tells us the body size! So I tried consolidating them into a single read. As expected, we then launch only one I/O operation per batch, but it doesn't give us any speedup: locally, the 16-file case still takes about 3.5-4 seconds (from a baseline of ~3 seconds).
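To illustrate the difference between the two read strategies, here is a minimal Python sketch. It is not Arrow's reader or its actual IPC layout: the toy format (an 8-byte header holding the body size, followed by the body), the `CountingReader` class, and the helper names are all hypothetical. The `index` list stands in for the footer, on the assumption that it records each batch's offset, header length, and body length.

```python
import io
import struct


class CountingReader:
    """Wraps an in-memory buffer and counts reads (a stand-in for I/O operations)."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.reads = 0

    def read_at(self, offset, nbytes):
        self.reads += 1
        self._buf.seek(offset)
        return self._buf.read(nbytes)


def write_batches(bodies):
    """Toy layout: each batch is [8-byte header = body size][body].

    Returns the file bytes and a footer-like index of
    (offset, header_len, body_len) tuples. Hypothetical format.
    """
    out = bytearray()
    index = []
    for body in bodies:
        offset = len(out)
        header = struct.pack("<Q", len(body))
        out += header + body
        index.append((offset, len(header), len(body)))
    return bytes(out), index


def read_two_step(reader, offset, header_len):
    """Read the header, learn the body size from it, then read the body (2 reads)."""
    header = reader.read_at(offset, header_len)
    (body_len,) = struct.unpack("<Q", header)
    return reader.read_at(offset + header_len, body_len)


def read_single(reader, offset, header_len, body_len):
    """Use the body size from the index to fetch header and body together (1 read)."""
    blob = reader.read_at(offset, header_len + body_len)
    return blob[header_len:]


bodies = [b"a" * 10, b"bb" * 20, b"c" * 5]
data, index = write_batches(bodies)

two_step = CountingReader(data)
got_two_step = [read_two_step(two_step, off, hlen) for off, hlen, _ in index]

single = CountingReader(data)
got_single = [read_single(single, off, hlen, blen) for off, hlen, blen in index]

print(two_step.reads, single.reads)
```

Both strategies return the same bodies; only the number of reads differs. As the measurement above suggests, halving the read count does not by itself guarantee a faster run.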
