westonpace edited a comment on pull request #9482: URL: https://github.com/apache/arrow/pull/9482#issuecomment-781627186
Ok, I think I get it now. Let's pretend ~~we are outside of S3,~~ (nvm, S3 not relevant) there is 1 file with 3 row groups and 3 columns. Three scan tasks will be generated. Task 1 needs RG1C1, RG1C2, RG1C3. Task 2 needs RG2C1, RG2C2, RG2C3. Task 3 needs RG3C1, RG3C2, RG3C3. Prebuffer will be called asking for all 9 blocks, and it will then issue three reads in parallel (instead of the 9 reads that would otherwise be issued): RG1, RG2, and RG3.

If there are only 3 columns in the file, is it possible Prebuffer would coalesce this all into one read? In that case, wouldn't all three tasks be blocked until the entire file is read, preventing task 1 from starting as soon as RG1 has been read?
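To make the question concrete, here is a rough sketch of range coalescing in Python. It is not Arrow's actual implementation: the `coalesce` function, the `hole_size_limit` and `range_size_limit` parameters as used here, and the byte layout (100-byte column chunks, 50-byte gaps between row groups) are all assumptions chosen for illustration. It only shows that whether the 9 blocks become three reads or one read depends on how large a gap the coalescer is willing to bridge and how large a merged read is allowed to get.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce(ranges: List[Range], hole_size_limit: int,
             range_size_limit: int) -> List[Range]:
    """Merge ranges whose gap is <= hole_size_limit, keeping each merged
    read <= range_size_limit. Hypothetical helper, not Arrow's API."""
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            gap = offset - prev_end
            new_end = max(prev_end, offset + length)
            if gap <= hole_size_limit and new_end - prev_offset <= range_size_limit:
                # Extend the previous read to cover this block as well.
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


# Assumed layout: 3 row groups x 3 columns, column chunks contiguous
# within a row group, with a small gap between row groups.
CHUNK = 100
RG_GAP = 50
chunks: List[Range] = []
pos = 0
for rg in range(3):
    for col in range(3):
        chunks.append((pos, CHUNK))
        pos += CHUNK
    pos += RG_GAP

# No gap bridging: one read per row group (the "three reads" case).
per_row_group = coalesce(chunks, hole_size_limit=0, range_size_limit=10_000)

# Gaps bridged and no effective size cap: a single read for the whole file.
whole_file = coalesce(chunks, hole_size_limit=64, range_size_limit=10_000)

# Gaps bridged, but the size cap stops merging at row-group boundaries.
capped = coalesce(chunks, hole_size_limit=64, range_size_limit=400)

print(per_row_group)
print(whole_file)
print(capped)
```

In the `whole_file` case every task's blocks live in the same read, so in this sketch no task could start until that single read finishes, which is the scenario I'm asking about. A cap on merged read size, as in `capped`, is one way the reads could stay per row group.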
