westonpace commented on pull request #10485: URL: https://github.com/apache/arrow/pull/10485#issuecomment-857950376
For context, I want to work on parallelizing the streaming CSV reader. I'd like to investigate smaller block sizes for the earlier stages, since those stages perform effectively random access, and I was curious whether keeping the data in roughly L2-cache-sized chunks would work. I didn't know if setting the block size that small would hurt read speeds, or if buffering the reads would compensate. I'm also just working through each stage (read, parse, decode) to get a good grasp on the performance of each one.
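To illustrate the buffering question: the concern is that small block sizes mean many small `read()` calls, each paying syscall overhead, while a buffered reader amortizes that cost by reading ahead in larger chunks. A minimal stdlib-only sketch of the two strategies (the 256 KiB "L2-sized" block and the 1 MiB buffer size are illustrative assumptions, not numbers from this PR):

```python
import io
import os
import tempfile

def read_in_blocks(f, block_size):
    """Consume a file-like object in fixed-size blocks, returning total bytes."""
    total = 0
    while True:
        chunk = f.read(block_size)
        if not chunk:
            break
        total += len(chunk)
    return total

# Throwaway CSV-like file to read against.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"a,b,c\n" * 200_000)
    path = tmp.name

BLOCK = 256 * 1024  # assumed "L2-sized" block; real L2 sizes vary by CPU

# Unbuffered: every read_in_blocks() iteration is a direct OS read of BLOCK bytes.
with open(path, "rb", buffering=0) as f:
    raw_total = read_in_blocks(f, BLOCK)

# Buffered: io.BufferedReader issues larger reads underneath and serves the
# small BLOCK-sized requests from its buffer.
with io.BufferedReader(io.FileIO(path, "rb"), buffer_size=1 << 20) as f:
    buf_total = read_in_blocks(f, BLOCK)

os.remove(path)
print(raw_total, buf_total)
```

Timing the two loops (e.g. with `time.perf_counter()`) on a real dataset would answer whether buffering matters at the block sizes being considered; both strategies read the same bytes, only the syscall pattern differs.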