edponce commented on PR #13857:
URL: https://github.com/apache/arrow/pull/13857#issuecomment-1255455924

   @wjones127 Thanks for sharing [these 
benchmarks](https://github.com/apache/arrow/pull/13857#issuecomment-1255415425).
 Are these results measured without the extra overhead of the temporary 
`std::vector` for the ChunkResolver case?
   
   It is reasonable that the ChunkResolver cases perform better the smaller the chunk size, thanks to chunk caching and the higher probability of consecutive lookups hitting the same chunk.
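   
   To make the chunk-caching point concrete, here is a minimal, self-contained sketch of the idea (not Arrow's actual ChunkResolver code; `SimpleChunkResolver`, `ChunkLocation`, and the method names are made up for illustration): a binary search over cumulative chunk offsets, with the last resolved chunk cached so that consecutive hits to the same chunk skip the search.

```cpp
// Illustration only: maps a logical index in a chunked array to
// (chunk index, index within chunk). Assumes at least one non-empty chunk
// and an in-range index.
#include <algorithm>
#include <cstdint>
#include <vector>

struct ChunkLocation {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

class SimpleChunkResolver {
 public:
  explicit SimpleChunkResolver(const std::vector<int64_t>& chunk_lengths) {
    offsets_.reserve(chunk_lengths.size() + 1);
    offsets_.push_back(0);
    for (int64_t len : chunk_lengths) {
      offsets_.push_back(offsets_.back() + len);
    }
  }

  ChunkLocation Resolve(int64_t index) {
    // Fast path: same chunk as the previous lookup (the "chunk caching" case).
    if (index >= offsets_[cached_chunk_] && index < offsets_[cached_chunk_ + 1]) {
      return {cached_chunk_, index - offsets_[cached_chunk_]};
    }
    // Slow path: binary search over the cumulative chunk offsets.
    auto it = std::upper_bound(offsets_.begin(), offsets_.end(), index);
    cached_chunk_ = static_cast<int64_t>(it - offsets_.begin()) - 1;
    return {cached_chunk_, index - offsets_[cached_chunk_]};
  }

 private:
  std::vector<int64_t> offsets_;  // cumulative start offset of each chunk, plus total length
  int64_t cached_chunk_ = 0;      // chunk index of the most recent lookup
};
```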
   
   Performance is a tricky business because it depends on the metrics you are evaluating for. Both approaches have advantages and disadvantages. If you have a large dataset and are somewhat memory constrained, the concatenation approach may not be adequate due to the extra storage it requires. ChunkResolver is the most general solution, with the least memory overhead and still reasonable performance. AFAIK, Arrow does not track memory statistics that would permit selecting which of these approaches to use. We could perhaps add an option for client code to decide, but that does not seem to follow Arrow's general design.
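   
   For contrast, the concatenation path can be sketched like this (assuming a `std::shared_ptr<arrow::ChunkedArray>` named `chunked`; the actual take/sort step and error handling are elided): the whole column is materialized contiguously first, so peak memory roughly doubles for that column while the copy is alive, but subsequent lookups are plain O(1) indexing with no per-access chunk search.

```cpp
#include <arrow/api.h>
#include <arrow/array/concatenate.h>

// Sketch: materialize all chunks into one contiguous Array before indexing.
arrow::Result<std::shared_ptr<arrow::Array>> Flatten(
    const std::shared_ptr<arrow::ChunkedArray>& chunked) {
  // Allocates a full second copy of the column's data from the default pool.
  return arrow::Concatenate(chunked->chunks(), arrow::default_memory_pool());
}
```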

