Re: [PR] AVRO-4134: [C++] Allocate std::vector block by block when decoding [avro]

via GitHub Mon, 17 Nov 2025 02:57:44 -0800


philippeVerney commented on PR #3546:
URL: https://github.com/apache/avro/pull/3546#issuecomment-3541171183


   The short and rigorous answer is: no.
   
   However, it is hard to say clearly whether it is faster or slower because, 
as is often the case, it depends.
   For example, if one encodes a very large array in blocks of one 
item, this change would be disastrous.
   Conversely, if one encodes a very large array in a single block of 
all items, this change would be perfect.
   As a rule of thumb, the more blocks your array uses, the worse this 
change performs.
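
   To make the two extremes concrete, here is a minimal sketch. It is not the 
PR's code and does not use the Avro API: `countReallocations` is a hypothetical 
helper that appends items block by block, reserves room for each block before 
filling it, and counts how often the vector's capacity changes. How much 
`reserve` really allocates is implementation-defined, so the count for small 
blocks will vary between standard libraries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block-by-block decoding loop (no Avro calls).
// Appends `total` items in blocks of at most `block_size`, reserving room for
// each block before filling it. Returns how many times the capacity changed;
// each change means a reallocation and a move of the items already stored.
std::size_t countReallocations(std::size_t total, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<int> out;
    std::size_t reallocations = 0;
    for (std::size_t done = 0; done < total;) {
        const std::size_t n = std::min(block_size, total - done);
        const std::size_t before = out.capacity();
        out.reserve(out.size() + n);  // per-block allocation
        for (std::size_t i = 0; i < n; ++i) {
            out.push_back(static_cast<int>(done + i));  // never reallocates here
        }
        if (out.capacity() != before) {
            ++reallocations;
        }
        done += n;
    }
    return reallocations;
}
```

   Calling `countReallocations(1000, 1000)` models the single-block case, where 
one `reserve` covers everything. Calling `countReallocations(1000, 1)` models 
blocks of one item: on implementations where `reserve` allocates exactly what 
is requested, nearly every block triggers a reallocation, which defeats the 
geometric growth that plain `push_back` would otherwise give.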
   
   Now, if we try to guess what AVRO users will do (knowing that they can do 
exactly what they want in the end), I think the current AVRO C++ code encodes 
an array as a single block, which is the "perfect" case for this change. See 
https://github.com/apache/avro/blob/9110c693767c1dde2665b2b57939333478b12036/lang/c%2B%2B/include/avro/Specific.hh#L238
 where we only see a single call to `e.setItemCount(b.size());`
   Consequently, when using the current AVRO code to encode arrays, you end up 
with a single-block array, which makes the "problematic" decoding loop 
irrelevant.
   
   I think that this change is clearly faster for the default usage of the AVRO 
C++ library. However, it does have drawbacks and can be worse for some exotic 
and/or proprietary and/or future array encodings.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
