philippeVerney commented on PR #3546: URL: https://github.com/apache/avro/pull/3546#issuecomment-3541171183
The short and rigorous answer is: no. However, it is hard to give a clear answer on whether it is faster or slower because, as often, it depends. For example, if one decides to encode a very large array in blocks of one item each, this change would hurt badly. Conversely, if one decides to encode a very large array as a single block containing all items, this change would be ideal. As a rule of thumb, the more blocks you use in your array, the worse this change will perform.

Now, if we try to guess what Avro users will do (knowing that in the end they can do exactly what they want), I think the current Avro C++ code encodes an array as a single block, which is the "perfect" case for this change. See https://github.com/apache/avro/blob/9110c693767c1dde2665b2b57939333478b12036/lang/c%2B%2B/include/avro/Specific.hh#L238 where we only see a single call to `e.setItemCount(b.size());`

Consequently, when using the current Avro code to encode arrays, you end up with a single-block array, which makes the "problematic" decoding loop irrelevant. I think this change is clearly faster for the default usage of the Avro C++ library. However, it does have drawbacks and can be worse for some exotic and/or proprietary and/or future array encodings.
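To make the single-block versus many-blocks distinction concrete, here is a minimal standalone sketch of the Avro binary layout for an array of longs: each block is an item count followed by that many items, and a count of zero ends the array. It does not use the Avro C++ library; `writeLong`, `readLong`, `encodeArray`, `decodeArray` and the `blockSize` parameter are hypothetical names introduced only for illustration, and blocks with a negative count (which also carry a byte size) are deliberately not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append one Avro long: zigzag-map the value, then emit it as a base-128 varint.
void writeLong(std::vector<uint8_t>& out, int64_t v) {
    uint64_t z = (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
    while (z >= 0x80) {
        out.push_back(static_cast<uint8_t>(z | 0x80));
        z >>= 7;
    }
    out.push_back(static_cast<uint8_t>(z));
}

// Read one Avro long starting at pos and advance pos past it.
int64_t readLong(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint64_t z = 0;
    int shift = 0;
    while (true) {
        assert(pos < in.size());
        const uint8_t b = in[pos++];
        z |= static_cast<uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break;
        shift += 7;
    }
    return static_cast<int64_t>(z >> 1) ^ -static_cast<int64_t>(z & 1);
}

// Hypothetical encoder: split items into blocks of at most blockSize items.
std::vector<uint8_t> encodeArray(const std::vector<int64_t>& items,
                                 std::size_t blockSize) {
    assert(blockSize > 0);
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < items.size(); i += blockSize) {
        const std::size_t n = std::min(blockSize, items.size() - i);
        writeLong(out, static_cast<int64_t>(n));  // block header: item count
        for (std::size_t j = 0; j < n; ++j) writeLong(out, items[i + j]);
    }
    writeLong(out, 0);  // a zero count terminates the array
    return out;
}

struct Decoded {
    std::vector<int64_t> items;
    std::size_t blocks;  // number of non-empty blocks the decoder walked through
};

// Hypothetical decoder: one outer-loop iteration per block.
Decoded decodeArray(const std::vector<uint8_t>& in) {
    Decoded d{{}, 0};
    std::size_t pos = 0;
    for (int64_t n = readLong(in, pos); n != 0; n = readLong(in, pos)) {
        assert(n > 0);  // negative counts (with byte size) are out of scope here
        ++d.blocks;
        for (int64_t k = 0; k < n; ++k) d.items.push_back(readLong(in, pos));
    }
    return d;
}
```

Calling `encodeArray(items, items.size())` on a non-empty vector yields the single-block layout discussed above, so the decoder's outer loop runs once; `encodeArray(items, 1)` yields one block per item, so any per-block work in a decoder is paid once per element.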
