[ https://issues.apache.org/jira/browse/AVRO-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Peter Marlow updated AVRO-4134:
--------------------------------------
Description: There is an improvement that could be made to the C++ Avro
decoder for large arrays. In Specific.hh, the static decode template function
clears the collection and then loops over the decoded items, calling push_back
for each one. When the collection is large (hundreds or even thousands of
items), the repeated growth of the vector (reallocating and copying its
contents) can cause a performance issue. I am hoping there is some way for the
code to call reserve before performing those push_back calls. (was: There is
an improvement that could be made to the C++ Avro decoder for large arrays. In
Specific.hh, the static decode template function clears the collection and
then loops over the decoded items, calling push_back for each one. When the
collection is large (hundreds or even thousands of items), the repeated growth
of the vector (reallocating and copying its contents) can cause a performance
issue. The fix is simple: right after the call to clear, make a call to
reserve. The number of items may have to be counted with code like this:

    size_t count = 0;
    for (size_t n = d.arrayStart(); n != 0; n = d.arrayNext()) {
        count += n;
    }
)
> Specific.hh decode and adding a large number of items to a vector without
> using reserve first
> ---------------------------------------------------------------------------------------------
>
> Key: AVRO-4134
> URL: https://issues.apache.org/jira/browse/AVRO-4134
> Project: Apache Avro
> Issue Type: Improvement
> Components: c++
> Affects Versions: 1.12.0
> Reporter: Andrew Peter Marlow
> Priority: Trivial
--
This message was sent by Atlassian Jira
(v8.20.10#820010)