[ 
https://issues.apache.org/jira/browse/AVRO-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Peter Marlow updated AVRO-4134:
--------------------------------------
    Description: The C++ Avro decoder could handle large arrays more 
efficiently. In Specific.hh, the static decode template function clears the 
vector and then loops over the incoming items, calling push_back for each one. 
When the collection is large (hundreds or even thousands of items), the 
repeated reallocation as the vector grows can hurt performance. I am hoping 
there is some way for the code to call reserve before performing those 
push_back calls.  (was: There is an improvement that could be made to the c++ 
avro decoder for large arrays. In Specific.hh the code loops over a collection, 
performing a push_back for each item. This is in the static decode template 
function. It does a clear and then does a push back for each item that it adds. 
So when the collection is large, hundreds or even thousands of items, the 
repeated expansion of the vector can cause a performance issue. The fix is 
simple. Right after the call to clear, make a call to reserve. The number of 
items may have to be counted with code like this:

size_t count = 0;
for (size_t n = d.arrayStart(); n != 0; n = d.arrayNext()) {
    count += n;
}
)

> Specific.hh decode and adding a large number of items to a vector without 
> using reserve first
> ---------------------------------------------------------------------------------------------
>
>                 Key: AVRO-4134
>                 URL: https://issues.apache.org/jira/browse/AVRO-4134
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: c++
>    Affects Versions: 1.12.0
>            Reporter: Andrew Peter Marlow
>            Priority: Trivial
>
> The C++ Avro decoder could handle large arrays more efficiently. In 
> Specific.hh, the static decode template function clears the vector and then 
> loops over the incoming items, calling push_back for each one. When the 
> collection is large (hundreds or even thousands of items), the repeated 
> reallocation as the vector grows can hurt performance. I am hoping there is 
> some way for the code to call reserve before performing those push_back 
> calls.
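
For illustration, the change being asked for might look something like the 
sketch below. It does not use the real Avro headers: FakeBlockDecoder and 
decodeArray are hypothetical names made up for this example, and only the 
arrayStart()/arrayNext() loop shape is taken from the description above. 
Because each call reports the item count of one block rather than of the whole 
array, the sketch reserves per block instead of counting everything up front.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block-based array decoder; NOT the Avro API.
// Each inner vector represents one block of items.
class FakeBlockDecoder {
public:
    explicit FakeBlockDecoder(std::vector<std::vector<int>> blocks)
        : blocks_(std::move(blocks)) {}

    // Returns the item count of the first block, or 0 if there is none.
    std::size_t arrayStart() {
        block_ = 0;
        item_ = 0;
        return currentSize();
    }

    // Moves to the next block and returns its item count, or 0 at the end.
    std::size_t arrayNext() {
        ++block_;
        item_ = 0;
        return currentSize();
    }

    // Reads the next item of the current block.
    int decodeInt() {
        assert(block_ < blocks_.size() && item_ < blocks_[block_].size());
        return blocks_[block_][item_++];
    }

private:
    std::size_t currentSize() const {
        return block_ < blocks_.size() ? blocks_[block_].size() : 0;
    }

    std::vector<std::vector<int>> blocks_;
    std::size_t block_ = 0;
    std::size_t item_ = 0;
};

// Assumed shape of the improved loop: clear, then reserve room for each
// block before pushing its items, so push_back does not reallocate mid-block.
template <typename Decoder>
void decodeArray(Decoder& d, std::vector<int>& out) {
    out.clear();
    for (std::size_t n = d.arrayStart(); n != 0; n = d.arrayNext()) {
        out.reserve(out.size() + n);
        for (std::size_t i = 0; i < n; ++i) {
            out.push_back(d.decodeInt());
        }
    }
}
```

One caveat with this approach: if an array arrives as many small blocks, 
reserving an exact size on every block could cause more reallocations than 
plain push_back would. Reserving only on the first block, or only when n is 
large, may be a safer variant.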



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
