[ https://issues.apache.org/jira/browse/ARROW-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Omer Ozarslan updated ARROW-6375:
---------------------------------
    Description:
I was trying to benchmark the performance of array builders vs. the STL API for converting some row data to Arrow tables. I found that converting {{std::vector}} values with the STL API is around 1.5-1.8 times slower than doing so with the builder API. This appears to be primarily because rows are appended via the {{...::Append}} method, by iterating over {{ConversionTraits<std::vector<...>>::AppendRow}} for each value.

Calling {{...::AppendValues}} would be more efficient; however, {{ConversionTraits}} doesn't offer a way to append more than one cell ({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method {{AppendRows(Builder, Cell*, size_t)}} that allows a template specialization to efficiently append multiple values at once? In the example above, this function would be called with {{std::vector::data()}} and {{std::vector::size()}} if provided. If the specialization doesn't provide such a method, the current behavior (i.e. iterating over {{AppendRow}}) can be used as the default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100] is the particular part of the code that would be replaced in practice. Instead of calling {{AppendRow}} directly in a for loop, a public helper function (e.g. {{stl::AppendRows}}) could be provided that implements the above logic.

  was:
I was trying to benchmark the performance of array builders vs. the STL API for converting some row data to Arrow tables. I found that converting {{std::vector}} values with the STL API is around 1.5-1.8 times slower than with the builder API. This appears to be primarily because rows are appended via the {{...::Append}} method, by iterating over {{ConversionTraits<std::vector<...>>::AppendRow}} for each value.

Calling {{...::AppendValues}} would be more efficient; however, {{ConversionTraits}} doesn't offer a way to append more than one cell ({{AppendRow}} takes a builder and a single cell as its parameters).

Would it be possible to extend conversion traits with an optional method {{AppendRows(Builder, Cell*, size_t)}} that allows a template specialization to efficiently append multiple values at once? In the example above, this function would be called with {{std::vector::data()}} and {{std::vector::size()}} if provided. If the specialization doesn't provide such a method, the current behavior (i.e. iterating over {{AppendRow}}) can be used as the default.

[This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100] is the particular part of the code that would be replaced in practice. Instead of calling {{AppendRow}} directly in a for loop, a public helper function (e.g. {{stl::AppendRows}}) could be provided that implements the above logic.

> [C++] Extend ConversionTraits to allow efficiently appending list values in STL API
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-6375
>                 URL: https://issues.apache.org/jira/browse/ARROW-6375
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Omer Ozarslan
>            Priority: Major
>
> I was trying to benchmark the performance of array builders vs. the STL API for converting some row data to Arrow tables. I found that converting {{std::vector}} values with the STL API is around 1.5-1.8 times slower than doing so with the builder API. This appears to be primarily because rows are appended via the {{...::Append}} method, by iterating over {{ConversionTraits<std::vector<...>>::AppendRow}} for each value.
>
> Calling {{...::AppendValues}} would be more efficient; however, {{ConversionTraits}} doesn't offer a way to append more than one cell ({{AppendRow}} takes a builder and a single cell as its parameters).
>
> Would it be possible to extend conversion traits with an optional method {{AppendRows(Builder, Cell*, size_t)}} that allows a template specialization to efficiently append multiple values at once? In the example above, this function would be called with {{std::vector::data()}} and {{std::vector::size()}} if provided. If the specialization doesn't provide such a method, the current behavior (i.e. iterating over {{AppendRow}}) can be used as the default.
>
> [This|https://github.com/apache/arrow/blob/e29732be86958e563801c55d3fcd8dc3fe4e9801/cpp/src/arrow/stl.h#L97-L100] is the particular part of the code that would be replaced in practice. Instead of calling {{AppendRow}} directly in a for loop, a public helper function (e.g. {{stl::AppendRows}}) could be provided that implements the above logic.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
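
The dispatch proposed in this issue (use a bulk {{AppendRows}} when a specialization provides one, otherwise loop over {{AppendRow}}) could look roughly like the sketch below. It is only an illustration under stated assumptions: it does not use the Arrow headers, and {{ToyInt64Builder}}, {{ToyConversionTraits}}, {{ToyHasAppendRows}}, {{ToyAppendRows}} and {{ToyDemo}} are hypothetical names invented here, not Arrow's actual classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Toy stand-in for an array builder; NOT the real Arrow class.
struct ToyInt64Builder {
  std::vector<std::int64_t> values;
  std::size_t append_calls = 0;  // number of single-value appends
  std::size_t bulk_calls = 0;    // number of bulk appends

  void Append(std::int64_t v) {
    values.push_back(v);
    ++append_calls;
  }
  void AppendValues(const std::int64_t* data, std::size_t n) {
    values.insert(values.end(), data, data + n);
    ++bulk_calls;
  }
};

// Hypothetical traits template, loosely modeled on the issue's description.
template <typename Cell>
struct ToyConversionTraits;

// Specialization that provides the optional bulk hook.
template <>
struct ToyConversionTraits<std::int64_t> {
  static void AppendRow(ToyInt64Builder& b, const std::int64_t& cell) {
    b.Append(cell);
  }
  static void AppendRows(ToyInt64Builder& b, const std::int64_t* cells,
                         std::size_t n) {
    b.AppendValues(cells, n);
  }
};

// Specialization without the hook; only the per-cell method exists.
template <>
struct ToyConversionTraits<std::int32_t> {
  static void AppendRow(ToyInt64Builder& b, const std::int32_t& cell) {
    b.Append(cell);
  }
};

// Detects whether Traits::AppendRows(Builder&, const Cell*, size_t) is callable.
template <typename Traits, typename Builder, typename Cell, typename = void>
struct ToyHasAppendRows : std::false_type {};

template <typename Traits, typename Builder, typename Cell>
struct ToyHasAppendRows<
    Traits, Builder, Cell,
    decltype(void(Traits::AppendRows(std::declval<Builder&>(),
                                     std::declval<const Cell*>(),
                                     std::size_t{})))> : std::true_type {};

// Bulk path: chosen when the specialization offers AppendRows.
template <typename Cell, typename Builder>
void ToyAppendRowsImpl(Builder& b, const Cell* cells, std::size_t n,
                       std::true_type) {
  ToyConversionTraits<Cell>::AppendRows(b, cells, n);
}

// Fallback path: the current behavior, one AppendRow call per cell.
template <typename Cell, typename Builder>
void ToyAppendRowsImpl(Builder& b, const Cell* cells, std::size_t n,
                       std::false_type) {
  for (std::size_t i = 0; i < n; ++i) {
    ToyConversionTraits<Cell>::AppendRow(b, cells[i]);
  }
}

// Public helper: passes data() and size() to whichever path applies.
template <typename Cell, typename Builder>
void ToyAppendRows(Builder& b, const std::vector<Cell>& cells) {
  ToyAppendRowsImpl<Cell>(
      b, cells.data(), cells.size(),
      ToyHasAppendRows<ToyConversionTraits<Cell>, Builder, Cell>{});
}

// Small usage demo covering both paths.
inline void ToyDemo() {
  ToyInt64Builder bulk;
  ToyAppendRows(bulk, std::vector<std::int64_t>{1, 2, 3});
  assert(bulk.bulk_calls == 1 && bulk.append_calls == 0);

  ToyInt64Builder fallback;
  ToyAppendRows(fallback, std::vector<std::int32_t>{4, 5});
  assert(fallback.bulk_calls == 0 && fallback.append_calls == 2);
}
```

Tag dispatch on a detection trait keeps existing specializations source-compatible, since a traits class that only defines {{AppendRow}} silently takes the loop path. The sketch does not cover {{std::vector<bool>}}, which has no {{data()}} member.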