[ https://issues.apache.org/jira/browse/ARROW-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-2330:
--------------------------------
    Fix Version/s:     (was: 0.9.0)
                   0.10.0

> [C++] Optimize delta buffer creation with partially finishable array builders
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-2330
>                 URL: https://issues.apache.org/jira/browse/ARROW-2330
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>    Affects Versions: 0.8.0
>            Reporter: Dimitri Vorona
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> The main aim of this change is to optimize the building of delta
> dictionaries. In the current version, delta dictionaries are built using an
> additional "overflow" buffer, which leads to complicated and potentially
> error-prone code and to subpar performance, because it doubles the number of
> lookups.
> I solve this problem by introducing the notion of partially finishable array
> builders, i.e. builders that can retain their state when Finish is called.
> The interface is based on RecordBatchBuilder::Flush, i.e. Finish is
> overloaded with an additional signature, Finish(bool reset_builder,
> std::shared_ptr<Array>* out). The resulting Arrays point to the same data
> buffer with different offsets.
> I'm aware that the change is fairly large, but I'd like to discuss it
> here. The solution makes the code more straightforward, doesn't bloat the
> code base too much, and leaves the API more or less untouched. Additionally,
> the new way of making delta dictionaries, by using a different call signature
> for Finish, feels cleaner to me.
> I'm looking forward to your criticism and ideas for improvement.
> The pull request is available at: https://github.com/apache/arrow/pull/1769

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
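
The idea behind the Finish(bool reset_builder, std::shared_ptr<Array>* out) overload described in the issue can be pictured with a small stand-alone sketch. It does not use Arrow at all: ToyArray, ToyBuilder and DeltaExample are hypothetical names invented for illustration, and the sketch assumes that reset_builder == false means "keep the buffer and return only the values appended since the previous Finish". The actual implementation in the pull request may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for an Arrow array: an (offset, length) view
// into a buffer that may be shared with other arrays.
struct ToyArray {
  std::shared_ptr<const std::vector<int64_t>> data;
  std::size_t offset;
  std::size_t length;

  int64_t Value(std::size_t i) const { return (*data)[offset + i]; }
};

// Hypothetical "partially finishable" builder (not Arrow's ArrayBuilder).
class ToyBuilder {
 public:
  ToyBuilder() : data_(std::make_shared<std::vector<int64_t>>()), flushed_(0) {}

  void Append(int64_t value) { data_->push_back(value); }

  // Mirrors the shape of the proposed overload. The returned array covers
  // only the values appended since the previous Finish call.
  void Finish(bool reset_builder, std::shared_ptr<ToyArray>* out) {
    *out = std::make_shared<ToyArray>(
        ToyArray{data_, flushed_, data_->size() - flushed_});
    if (reset_builder) {
      // Drop all state: the next array starts in a fresh buffer.
      data_ = std::make_shared<std::vector<int64_t>>();
      flushed_ = 0;
    } else {
      // Retain state: the next array continues in the same buffer.
      flushed_ = data_->size();
    }
  }

 private:
  std::shared_ptr<std::vector<int64_t>> data_;
  std::size_t flushed_;  // number of values already handed out
};

// Usage: two partial finishes yield views over one buffer.
inline bool DeltaExample() {
  ToyBuilder builder;
  std::shared_ptr<ToyArray> first, delta;
  builder.Append(1);
  builder.Append(2);
  builder.Finish(false, &first);  // view over [0, 2)
  builder.Append(3);
  builder.Finish(false, &delta);  // view over [2, 3), same buffer
  return first->data == delta->data && delta->offset == 2 &&
         delta->Value(0) == 3;
}
```

In this toy version no separate "overflow" structure is needed: the delta is simply the tail of the one buffer the builder already maintains, which is the simplification the issue argues for.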