[ 
https://issues.apache.org/jira/browse/ARROW-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2330:
--------------------------------
    Fix Version/s:     (was: 0.9.0)
                   0.10.0

> [C++] Optimize delta buffer creation with partially finishable array builders
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-2330
>                 URL: https://issues.apache.org/jira/browse/ARROW-2330
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>    Affects Versions: 0.8.0
>            Reporter: Dimitri Vorona
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> The main aim of this change is to optimize the building of delta 
> dictionaries. In the current version delta dictionaries are built using an 
> additional "overflow" buffer which leads to complicated and potentially 
> error-prone code and subpar performance by doubling the number of lookups.
> I solve this problem by introducing the notion of partially finishable array 
> builders, i.e. builder which are able to retain the state on calling Finish. 
> The interface is based on RecordBatchBuilder::Flush, i.e. Finish is 
> overloaded with additional signature Finish(bool reset_builder, 
> std::shared_ptr<Array>* out). The resulting Arrays point to the same data 
> buffer with different offsets.
> I'm aware that the change is kind of biggish, but I'd like to discuss it 
> here. The solution makes the code more straight forward, doesn't bloat the 
> code base too much and leaves the API more or less untouched. Additionally, 
> the new way to make delta dictionaries by using a different call signature to 
> Finish feel cleaner to me.
> I'm looking forward to your critic and improvement ideas.
> The pull request is available at: https://github.com/apache/arrow/pull/1769



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to