[ https://issues.apache.org/jira/browse/ARROW-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529119#comment-17529119 ]
Tobias Zagorni commented on ARROW-16161:
----------------------------------------

I created a hacky prototype of an ExecArrayData class that does not hold a shared_ptr to DataType. It is used in Datum as an alternative to ArrayData. I can currently run all the ExecuteScalarExpressionOverhead benchmarks with it. There are noticeable performance improvements in multi-threaded runs with small batch sizes (e.g. 1000 rows). The best result is in zero_copy_expression:

{noformat}
old: ExecuteScalarExpressionOverhead/zero_copy_expression/rows_per_batch:1000/threads:16   1613271 ns   25730049 ns   32   rows_per_second=388.651M/s
new: ExecuteScalarExpressionOverhead/zero_copy_expression/rows_per_batch:1000/threads:16   1121647 ns   17913402 ns   32   rows_per_second=558.241M/s
{noformat}

(full results are attached: [^ExecArrayData-difference.txt])

In other places I see a small performance reduction, and the simple_expression/100 improvement is smaller than with the first copyable-DataType experiment. This is probably due to the additional code paths in this implementation, and to my way of getting a shared_ptr<DataType> back from ExecArrayData, which clones the DataType object and is far from optimal. I'm still looking into this.

Most of the remaining reference-counting overhead seems to come from three places:
* ValueDescr / resolving output types using it
* Scalar datums
* pointers to the CPUDevice instance, because CPUMemoryManager instances are recreated a lot
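As a rough illustration of the approach (a minimal sketch; all names here are hypothetical, not the prototype's actual API), a non-owning variant of array data might look like this, including the clone step identified above as a source of overhead:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Stand-in for arrow::DataType, simplified for the sketch.
struct DataType {
  std::string name;
};

// Hypothetical ExecArrayData-like struct: it borrows a raw DataType pointer
// instead of sharing ownership, so copying it inside expression execution
// does no atomic reference counting.
struct ExecArrayDataSketch {
  const DataType* type;  // borrowed; the caller guarantees its lifetime
  int64_t length;

  // Getting a shared_ptr<DataType> back requires cloning the type object,
  // which the comment above notes is far from optimal.
  std::shared_ptr<DataType> owned_type() const {
    return std::make_shared<DataType>(*type);
  }
};
```

The design trades pointer-copy cost for a lifetime contract: whoever builds the batch must keep the DataType alive for the duration of the call.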
I'll create a different subtask for this. [Code|https://github.com/zagto/arrow/compare/datatype-performance-execarraydata-baseline...zagto:datatype-performance?expand=1] (ignore the weird mix of template and subclassing for ExecArrayData)

> [C++] Overhead of std::shared_ptr<DataType> copies is causing thread contention
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-16161
>                 URL: https://issues.apache.org/jira/browse/ARROW-16161
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Tobias Zagorni
>            Priority: Major
>         Attachments: ExecArrayData-difference.txt
>
> We created a benchmark to measure ExecuteScalarExpression performance in ARROW-16014. We noticed significant thread contention (even though there shouldn't be much, if any, for this task). As part of ARROW-16138 we have been investigating possible causes.
> One cause seems to be contention from copying shared_ptr<DataType> objects. Two possible solutions jump to mind, and I'm sure there are many more.
> ExecBatch is an internal type, used inside of ExecuteScalarExpression as well as inside of the execution engine. In the former we can safely assume the data types will exist for the duration of the call. In the latter we can safely assume the data types will exist for the duration of the execution plan. Thus we can probably take a more targeted fix and migrate only ExecBatch to using DataType* (or const DataType&).
> On the other hand, we might consider a more global approach. All of our "stock" data types are assumed to have static storage duration. However, we must use std::shared_ptr<DataType> because users could create their own extension types. We could invent an "extension type registration" system where extension types must first be registered with the C++ lib before being used.
> Then we could have long-lived DataType instances, and we could replace std::shared_ptr<DataType> with DataType* (or const DataType&) throughout most of the code base.
> But, as I mentioned, I'm sure there are many approaches to take. CC [~lidavidm] and [~apitrou] and [~yibocai] for thoughts, but this might be interesting for just about any C++ dev.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
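The registration idea quoted above can be sketched as follows (a minimal illustration under assumed names; TypeRegistry, Register, and Lookup are not existing Arrow APIs): the registry owns each extension type for the life of the process, so everything downstream can pass a plain const DataType* with no reference counting.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for arrow::DataType, simplified for the sketch.
struct DataType {
  std::string name;
};

// Hypothetical registry: extension types are registered once, kept alive by
// the registry, and handed out as stable raw pointers thereafter.
class TypeRegistry {
 public:
  // Registering transfers ownership to the registry; the returned pointer
  // remains valid as long as the registry lives.
  const DataType* Register(std::unique_ptr<DataType> type) {
    const DataType* raw = type.get();
    types_[raw->name] = std::move(type);
    return raw;
  }

  // Returns the long-lived instance, or nullptr if the name is unknown.
  const DataType* Lookup(const std::string& name) const {
    auto it = types_.find(name);
    return it == types_.end() ? nullptr : it->second.get();
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<DataType>> types_;
};
```

With long-lived instances, a shared_ptr copy (an atomic refcount bump on a cache line shared by every thread touching that type) becomes a plain pointer copy, which is the contention the issue is describing.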