[ 
https://issues.apache.org/jira/browse/ARROW-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522329#comment-17522329
 ] 

David Li commented on ARROW-16161:
----------------------------------

I agree the ideal is to refactor ExecBatch. I suggested copying only as a way 
to test the impact without having to refactor first.

> [C++] Overhead of std::shared_ptr<DataType> copies is causing thread 
> contention
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-16161
>                 URL: https://issues.apache.org/jira/browse/ARROW-16161
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Tobias Zagorni
>            Priority: Major
>
> We created a benchmark to measure ExecuteScalarExpression performance in 
> ARROW-16014.  We noticed significant thread contention (even though there 
> shouldn't be much, if any, for this task).  As part of ARROW-16138 we have 
> been investigating possible causes.
> One cause seems to be contention from copying shared_ptr<DataType> objects.
> Two possible solutions jump to mind and I'm sure there are many more.
> ExecBatch is an internal type and used inside of ExecuteScalarExpression as 
> well as inside of the execution engine.  In the former we can safely assume 
> the data types will exist for the duration of the call.  In the latter we can 
> safely assume the data types will exist for the duration of the execution 
> plan.  Thus we can probably take a more targeted fix and migrate only 
> ExecBatch to using DataType* (or const DataType&).
> On the other hand, we might consider a more global approach.  All of our 
> "stock" data types are assumed to have static storage duration.  However, we 
> must use std::shared_ptr<DataType> because users could create their own 
> extension types.  We could invent an "extension type registration" system 
> where extension types must first be registered with the C++ lib before being 
> used.  Then we could have long-lived DataType instances and we could replace 
> std::shared_ptr<DataType> with DataType* (or const DataType&) throughout most 
> of the code base.
> But, as I mentioned, I'm sure there are many approaches to take.  CC 
> [~lidavidm] and [~apitrou] and [~yibocai] for thoughts but this might be 
> interesting for just about any C++ dev.
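
For readers following the thread, here is a minimal sketch of the trade-off the 
issue describes. It is not Arrow code: DataType, OwningValue, BorrowingValue, 
CopyOwning and Borrow are hypothetical stand-ins invented for illustration. The 
point is only that copying a std::shared_ptr updates a shared atomic reference 
count, while a borrowed pointer does not, provided the caller can guarantee the 
pointee's lifetime.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::DataType; not the real Arrow class.
struct DataType {
  std::string name;
};

// Hypothetical value that owns a reference to its type, loosely modelled on
// a batch column carrying a std::shared_ptr<DataType>.
struct OwningValue {
  std::shared_ptr<DataType> type;
};

// Hypothetical non-owning alternative. The caller must guarantee that the
// DataType outlives this value (for example, for the duration of one call).
struct BorrowingValue {
  const DataType* type;
};

// Copies every owning value. Each element copy atomically increments the
// reference count in the shared control block, so threads copying values of
// the same type all write to the same counter.
std::vector<OwningValue> CopyOwning(const std::vector<OwningValue>& in) {
  return in;
}

// Builds a borrowed view. No reference counts are touched; only raw pointers
// are copied.
std::vector<BorrowingValue> Borrow(const std::vector<OwningValue>& in) {
  std::vector<BorrowingValue> out;
  out.reserve(in.size());
  for (const auto& v : in) {
    assert(v.type != nullptr);  // a borrowed view needs a live type
    out.push_back({v.type.get()});
  }
  return out;
}
```

Calling CopyOwning on a batch of N values raises the type's use_count() by N, 
whereas Borrow leaves it unchanged. Whether that difference accounts for the 
contention seen in the benchmark is something only profiling can confirm.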



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
