[ https://issues.apache.org/jira/browse/ARROW-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519943#comment-17519943 ]
Antoine Pitrou commented on ARROW-16161:
----------------------------------------

Note that ExecBatch holds shared_ptr to DataTypes indirectly through the Datums as well. So doing this is more involved than it seems, IMHO.

> [C++] Overhead of std::shared_ptr<DataType> copies is causing thread contention
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-16161
>                 URL: https://issues.apache.org/jira/browse/ARROW-16161
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> We created a benchmark to measure ExecuteScalarExpression performance in
> ARROW-16014. We noticed significant thread contention (even though there
> shouldn't be much, if any, for this task). As part of ARROW-16138 we have been
> investigating possible causes.
> One cause seems to be contention from copying shared_ptr<DataType> objects.
> Two possible solutions jump to mind and I'm sure there are many more.
> ExecBatch is an internal type and is used inside of ExecuteScalarExpression as
> well as inside of the execution engine. In the former we can safely assume
> the data types will exist for the duration of the call. In the latter we can
> safely assume the data types will exist for the duration of the execution
> plan. Thus we can probably take a more targeted fix and migrate only
> ExecBatch to using DataType* (or const DataType&).
> On the other hand, we might consider a more global approach. All of our
> "stock" data types are assumed to have static storage duration. However, we
> must use std::shared_ptr<DataType> because users could create their own
> extension types. We could invent an "extension type registration" system
> where extension types must first be registered with the C++ lib before being
> used. Then we could have long-lived DataType instances and we could replace
> std::shared_ptr<DataType> with DataType* (or const DataType&) throughout most
> of the code base.
> But, as I mentioned, I'm sure there are many approaches to take. CC
> [~lidavidm] and [~apitrou] and [~yibocai] for thoughts but this might be
> interesting for just about any C++ dev.
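
For illustration only, here is a minimal sketch of the two ideas in the issue description. Nothing in it is Arrow code: DataType is a stand-in struct, and OwningBatch, BorrowingBatch and TypeRegistry are hypothetical names invented for this example. It shows that copying a batch of shared_ptr bumps the shared reference count (an atomic operation that all threads touching the same type contend on), that a batch of raw pointers does not, and roughly what a registration system handing out long-lived pointers might look like.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for arrow::DataType; NOT the real Arrow class.
struct DataType {
  explicit DataType(std::string type_name) : name(std::move(type_name)) {}
  std::string name;
};

// Owning variant: copying the batch copies every shared_ptr, and each copy
// atomically increments the reference count in the shared control block.
struct OwningBatch {
  std::vector<std::shared_ptr<DataType>> types;
};

// Borrowing variant (hypothetical): copying the batch copies plain pointers.
// The caller must guarantee that the types outlive the batch, e.g. for the
// duration of a call or of an execution plan.
struct BorrowingBatch {
  std::vector<const DataType*> types;
};

// Hypothetical registry: it owns every registered type until the registry
// itself is destroyed, so it can hand out non-owning pointers.
// This sketch is not thread-safe; a real one would need synchronization
// around registration.
class TypeRegistry {
 public:
  // Registers a type under `name`, or returns the already registered one.
  const DataType* Register(const std::string& name) {
    auto it = types_.find(name);
    if (it == types_.end()) {
      it = types_.emplace(name, std::make_unique<DataType>(name)).first;
    }
    return it->second.get();
  }

 private:
  std::map<std::string, std::unique_ptr<DataType>> types_;
};

// Reference count seen after copying an owning batch once.
long UseCountAfterOwningCopy() {
  auto int32 = std::make_shared<DataType>("int32");
  OwningBatch batch;
  batch.types.push_back(int32);  // second owner
  OwningBatch copy = batch;      // third owner: one more atomic increment
  assert(copy.types.size() == 1);
  return int32.use_count();
}

// Reference count seen after copying a borrowing batch once.
long UseCountAfterBorrowingCopy() {
  auto int32 = std::make_shared<DataType>("int32");
  BorrowingBatch batch;
  batch.types.push_back(int32.get());  // no ownership taken
  BorrowingBatch copy = batch;         // copies a raw pointer only
  assert(copy.types.size() == 1);
  return int32.use_count();
}
```

In a single-threaded run, UseCountAfterOwningCopy() reports three owners and UseCountAfterBorrowingCopy() reports one, and registering the same name twice with TypeRegistry yields the same pointer. The trade-off is the one raised in the comment above: with borrowed pointers, every holder (including anything reached indirectly, such as the Datums) has to respect the lifetime guarantee.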