Francois Saint-Jacques created ARROW-6854:
---------------------------------------------
Summary: [Dataset] RecordBatchProjector is not thread safe
Key: ARROW-6854
URL: https://issues.apache.org/jira/browse/ARROW-6854
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Francois Saint-Jacques

While working on ARROW-6769 I noted that RecordBatchProjector is not thread safe. My goal is to use this class to wrap a ScanTaskIterator in another ScanTaskIterator that projects, so that producers (fragments) don't have to know about the target schema. The issue is that ScanTasks are expected to run on concurrent threads, so the projector will be invoked by multiple threads at once.

The lack of thread safety comes from the projector adapting to input schemas: `SetInputSchema` stores its result in a local cache, which is modified on every call with a new schema.

I suggest we refactor into 2 classes.

# `RecordBatchProjector`, which will work with a static `from` schema, i.e. no adaptivity. The schema is defined at construction time. This class is thread safe to invoke after construction since no local state is modified.
# `AdaptiveRecordBatchProjector`, which will hold a cache map[schema_hash, std::shared_ptr<RecordBatchProjector>] protected by a mutex.
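A minimal sketch of the proposed split, to make the locking boundary concrete. It does not use the Arrow API: `Schema`, `Column`, `Batch`, `HashSchema`, `Project` and `CacheSize` are hypothetical stand-ins invented for illustration, and a missing column is represented by an empty `std::optional` where a real projector would presumably produce a null column. Only the two class names and the hash-keyed, mutex-protected cache come from the proposal above.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-ins (assumptions, not Arrow types).
using Schema = std::vector<std::string>;  // ordered field names
using Column = std::vector<int>;
struct Batch {
  Schema schema;
  std::vector<std::optional<Column>> columns;  // empty optional = missing column
};

// Hypothetical helper: order-sensitive hash of the field names.
inline std::size_t HashSchema(const Schema& schema) {
  std::size_t seed = schema.size();
  for (const std::string& name : schema) {
    seed ^= std::hash<std::string>{}(name) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  }
  return seed;
}

// Static projector: the `from` schema is fixed at construction, so Project()
// is const and safe to call from several threads.
class RecordBatchProjector {
 public:
  RecordBatchProjector(const Schema& from, Schema to) : to_(std::move(to)) {
    source_index_.reserve(to_.size());
    for (const std::string& name : to_) {
      int found = -1;  // -1 means the field is absent from `from`
      for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] == name) {
          found = static_cast<int>(i);
          break;
        }
      }
      source_index_.push_back(found);
    }
  }

  // Assumes batch.schema equals the `from` schema given to the constructor.
  Batch Project(const Batch& batch) const {
    Batch out;
    out.schema = to_;
    out.columns.reserve(to_.size());
    for (int idx : source_index_) {
      if (idx < 0) {
        out.columns.push_back(std::nullopt);
      } else {
        out.columns.push_back(batch.columns[static_cast<std::size_t>(idx)]);
      }
    }
    return out;
  }

 private:
  Schema to_;
  std::vector<int> source_index_;
};

// Adaptive projector: one static projector per input schema, cached by hash.
// The mutex guards only the cache; the projection itself runs unlocked.
class AdaptiveRecordBatchProjector {
 public:
  explicit AdaptiveRecordBatchProjector(Schema to) : to_(std::move(to)) {}

  Batch Project(const Batch& batch) {
    std::shared_ptr<const RecordBatchProjector> projector;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      // Note: keying on the hash alone ignores collisions; a real
      // implementation would also need to compare the schemas.
      auto& slot = cache_[HashSchema(batch.schema)];
      if (!slot) {
        slot = std::make_shared<RecordBatchProjector>(batch.schema, to_);
      }
      projector = slot;
    }
    return projector->Project(batch);
  }

  std::size_t CacheSize() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  Schema to_;
  mutable std::mutex mutex_;
  std::unordered_map<std::size_t, std::shared_ptr<const RecordBatchProjector>> cache_;
};
```

Because the lock is released before `projector->Project(batch)` runs, concurrent scan tasks would only contend on the cache lookup, and the `shared_ptr` keeps the selected projector alive for the duration of the call.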