[ https://issues.apache.org/jira/browse/ARROW-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-12386:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Support file parallelism in AsyncScanner
> ----------------------------------------------
>
>                 Key: ARROW-12386
>                 URL: https://issues.apache.org/jira/browse/ARROW-12386
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Whether we pull from files in parallel or not is controlled by how we merge
> the batch streams in `AsyncScanner::ScanBatchesUnorderedAsync`. Currently we
> are relying on `MakeConcatenatedGenerator` which is incorrect. This is
> needed because `MakeMergedGenerator` pulls from its source (an
> `EnumeratingGenerator`) in an async reentrant fashion. `MakeMergedGenerator`
> should not do this. If some kind of readahead is truly necessary there then
> use `MakeReadaheadGenerator`.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
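
For illustration only (this is not Arrow code): a toy, synchronous C++17 sketch of the difference between concatenating and merging per-file batch streams, as discussed in the issue description. All names here (`Generator`, `MakeFileGenerator`, `MakeSource`, `ToyConcatenated`, `ToyMerged`, `Collect`) are hypothetical. Arrow's real generators return futures, which this sketch does not model. The merged variant assumes the behaviour the issue asks for: it pulls from its source serially, one sub-generator at a time, and never reentrantly.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy pull-based generator: each call yields the next item, or nullopt at the end.
template <typename T>
using Generator = std::function<std::optional<T>()>;

using Batch = std::string;

// Hypothetical per-file stream that yields "f<file>b<batch>".
Generator<Batch> MakeFileGenerator(int file, int num_batches) {
  int next = 0;
  return [file, num_batches, next]() mutable -> std::optional<Batch> {
    if (next >= num_batches) return std::nullopt;
    return "f" + std::to_string(file) + "b" + std::to_string(next++);
  };
}

// Hypothetical source that yields one file generator per file.
Generator<Generator<Batch>> MakeSource(int num_files, int batches_per_file) {
  int next = 0;
  return [num_files, batches_per_file, next]() mutable
             -> std::optional<Generator<Batch>> {
    if (next >= num_files) return std::nullopt;
    return MakeFileGenerator(next++, batches_per_file);
  };
}

// Concatenation: drain file i completely before opening file i + 1.
template <typename T>
Generator<T> ToyConcatenated(Generator<Generator<T>> source) {
  std::optional<Generator<T>> current;
  bool finished = false;
  return [source = std::move(source), current, finished]() mutable
             -> std::optional<T> {
    while (!finished) {
      if (!current) {
        current = source();
        if (!current) { finished = true; break; }
      }
      if (std::optional<T> item = (*current)()) return item;
      current.reset();  // current file is exhausted; move on to the next one
    }
    return std::nullopt;
  };
}

// Merging: keep up to max_open files open and take batches round-robin.
// The source is pulled one sub-generator at a time, never reentrantly.
// With max_open == 0 nothing can be opened, so the result is empty.
template <typename T>
Generator<T> ToyMerged(Generator<Generator<T>> source, std::size_t max_open) {
  std::deque<Generator<T>> open;
  bool source_done = false;
  return [source = std::move(source), open, source_done, max_open]() mutable
             -> std::optional<T> {
    for (;;) {
      // Top up the set of open files from the source.
      while (!source_done && open.size() < max_open) {
        if (auto sub = source()) open.push_back(std::move(*sub));
        else source_done = true;
      }
      if (open.empty()) return std::nullopt;
      Generator<T> sub = std::move(open.front());
      open.pop_front();
      if (std::optional<T> item = sub()) {
        open.push_back(std::move(sub));  // still has data: back of the queue
        return item;
      }
      // sub is exhausted: drop it and loop to top up again
    }
  };
}

// Drain a generator into a vector.
template <typename T>
std::vector<T> Collect(Generator<T> gen) {
  std::vector<T> out;
  while (std::optional<T> item = gen()) out.push_back(std::move(*item));
  return out;
}
```

With two files of two batches each, `Collect(ToyConcatenated(MakeSource(2, 2)))` yields f0b0, f0b1, f1b0, f1b1, while `Collect(ToyMerged(MakeSource(2, 2), 2))` yields f0b0, f1b0, f0b1, f1b1. In an asynchronous setting the second pattern is what would let reads from several files overlap.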