[ https://issues.apache.org/jira/browse/ARROW-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou updated ARROW-11889:
-----------------------------------
Fix Version/s: 5.0.0
> [C++] Add parallelism to streaming CSV reader
> ---------------------------------------------
>
> Key: ARROW-11889
> URL: https://issues.apache.org/jira/browse/ARROW-11889
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
> Fix For: 5.0.0
>
>
> Currently the streaming CSV reader offers little parallelism. It cannot read
> more than one segment at once (useful for sources such as S3), and it cannot
> fan out parsing and conversion across columns.
> Both options would likely speed up CSV reading in some scenarios, although
> the benefit may be largely negated when there are many more files than
> cores, since per-file parallelism will occupy all the cores anyway.
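A minimal C++ sketch can illustrate the column fan-out idea. It assumes a block has already been parsed and split into columns of raw text cells, and it converts each column on its own task. `RawColumn`, `ConvertColumn` and `ConvertBlock` are hypothetical names, not part of Arrow's API. The sketch uses plain `std::async` rather than Arrow's own thread pool and handles only integer conversion.

```cpp
#include <cstdint>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for one column of a parsed CSV block: raw text cells.
using RawColumn = std::vector<std::string>;

// Hypothetical converter: turns one column of text cells into int64 values.
std::vector<int64_t> ConvertColumn(const RawColumn& cells) {
  std::vector<int64_t> out;
  out.reserve(cells.size());
  for (const auto& cell : cells) {
    out.push_back(std::stoll(cell));
  }
  return out;
}

// Column fan-out: launch one conversion task per column, then join the
// results in column order so the output layout matches the input block.
std::vector<std::vector<int64_t>> ConvertBlock(const std::vector<RawColumn>& block) {
  std::vector<std::future<std::vector<int64_t>>> tasks;
  tasks.reserve(block.size());
  for (const auto& column : block) {
    // std::cref avoids copying the column into the task.
    tasks.push_back(std::async(std::launch::async, ConvertColumn, std::cref(column)));
  }
  std::vector<std::vector<int64_t>> result;
  result.reserve(tasks.size());
  for (auto& task : tasks) {
    result.push_back(task.get());  // blocks until that column is converted
  }
  return result;
}
```

Because each column's conversion is independent, the tasks need no synchronization beyond joining the futures in column order. Whether this pays off depends on the number of columns and, as the issue notes, on whether per-file parallelism already keeps every core busy.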
--
This message was sent by Atlassian Jira
(v8.3.4#803005)