[ 
https://issues.apache.org/jira/browse/ARROW-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-11889:
-----------------------------------
    Fix Version/s: 5.0.0

> [C++] Add parallelism to streaming CSV reader
> ---------------------------------------------
>
>                 Key: ARROW-11889
>                 URL: https://issues.apache.org/jira/browse/ARROW-11889
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>             Fix For: 5.0.0
>
>
> Currently, the streaming CSV reader allows very little parallelism. It cannot 
> read more than one segment at a time (which would be useful on S3), and it 
> does not fan out across columns when parsing and converting.
> It seems both of these options would speed up CSV reading in some scenarios, 
> although the benefit may be mostly mitigated when there are many more files 
> than cores, since per-file parallelism will already occupy all the cores.
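
To make the "column fan-out" idea above concrete, here is a minimal conceptual sketch in Python using only the standard library. It is not Arrow's C++ implementation or API: the names `iter_blocks`, `convert_column`, `convert_block`, and `read_streaming` are hypothetical, and the all-integer schema is an assumption made for illustration. The reader still yields blocks one at a time, in order, but the columns of each block are converted as separate tasks.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def iter_blocks(text, rows_per_block):
    """Yield lists of parsed rows, standing in for streamed CSV blocks."""
    block = []
    for row in csv.reader(io.StringIO(text)):
        block.append(row)
        if len(block) == rows_per_block:
            yield block
            block = []
    if block:
        yield block


def convert_column(values):
    """Convert one column of strings to ints (assumed all-integer schema)."""
    return [int(v) for v in values]


def convert_block(block, pool):
    """Fan out: convert each column of the block as its own task."""
    columns = list(zip(*block))  # transpose rows into columns
    # pool.map returns results in the same order as the input columns.
    return list(pool.map(convert_column, columns))


def read_streaming(text, rows_per_block=2, max_workers=4):
    """Yield converted batches in order, one block at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for block in iter_blocks(text, rows_per_block):
            yield convert_block(block, pool)


data = "1,2,3\n4,5,6\n7,8,9\n"
batches = list(read_streaming(data))
```

Because of Python's global interpreter lock, this sketch shows only the task structure and would not itself speed up conversion. Reading several segments at once, the other option mentioned above, would correspond to having more than one block in flight, which this sketch does not attempt.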



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
