[
https://issues.apache.org/jira/browse/ARROW-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weston Pace updated ARROW-12090:
--------------------------------
Summary: [C++] Expose CSV I/O readahead as a read option (was: [C++]
Expose CSV block level readahead as a read option)
> [C++] Expose CSV I/O readahead as a read option
> -----------------------------------------------
>
> Key: ARROW-12090
> URL: https://issues.apache.org/jira/browse/ARROW-12090
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Minor
>
> All of the CSV readers today base their I/O readahead on the parallelism of
> the executor (or 2 for the serial reader). This is a reasonable default if
> the I/O is homogeneous, but better values could presumably be used in some
> situations.
> For example, if most files are buffered in RAM (and the reader is CPU bound
> for these files) but some files are not, then you would want the readahead to
> be large enough to read the unbuffered files while the CPU-bound work is
> being done (assuming you are even lucky enough for things to be scheduled in
> that way).
> This isn't likely to be of much benefit in most situations, though, and it
> does add yet another option, so I'm not really motivated to do this work
> until such a situation arises.
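
A rough sketch of what the description above might amount to, assuming a
user-settable override on top of the current default. The names
CsvReadaheadOptions, io_readahead, DefaultReadahead and EffectiveReadahead are
hypothetical and are not taken from Arrow's API. Only the default rule
(executor parallelism, or 2 for the serial reader) comes from the issue text.

```cpp
#include <cassert>

// Hypothetical options struct for illustration only; this is NOT
// arrow::csv::ReadOptions and the field name is an assumption.
struct CsvReadaheadOptions {
  // 0 means "not set": fall back to the default derived from the executor.
  int io_readahead = 0;
};

// Default described in the issue: the threaded readers use the executor's
// parallelism, the serial reader uses 2.
inline int DefaultReadahead(bool use_threads, int executor_capacity) {
  assert(executor_capacity > 0);
  return use_threads ? executor_capacity : 2;
}

// Proposed behavior (assumed): an explicit positive value overrides the
// default; otherwise the existing rule applies unchanged.
inline int EffectiveReadahead(const CsvReadaheadOptions& options,
                              bool use_threads, int executor_capacity) {
  if (options.io_readahead > 0) {
    return options.io_readahead;
  }
  return DefaultReadahead(use_threads, executor_capacity);
}
```

With a sentinel of 0, existing callers would keep today's behavior, and only
workloads with mixed buffered and unbuffered files would need to set a larger
value.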