[ 
https://issues.apache.org/jira/browse/ARROW-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weston Pace updated ARROW-12090:
--------------------------------
    Summary: [C++] Expose CSV I/O readahead as a read option (was: [C++] Expose CSV block level readahead as a read option)

> [C++] Expose CSV I/O readahead as a read option
> -----------------------------------------------
>
>                 Key: ARROW-12090
>                 URL: https://issues.apache.org/jira/browse/ARROW-12090
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Minor
>
> All of the CSV readers today base their I/O readahead on the parallelism of
> the executor (or use 2 for the serial reader).  This is a reasonable default
> when the I/O is homogeneous, but better values could presumably be chosen in
> some situations.
> For example, if most files are buffered in RAM (and the reader is CPU bound
> for these files) but some files are not, then you would want the readahead to
> be large enough to read the unbuffered files while the CPU-bound work is
> being done (assuming you are even lucky enough for things to be scheduled
> that way).
> This isn't likely to be of much benefit in most situations, though, and it
> does add yet another option, so I'm not really motivated to do this work
> until such a situation arises.
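
To make the proposal concrete, here is a minimal sketch of how an optional readahead setting could fall back to the current default. It is not Arrow's actual API: the names CsvReadOptionsSketch, io_readahead, and EffectiveReadahead are hypothetical, and executor_parallelism stands in for whatever capacity the executor reports. It assumes C++17 for std::optional.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical option struct for illustration only; this is not
// Arrow's real CSV ReadOptions.
struct CsvReadOptionsSketch {
  // Whether the threaded reader (true) or the serial reader (false) is used.
  bool use_threads = true;
  // Assumed new option: number of blocks to read ahead.
  // Leaving it unset keeps the default described in the issue.
  std::optional<int32_t> io_readahead;
};

// Returns the readahead to use. An explicit value wins; otherwise the
// threaded reader follows the executor's parallelism and the serial
// reader uses 2, as the issue describes.
inline int32_t EffectiveReadahead(const CsvReadOptionsSketch& opts,
                                  int32_t executor_parallelism) {
  if (opts.io_readahead.has_value()) {
    return *opts.io_readahead;
  }
  return opts.use_threads ? executor_parallelism : 2;
}
```

With a design along these lines, callers who never set the option see no change in behavior, while a caller mixing RAM-buffered and unbuffered files could raise the value explicitly.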



--
This message was sent by Atlassian Jira
(v8.3.4#803005)