Weston Pace created ARROW-12090:
-----------------------------------

             Summary: [C++] Expose CSV block level readahead as a read option
                 Key: ARROW-12090
                 URL: https://issues.apache.org/jira/browse/ARROW-12090
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace
            Assignee: Weston Pace


All of the CSV readers today base their I/O readahead on the parallelism of the 
executor (or 2 for the serial reader).  This is a reasonable default if the I/O 
is homogeneous, but better values could presumably be used in some situations.

For example, if most files are buffered in RAM (and the reader is CPU-bound for 
these files) but some files are not, then you would want the readahead to be 
large enough to read the unbuffered files while the CPU-bound work is being 
done (assuming you are even lucky enough for things to be scheduled that way).
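A minimal sketch of how such an option might be resolved against today's 
default. The struct `HypotheticalCsvReadaheadOptions`, its `block_readahead` 
field, and `ResolveBlockReadahead` are illustrative assumptions only; they are 
not members of Arrow's existing `csv::ReadOptions`. When the option is unset, 
the sketch falls back to the current behaviour described above (executor 
parallelism, or 2 for the serial reader).

```cpp
#include <algorithm>
#include <optional>

// Hypothetical option struct; NOT part of Arrow's csv::ReadOptions.
struct HypotheticalCsvReadaheadOptions {
  // Number of blocks to read ahead. If unset, use the current default.
  std::optional<int> block_readahead;
};

// Returns the number of blocks the reader should keep in flight.
//   use_threads:       false for the serial reader.
//   executor_capacity: parallelism of the CPU executor (threaded readers).
int ResolveBlockReadahead(const HypotheticalCsvReadaheadOptions& opts,
                          bool use_threads, int executor_capacity) {
  if (opts.block_readahead.has_value()) {
    // Explicit override: clamp to at least 1 so one block is always in flight.
    return std::max(1, *opts.block_readahead);
  }
  // Current default: executor parallelism, or 2 for the serial reader.
  return use_threads ? executor_capacity : 2;
}
```

In the mixed buffered/unbuffered scenario, a caller would set 
`block_readahead` above the executor capacity so that I/O on the unbuffered 
files can overlap with the CPU-bound parsing of the buffered ones.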

This isn't likely to be of much benefit in most situations, though, and it adds 
yet another option, so I'm not really motivated to do this work until such a 
situation arises.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
