[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576858#comment-17576858 ]
Antoine Pitrou commented on ARROW-17313:
----------------------------------------

The intent of datasets has always been that each file format defines its own granularity for reading files. I don't understand why the consumer would specify byte ranges by hand.

[~bkietz] What is your opinion on this?

> [C++] Add Byte Range to CSV Reader ReadOptions
> ----------------------------------------------
>
>                 Key: ARROW-17313
>                 URL: https://issues.apache.org/jira/browse/ARROW-17313
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Ziheng Wang
>            Assignee: Ziheng Wang
>            Priority: Major
>
> Sometimes it's desirable to just read a portion of a CSV. The best way to do
> that is to pass in a list of byte ranges to CSV read options that specify
> where in the CSV you want to read. These byte ranges don't necessarily have
> to be aligned on line break boundaries; the CSV reader should just read until
> the end of the line, and skip anything before the first line break in a byte
> range.
> Based on discussion, the scope is going to be reduced here. The first
> implementation will support a single byte range that is already assumed to be
> aligned on byte boundaries.
> Will not handle quotes/returns and other edge cases.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
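
For illustration only, the unaligned byte-range behaviour described in the quoted issue (skip anything before the first line break, then read through the end of the last line) could be sketched in plain Python as below. This is not Arrow's CSV reader or any proposed API: the function name `slice_csv_byte_range` is hypothetical, the input is assumed to be an in-memory `bytes` object with `\n` line endings, and quoted fields containing newlines are not handled, matching the reduced scope stated in the issue. One assumption goes beyond the issue text: a line is treated as belonging to the range in which its first byte falls, so that adjacent ranges neither drop nor duplicate lines.

```python
def slice_csv_byte_range(data: bytes, start: int, end: int) -> bytes:
    """Return the complete lines of `data` whose first byte lies in [start, end).

    Hypothetical helper, not part of Arrow. Assumes '\n' line endings and
    no newlines inside quoted fields.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    end = min(end, len(data))
    if start >= end:
        return b""

    if start == 0:
        first = 0
    else:
        # Search from start - 1 so that a range beginning exactly at the
        # start of a line keeps that line instead of skipping it.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return b""  # no line begins at or after `start`
        first = newline + 1

    # Read through the end of the line that contains the last byte in range.
    newline = data.find(b"\n", end - 1)
    stop = len(data) if newline == -1 else newline + 1
    return data[first:stop]


csv_bytes = b"a,b\n1,2\n3,4\n5,6\n"
head = slice_csv_byte_range(csv_bytes, 0, 6)    # range ends mid-line
tail = slice_csv_byte_range(csv_bytes, 6, 16)   # range starts mid-line
```

Under these assumptions, splitting a file at any offset and concatenating the two results reproduces the original bytes, so each line is read by exactly one range.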