[jira] [Commented] (ARROW-8631) [C++][Dataset] Add ConvertOptions and ReadOptions to CsvFileFormat

David Li (Jira) Tue, 16 Mar 2021 07:13:10 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302561#comment-17302561
 ]


David Li commented on ARROW-8631:
---------------------------------

Following on from ARROW-9749, we now have ConvertOptions. ReadOptions doesn't 
fit neatly into either the dataset-global or the scan-specific bucket 
(skip_rows and the column names belong to the former; block_size belongs to 
the latter), so its fields will have to be inlined. Alternatively, ReadOptions 
could be made part of CsvFileFormat, with an explicit block_size field in 
CsvFragmentScanOptions.
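
To make the split concrete, here is a minimal standalone sketch of the "inlined 
fields" idea. It does not use the Arrow headers: the struct names 
FormatLevelOptions and ScanLevelOptions, the helper MakeReadOptions, and the 
default values are all hypothetical and chosen only for illustration. Only the 
three field names (skip_rows, column names, block_size) come from the comment 
above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the CSV reader's ReadOptions. Only the three
// fields mentioned in the comment are modeled; defaults are illustrative.
struct ReadOptions {
  int32_t block_size = 1 << 20;
  int32_t skip_rows = 0;
  std::vector<std::string> column_names;
};

// Dataset-global options (hypothetical): fixed when the format is created,
// since they determine which rows and columns the dataset exposes.
struct FormatLevelOptions {
  int32_t skip_rows = 0;
  std::vector<std::string> column_names;
};

// Scan-specific options (hypothetical): may differ between two scans of the
// same dataset without changing what the dataset contains.
struct ScanLevelOptions {
  int32_t block_size = 1 << 20;
};

// Recombine the two halves into the ReadOptions a CSV reader would receive.
inline ReadOptions MakeReadOptions(const FormatLevelOptions& format,
                                   const ScanLevelOptions& scan) {
  assert(scan.block_size > 0);  // a non-positive block size is meaningless
  ReadOptions out;
  out.block_size = scan.block_size;
  out.skip_rows = format.skip_rows;
  out.column_names = format.column_names;
  return out;
}
```

In this sketch a scan can change block_size freely, while skip_rows and the 
column names stay tied to the format. The alternative mentioned above would 
instead store a whole ReadOptions on the format and let the scan options 
override only block_size.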

I'll also plumb the options through to Python and add a ScannerBuilder method 
for setting them.

> [C++][Dataset] Add ConvertOptions and ReadOptions to CsvFileFormat
> ------------------------------------------------------------------
>
>                 Key: ARROW-8631
>                 URL: https://issues.apache.org/jira/browse/ARROW-8631
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 0.17.0
>            Reporter: Ben Kietzman
>            Assignee: David Li
>            Priority: Major
>              Labels: dataset, pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/pull/7033 does not add ConvertOptions 
> (including alternate spellings for null/true/false, etc.) or ReadOptions 
> (block_size, column name customization, etc.). These will be helpful but will 
> require some discussion to find the optimal way to integrate them with 
> dataset::



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
