[GitHub] [arrow] bkietz commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

GitBox Wed, 29 Apr 2020 07:01:26 -0700


bkietz commented on pull request #7033:
URL: https://github.com/apache/arrow/pull/7033#issuecomment-621229069



   @jorisvandenbossche 
   
   > do we plan to use the CsvFileFormat also for writing? ... For Parquet 
there is a ReaderOptions that grouped all options related to reading in the 
ParquetFileFormat
   
   We added ReaderOptions to avoid collisions with extant parquet write 
options. Since we don't have CSV write options avoiding collisions seems like 
specualtive generality; this can be handled when/if we add CSV writing. Also 
note that (for example) `ParseOptions::delimiter` would also be used when 
writing
   
   > I assume this is because the ParseOptions are used in its entirety, and 
for csv.ConvertOptions and csv.ReadOptions, only part of them is valid to set 
here? (another option could also be to check that those options are not set and 
if set raise an error to the user about that)
   
   That is indeed why I used only ParseOptions whole. It seemed better to me 
than documenting which fields are ignored or emitting errors when things aren't 
defaulted.
   
   > Do we want to allow reading csv files without a header line with column 
names?
   
   I can add `column_names` and `autogenerate*` too if required, but maybe that 
could wait for a follow up? Since the way I handled `/{Convert,Read}Options/` 
is evidently not transparent maybe everything except ParseOptions should wait 
as well?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] bkietz commented on pull request #7033: ARROW-7759: [C++][Dataset] Add CsvFileFormat

Reply via email to