[GitHub] [arrow] dgreiss commented on pull request #36436: GH-36247: [R] Add write_csv_dataset

via GitHub Sat, 15 Jul 2023 04:58:43 -0700


dgreiss commented on PR #36436:
URL: https://github.com/apache/arrow/pull/36436#issuecomment-1636746387


   > The implementation of `write_csv_arrow()` doesn't do this (maybe it should, but that hasn't been done yet), but we could do those things here.
   
   Got it. I created a separate issue, #36700, to add the other functions, `write_delim_arrow()` and `write_tsv_arrow()`.
   
   > In the implementation of `open_csv_dataset()`, I also intentionally don't allow extra things to be passed in via the `...`, to try to keep things simpler to reason about (advanced users can use `open_dataset()` and pass in whatever they like there, if they need to).
   
   Makes sense, I'll update it to follow that convention. 
   
   > What I'm wondering is, if we want to take the opportunity to expose the options differently here, to create an API for users that is easier to reason about.
   > 
   > For example, `readr::write_csv` has the following options exposed in its parameters:
   > 
   > * `na`
   > * `append`
   > * `col_names`
   > * `quote`
   > * `escape`
   > * `eol`
   > 
   > We could take these, work out if they map to the Arrow options nicely, and do the hard work inside of the function to convert them for the user, a bit like we do for the CSV reader.
   
   Right now `na`, `eol` and `delim` map to Arrow's `null_string`, `eol` and `delimiter`, and I've exposed all of them in the PR. I can make sure those options get added as well in #36700.
   
   Mapping the `readr` options does complicate things if a user supplies both the Arrow and the `readr` name for the same option (e.g. `write_dataset(ds, file, delim = ',', delimiter = ';')`), so we'll have to handle that case either by throwing an error or by defaulting to the Arrow option.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
