Re: [I] Add quote-style parameter for CSV options [datafusion]

2024-05-25 Thread via GitHub


DDtKey commented on issue #10669:
URL: https://github.com/apache/datafusion/issues/10669#issuecomment-2131388179

   I think this could be labeled `good first issue`: there are links to 
the code that needs to change, and it is also possible to write a 
`sqllogictest` similar to https://github.com/apache/datafusion/pull/10671.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



[I] Add quote-style parameter for CSV options [datafusion]

2024-05-25 Thread via GitHub


DDtKey opened a new issue, #10669:
URL: https://github.com/apache/datafusion/issues/10669

   ### Is your feature request related to a problem or challenge?
   
   CSV writers usually support configuring the quote style/mode, with the 
following options:
   - `Always`
   - `Necessary`
   - `Never`
   - `NonNumeric`
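
To make the four modes concrete, here is a minimal, std-only sketch of how each one might treat a record. The `QuoteStyle` enum, `write_field` and `write_record` below are hypothetical stand-ins written for illustration, not the `csv` crate's API, and the numeric check (anything that parses as `f64`) is a simplifying assumption.

```rust
/// Hypothetical stand-in mirroring the four modes listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum QuoteStyle {
    Always,
    Necessary,
    Never,
    NonNumeric,
}

/// Render a single field, assuming `,` as delimiter and `"` as quote.
pub fn write_field(field: &str, style: QuoteStyle) -> String {
    // Quoting is required when the field contains the delimiter,
    // the quote character, or a line break.
    let needs_quotes = field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r'));
    // Assumption: "numeric" means the field parses as a float.
    let is_numeric = field.parse::<f64>().is_ok();

    let quote = match style {
        QuoteStyle::Always => true,
        QuoteStyle::Necessary => needs_quotes,
        QuoteStyle::Never => false,
        QuoteStyle::NonNumeric => !is_numeric,
    };

    if quote {
        // Embedded quotes are escaped by doubling them.
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Join the rendered fields into one CSV line (no trailing newline).
pub fn write_record(fields: &[&str], style: QuoteStyle) -> String {
    fields
        .iter()
        .map(|f| write_field(f, style))
        .collect::<Vec<_>>()
        .join(",")
}
```

For the record `["a", "1", "x,y"]`, this sketch quotes every field under `Always`, only `x,y` under `Necessary`, nothing under `Never` (so the embedded comma becomes ambiguous), and everything except `1` under `NonNumeric`.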
   
   Sometimes this needs to be controlled, and for now the only way to change 
it is to re-process the result file(s) and rewrite their content with the 
desired quote style.
   
   You can find such settings in many libraries:
   - the `csv` crate 
([`QuoteStyle`](https://docs.rs/csv/latest/csv/enum.QuoteStyle.html))
   - Python's `csv` module (constants such as 
[`QUOTE_ALL`](https://docs.python.org/3/library/csv.html#csv.QUOTE_ALL))
   - Apache Commons CSV for Java 
([`QuoteMode`](https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/QuoteMode.html))
   
   ### Describe the solution you'd like
   
   Just expose a way to pass the `QuoteStyle` enum along with other properties 
such as `quote` and `delimiter` (as part of `CsvOptions`). However, keep in mind 
that this setting only makes sense for writers, not readers.
   
   
   That shouldn't be hard to support, because `datafusion` relies on 
`arrow-csv`, which uses the `csv` crate under the hood.
   
   - update `arrow-csv` to accept a quote-style parameter (sub-issue for 
`arrow-rs`?)
     - add to `WriterBuilder`: 
https://github.com/apache/arrow-rs/blob/4b5d9bfc958c06fb1ff71d90ba58497e965eff40/arrow-csv/src/writer.rs#L191-L214
     - pass to `csv::Writer`: 
https://github.com/apache/arrow-rs/blob/4b5d9bfc958c06fb1ff71d90ba58497e965eff40/arrow-csv/src/writer.rs#L402-L408
   - update `datafusion`
     - add a parameter to `CsvOptions`: 
https://github.com/apache/datafusion/blob/ea92ae72f7ec2e941d35aa077c6a39f74523ab63/datafusion/common/src/config.rs#L1554-L1570
     - pass to `arrow-csv`: 
https://github.com/apache/datafusion/blob/ea92ae72f7ec2e941d35aa077c6a39f74523ab63/datafusion/common/src/file_options/csv_writer.rs#L48-L75
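
The plumbing described in the steps above could look roughly like the following sketch. Everything here is an assumption made for illustration: `CsvOptions` and `WriterBuilder` are simplified stand-ins for the real DataFusion and `arrow-csv` types (most fields omitted), and the `quote_style` field, `with_quote_style` method and `writer_builder_from` helper are hypothetical names for the proposed additions.

```rust
/// Hypothetical enum for the proposed option; `Necessary` is the default.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum QuoteStyle {
    Always,
    #[default]
    Necessary,
    Never,
    NonNumeric,
}

/// Simplified stand-in for DataFusion's `CsvOptions`.
#[derive(Clone, Debug)]
pub struct CsvOptions {
    pub delimiter: u8,
    pub quote: u8,
    /// Proposed writer-only option; `None` keeps the writer's default.
    pub quote_style: Option<QuoteStyle>,
}

/// Simplified stand-in for the `arrow-csv` writer builder.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct WriterBuilder {
    delimiter: u8,
    quote: u8,
    quote_style: QuoteStyle,
}

impl WriterBuilder {
    pub fn new() -> Self {
        Self {
            delimiter: b',',
            quote: b'"',
            quote_style: QuoteStyle::default(),
        }
    }

    pub fn with_delimiter(mut self, delimiter: u8) -> Self {
        self.delimiter = delimiter;
        self
    }

    pub fn with_quote(mut self, quote: u8) -> Self {
        self.quote = quote;
        self
    }

    /// Hypothetical setter for the proposed option.
    pub fn with_quote_style(mut self, quote_style: QuoteStyle) -> Self {
        self.quote_style = quote_style;
        self
    }

    pub fn quote_style(&self) -> QuoteStyle {
        self.quote_style
    }
}

/// Writer-side conversion: only the write path consults `quote_style`.
pub fn writer_builder_from(options: &CsvOptions) -> WriterBuilder {
    let mut builder = WriterBuilder::new()
        .with_delimiter(options.delimiter)
        .with_quote(options.quote);
    if let Some(style) = options.quote_style {
        builder = builder.with_quote_style(style);
    }
    builder
}
```

Keeping the field an `Option` in this sketch means existing configurations that never set it would continue to get the writer's default behavior.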
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

