HuaHuaY commented on code in PR #47524:
URL: https://github.com/apache/arrow/pull/47524#discussion_r2332462200
##########
cpp/src/arrow/csv/options.h:
##########
@@ -188,6 +188,12 @@ struct ARROW_EXPORT WriteOptions {
/// Whether to write an initial header line with column names
bool include_header = true;
+ /// \brief Quoting style of header
+ ///
+ /// If `quoting_header` is `QuotingStyle::None`, then only write quotes when a column
+ /// name contains structural characters. Otherwise, always quote column names.
+ QuotingStyle quoting_header = QuotingStyle::AllValid;
Review Comment:
I would prefer to keep the same form as the CSV data's `quoting_style`, though I
agree that these options are currently easy to misunderstand. I don't think we
should use `Needed` here: it would give `Needed` two different meanings, one for
the header and one for data. I think we may need a new option, `Minimal`, as
mentioned in #42032.
For the CSV header and binary data:
- `None`: Never write quotes. Return an error if a value contains structural
characters.
- `Minimal`: Write quotes only when a value contains structural characters.
- `Needed` or `AllValid`: Always write quotes.
For normal data:
- `None`, `Minimal`, or `Needed`: Never write quotes. Return an error if a value
contains structural characters.
- `AllValid`: Always write quotes.
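A rough sketch of the header rules listed above, to make the three cases concrete. Everything here is hypothetical and is not Arrow's API: the `HeaderQuoting` enum, `FormatHeaderField`, and the helpers are illustration-only names. It also assumes that "structural characters" means the delimiter, the double quote, CR and LF, and that embedded quotes are escaped by doubling them.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical enum for illustration only; not Arrow's QuotingStyle.
enum class HeaderQuoting { None, Minimal, All };

// Assumption: structural characters are the delimiter, '"', CR and LF.
bool HasStructuralChars(std::string_view s, char delimiter = ',') {
  for (char c : s) {
    if (c == delimiter || c == '"' || c == '\r' || c == '\n') return true;
  }
  return false;
}

// Wrap the value in quotes, doubling any embedded quote (assumed escaping rule).
std::string QuoteField(std::string_view s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"') out += '"';
    out += c;
  }
  out += '"';
  return out;
}

// Returns std::nullopt to signal the "return an error" case.
std::optional<std::string> FormatHeaderField(std::string_view name,
                                             HeaderQuoting style) {
  const bool structural = HasStructuralChars(name);
  switch (style) {
    case HeaderQuoting::None:
      // Never quote; a name with structural characters is an error.
      if (structural) return std::nullopt;
      return std::string(name);
    case HeaderQuoting::Minimal:
      // Quote only when the name would otherwise be ambiguous.
      return structural ? QuoteField(name) : std::string(name);
    case HeaderQuoting::All:
      // Always quote.
      return QuoteField(name);
  }
  return std::nullopt;
}
```

With this sketch, a column named `a,b` is rejected under `None`, quoted under `Minimal`, and quoted under `All`, while a plain name such as `a` is quoted only under `All`.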
Personally, though, I think adding a new option for data is more complicated.
I hope we can handle the header first: I would like to add the new option and
correct the header's behavior, then open a separate pull request later to
handle the data's behavior.
Alternatively, a simpler approach would be a boolean `header_quote_all_fields`,
which is also fine with me.
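For comparison, a minimal sketch of what the boolean variant could look like. `HeaderOptionsSketch` and `JoinHeader` are hypothetical names, not part of Arrow. The sketch assumes that `false` means "quote only names that contain structural characters", mirroring the `QuotingStyle::None` behavior documented in the diff above, and that `true` is the default, mirroring `AllValid`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical options struct for illustration; not Arrow's WriteOptions.
struct HeaderOptionsSketch {
  bool header_quote_all_fields = true;  // assumed default: always quote
  char delimiter = ',';
};

// Build the header line (without a trailing line ending).
std::string JoinHeader(const std::vector<std::string>& names,
                       const HeaderOptionsSketch& opts) {
  // Assumption: structural characters are the delimiter, '"', CR and LF.
  const std::string structural{opts.delimiter, '"', '\r', '\n'};
  std::string line;
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (i > 0) line += opts.delimiter;
    const std::string& name = names[i];
    const bool needs_quotes =
        name.find_first_of(structural) != std::string::npos;
    if (opts.header_quote_all_fields || needs_quotes) {
      line += '"';
      for (char c : name) {
        if (c == '"') line += '"';  // double embedded quotes
        line += c;
      }
      line += '"';
    } else {
      line += name;
    }
  }
  return line;
}
```

The boolean covers only the "always quote" and "quote when necessary" cases; it has no equivalent of the error-returning `None` behavior described above.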