Copilot commented on code in PR #50146:
URL: https://github.com/apache/arrow/pull/50146#discussion_r3453359683
##########
cpp/src/arrow/csv/converter.cc:
##########
@@ -440,6 +441,13 @@ struct SingleParserTimestampValueDecoder : public
ValueDecoder {
const TimestampParser& parser_;
};
+std::vector<const TimestampParser*> GetTimestampParsers(const ConvertOptions&
options) {
+ std::vector<const TimestampParser*>
parsers(options.timestamp_parsers.size());
+ std::ranges::transform(options.timestamp_parsers, parsers.begin(),
+ [](const auto& parser) { return parser.get(); });
+ return parsers;
+}
Review Comment:
`GetTimestampParsers()` uses `std::ranges::transform` unconditionally.
Elsewhere in the codebase `std::ranges` usage is guarded (or avoided) to
accommodate environments where `__cpp_lib_ranges` isn’t available even when
compiling as C++20. Using a simple loop here avoids that portability pitfall
without changing behavior.
##########
python/pyarrow/tests/test_csv.py:
##########
@@ -1164,6 +1164,65 @@ def test_dates(self):
'b': [datetime(1970, 1, 2), datetime(1971, 1, 2)],
}
+ def test_dates_with_timestamp_parsers(self):
+ # Custom timestamp parsers are used as a fallback for explicitly-typed
+ # date columns (GH-28303, GH-41488)
+ rows = b"a,b\n15/10/2015,1970-01-02\n18/06/1990,15/10/2015\n"
+ opts = ConvertOptions()
+ opts.column_types = {'a': pa.date32(), 'b': pa.date64()}
+ with pytest.raises(pa.ArrowInvalid,
+ match="CSV conversion error to date"):
+ self.read_bytes(rows, convert_options=opts)
+
+ opts.timestamp_parsers = ['%d/%m/%Y']
+ table = self.read_bytes(rows, convert_options=opts)
+ schema = pa.schema([('a', pa.date32()),
+ ('b', pa.date64())])
+ assert table.schema == schema
+ # ISO values keep working alongside custom-parsed ones
+ assert table.to_pydict() == {
+ 'a': [date(2015, 10, 15), date(1990, 6, 18)],
+ 'b': [date(1970, 1, 2), date(2015, 10, 15)],
+ }
+
+ # Month names are parsed case-insensitively
+ rows = b"a\n15-OCT-15\n18-Jun-90\n"
+ opts = ConvertOptions(column_types={'a': pa.date32()},
+ timestamp_parsers=['%d-%b-%y'])
+ table = self.read_bytes(rows, convert_options=opts)
+ assert table.to_pydict() == {
+ 'a': [date(2015, 10, 15), date(1990, 6, 18)],
+ }
Review Comment:
This test uses `%b` month-name parsing with hard-coded English abbreviations
("OCT", "Jun"). `strptime` month/day names are locale-dependent on many
platforms, so the test can fail under a non-English `LC_TIME`. Consider forcing
`LC_TIME` to the C locale within the test (and restoring it) to make the
assertion deterministic.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]