Re: [PR] GH-41488: [C++][Python] Apply timestamp_parsers as fallback when parsing CSV date and time columns [arrow]

via GitHub Mon, 22 Jun 2026 08:15:31 -0700


Copilot commented on code in PR #50146:
URL: https://github.com/apache/arrow/pull/50146#discussion_r3453359683



##########
cpp/src/arrow/csv/converter.cc:
##########
@@ -440,6 +441,13 @@ struct SingleParserTimestampValueDecoder : public 
ValueDecoder {
   const TimestampParser& parser_;
 };
 
+std::vector<const TimestampParser*> GetTimestampParsers(const ConvertOptions& 
options) {
+  std::vector<const TimestampParser*> 
parsers(options.timestamp_parsers.size());
+  std::ranges::transform(options.timestamp_parsers, parsers.begin(),
+                         [](const auto& parser) { return parser.get(); });
+  return parsers;
+}

Review Comment:
   `GetTimestampParsers()` uses `std::ranges::transform` unconditionally. 
Elsewhere in the codebase `std::ranges` usage is guarded (or avoided) to 
accommodate environments where `__cpp_lib_ranges` isn’t available even when 
compiling as C++20. Using a simple loop here avoids that portability pitfall 
without changing behavior.



##########
python/pyarrow/tests/test_csv.py:
##########
@@ -1164,6 +1164,65 @@ def test_dates(self):
             'b': [datetime(1970, 1, 2), datetime(1971, 1, 2)],
         }
 
+    def test_dates_with_timestamp_parsers(self):
+        # Custom timestamp parsers are used as a fallback for explicitly-typed
+        # date columns (GH-28303, GH-41488)
+        rows = b"a,b\n15/10/2015,1970-01-02\n18/06/1990,15/10/2015\n"
+        opts = ConvertOptions()
+        opts.column_types = {'a': pa.date32(), 'b': pa.date64()}
+        with pytest.raises(pa.ArrowInvalid,
+                           match="CSV conversion error to date"):
+            self.read_bytes(rows, convert_options=opts)
+
+        opts.timestamp_parsers = ['%d/%m/%Y']
+        table = self.read_bytes(rows, convert_options=opts)
+        schema = pa.schema([('a', pa.date32()),
+                            ('b', pa.date64())])
+        assert table.schema == schema
+        # ISO values keep working alongside custom-parsed ones
+        assert table.to_pydict() == {
+            'a': [date(2015, 10, 15), date(1990, 6, 18)],
+            'b': [date(1970, 1, 2), date(2015, 10, 15)],
+        }
+
+        # Month names are parsed case-insensitively
+        rows = b"a\n15-OCT-15\n18-Jun-90\n"
+        opts = ConvertOptions(column_types={'a': pa.date32()},
+                              timestamp_parsers=['%d-%b-%y'])
+        table = self.read_bytes(rows, convert_options=opts)
+        assert table.to_pydict() == {
+            'a': [date(2015, 10, 15), date(1990, 6, 18)],
+        }

Review Comment:
   This test uses `%b` month-name parsing with hard-coded English abbreviations 
("OCT", "Jun"). `strptime` month/day names are locale-dependent on many 
platforms, so the test can fail under a non-English `LC_TIME`. Consider forcing 
`LC_TIME` to the C locale within the test (and restoring it) to make the 
assertion deterministic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-41488: [C++][Python] Apply timestamp_parsers as fallback when parsing CSV date and time columns [arrow]

Reply via email to