[ https://issues.apache.org/jira/browse/ARROW-10343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou closed ARROW-10343.
----------------------------------
    Resolution: Duplicate

> [C++] Unable to parse strings into timestamps
> ---------------------------------------------
>
>                 Key: ARROW-10343
>                 URL: https://issues.apache.org/jira/browse/ARROW-10343
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 1.0.1
>         Environment: macOS 10.15.7, Python 3.8.2
>            Reporter: Niclas Roos
>            Priority: Minor
>              Labels: timestamp, timezone
>
> Hi,
> I'm working with parquet files generated by an AWS RDS Postgres snapshot export.
> I'm trying to parse a date column stored as a string into a timestamp, but it fails.
> I've managed to parse the same date format (as in the first example below) when reading from a csv, so I tried to investigate it as far as I could on my own, and here are my results:
> {code:python}
> import pyarrow as pa
> import pytz
> #################################################################################
> ## the format I get from the database
> us_tz_arr = pa.array([
>     "2014-12-07 07:48:59.285332+00",
>     "2014-12-07 08:01:49.758975+00",
>     "2014-12-07 10:11:35.884304+00"])
> us_tz_arr.cast(pa.timestamp('us', tz=pytz.UTC))
> # -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304+00
> #################################################################################
> ## tried removing the timezone
> us_arr = pa.array([
>     "2014-12-07 07:48:59.285332",
>     "2014-12-07 08:01:49.758975",
>     "2014-12-07 10:11:35.884304"])
> us_arr.cast(pa.timestamp('us'))
> # -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304
> #################################################################################
> ## tried removing the microseconds but keeping the timezone
> second_tz_arr = pa.array([
>     "2014-12-07 07:48:59+00",
>     "2014-12-07 08:01:49+00",
>     "2014-12-07 10:11:35+00"])
> second_tz_arr.cast(pa.timestamp('s', tz=pytz.UTC))
> # -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35+00
> #################################################################################
> ## removing microseconds and timezone, makes it work!
> s_arr = pa.array([
>     "2014-12-07 07:48:59",
>     "2014-12-07 08:01:49",
>     "2014-12-07 10:11:35"])
> s_arr.cast(pa.timestamp('s'))
> # -> <pyarrow.lib.TimestampArray object at 0x7fbdf81ae460>
> # [
> #   2014-12-07 07:48:59,
> #   2014-12-07 08:01:49,
> #   2014-12-07 10:11:35
> # ]
> {code}
> PS. This is my first bug report, so apologies if important things are missing.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
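For readers who hit the same failure, one possible workaround is to parse the strings in plain Python before handing them to Arrow. The sketch below is not taken from the issue or from Arrow itself. The helper name `parse_pg_timestamp` is hypothetical, and treating offset-less strings as UTC is an assumption. It pads the hour-only offset (`+00`) to `+00:00` so that the standard library's `datetime.fromisoformat` accepts it.

```python
import re
from datetime import datetime, timezone

# Matches a time that ends in an hour-only UTC offset, e.g. ":59.285332+00".
# Anchoring on the preceding ":SS" avoids touching date-only strings.
_SHORT_OFFSET = re.compile(r"(:\d{2}(?:\.\d+)?[+-]\d{2})$")


def parse_pg_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.ffffff][+HH]' into an aware UTC datetime.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    # Pad "+00" to "+00:00" so datetime.fromisoformat can read the offset.
    normalized = _SHORT_OFFSET.sub(r"\1:00", text.strip())
    parsed = datetime.fromisoformat(normalized)
    if parsed.tzinfo is None:
        # Assumption: strings without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


values = [
    "2014-12-07 07:48:59.285332+00",
    "2014-12-07 08:01:49.758975+00",
    "2014-12-07 10:11:35.884304+00",
]
parsed_values = [parse_pg_timestamp(v) for v in values]
```

The resulting timezone-aware `datetime` objects should then be usable with `pa.array(parsed_values, type=pa.timestamp('us', tz='UTC'))`, although that step is untested here and depends on the pyarrow version in use.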