no23reason opened a new issue, #38528:
URL: https://github.com/apache/arrow/issues/38528
### Describe the enhancement requested
The `pyarrow.compute.strptime` function handles the `%Y` directive (i.e. a
4-digit year) differently from the built-in `time.strptime`.
When the year part of the input has only two digits, `time.strptime` fails
to parse it, while `pyarrow.compute.strptime` parses it without error,
yielding a timestamp whose year is in the 1st century.
For example:
```python
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> import time
>>> input = "01-01-23"
>>> format = "%m-%d-%Y"
>>> time.strptime(input, format)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/no23reason/.pyenv/versions/3.11.4/lib/python3.11/_strptime.py", line 562, in _strptime_time
    tt = _strptime(data_string, format)[0]
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/no23reason/.pyenv/versions/3.11.4/lib/python3.11/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '01-01-23' does not match format '%m-%d-%Y'
>>> pc.strptime(pa.array([input]), format=format, unit="s")
<pyarrow.lib.TimestampArray object at 0x152e40c40>
[
  0023-01-01 00:00:00
]
```
I believe `pyarrow.compute.strptime` should also fail in this case, since a
two-digit year in the input most likely means the format string is wrong.
### Component(s)
Python