New submission from Skip Montanaro:

I have a CSV file. Here are a few rows:

"2013-10-30 14:26:46.000528","1.36097023829"
"2013-10-30 14:26:46.999755","1.36097023829"
"2013-10-30 14:26:47.999308","1.36097023829"
"2013-10-30 14:26:49.002472","1.36097023829"
"2013-10-30 14:26:50","1.36097023829"
"2013-10-30 14:26:51.000549","1.36097023829"
"2013-10-30 14:26:51.999315","1.36097023829"
"2013-10-30 14:26:52.999703","1.36097023829"
"2013-10-30 14:26:53.999640","1.36097023829"
"2013-10-30 14:26:54.999139","1.36097023829"

I want to parse the strings in the first column as timestamps. I can, and often 
do, use dateutil.parser.parse(), but in situations like this where all the 
timestamps are of the same format, it can be incredibly slow. OTOH, there is no 
single format I can pass to datetime.datetime.strptime() that will parse all 
the above timestamps. Using "%Y-%m-%d %H:%M:%S" I get errors about the leftover 
microseconds. Using "%Y-%m-%d %H:%M:%S.%f" I get errors when I try to parse a 
timestamp which doesn't have microseconds.
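
One workaround is to try both formats in turn. The following is only a 
minimal sketch, assuming the file contains just the two layouts shown above; 
parse_timestamp and FORMATS are hypothetical names, not part of datetime.

```python
from datetime import datetime

# Candidate formats, tried in order. The microsecond variant comes first
# because it is the common case in the sample rows.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_timestamp(text):
    """Parse a timestamp written by str(datetime), with or without microseconds."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format did not match; fall through to the next one.
            continue
    raise ValueError("unrecognized timestamp: %r" % (text,))
```

This works, but every row without microseconds pays for a failed strptime() 
call first, and the caller has to know about the quirk in advance.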

Alas, it is datetime itself which is to blame for this problem. The above 
timestamps were all printed from an earlier Python program which just dumps the 
str() of a datetime object to its output CSV file. Consider:

>>> import dateutil.parser
>>> dt = dateutil.parser.parse("2013-10-30 14:26:50")
>>> print dt
2013-10-30 14:26:50
>>> dt2 = dateutil.parser.parse("2013-10-30 14:26:51.000549")
>>> print dt2
2013-10-30 14:26:51.000549

The same holds for isoformat():

>>> print dt.isoformat()
2013-10-30T14:26:50
>>> print dt2.isoformat()
2013-10-30T14:26:51.000549

Whatever happened to "be strict in what you send, but generous in what you 
receive"? If strptime() is going to complain the way it does, then str() should 
always generate a full timestamp, including microseconds. The above is from a 
Python 2.7 session, but I also confirmed that Python 3.3 behaves the same.
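
On the writing side, the inconsistency can be avoided today by formatting 
explicitly rather than relying on str(). A minimal sketch, assuming the 
producing program can be changed; FULL_FORMAT and format_full are 
hypothetical names used only for illustration:

```python
from datetime import datetime

# One format used for both writing and reading, always with microseconds.
FULL_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_full(dt):
    """Format a datetime with microseconds even when they are zero."""
    return dt.strftime(FULL_FORMAT)


dt = datetime(2013, 10, 30, 14, 26, 50)
text = format_full(dt)
print(text)  # 2013-10-30 14:26:50.000000

# The same format string parses the output back to an equal datetime.
assert datetime.strptime(text, FULL_FORMAT) == dt
```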

I've checked 2.7 and 3.3 in the Versions list, but I don't think it can be 
fixed there. Can the __str__ and isoformat methods of datetime (and time) 
objects be modified for 3.4 to always include the microseconds? Alternatively, 
can the %S format character be modified to consume optional decimal point and 
microseconds? I rate this as "easy" considering the easiest fix is to modify 
__str__ and isoformat, which seems unchallenging.

----------
components: Extension Modules
keywords: easy
messages: 201917
nosy: skip.montanaro
priority: normal
severity: normal
status: open
title: Inconsistency between datetime's str()/isoformat() and its strptime() method
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19475>
_______________________________________
_______________________________________________
Python-bugs-list mailing list