Peter Valdemar Mørch wrote:
From the lack of responses I take it that nobody disagrees that the handling of the timestamp type is fundamentally broken in every version of MySQL.
Silence does not necessarily indicate assent. You asked a complicated question (when the U.S. list members were on holiday rather than reading e-mail, I might add). I agree that it does not do what you expect, but the question of "fundamentally broken" is not so simple.
I'll go ahead and file several bugs, and start changing our code to avoid the timestamp type altogther. I'm quite surprised!
Peter
Peter Valdemar Mørch swp5jhu02-at-sneakemail.com |Lists| wrote:
*******************
Claim
*******************
I seem to have discovered that MySQL cannot handle the hour where DST becomes non-DST reliably (on Oct 31st in CET):
2am 3am | problem | | time | --->-----------DST---->----->----/ / |/ / / /---->------>--- non-DST ------>--------
Essentially, the problem seems to be that "October 31, 2:15 am" is ambiguous and MySQL cannot disambiguate between them.
Well, of course. "October 31, 2:15 am" *is* ambiguous. Which one does it mean? You cannot tell from the value "October 31, 2:15 am" whether it is the first or second occurrence of that time. No system could -- additional info is required.
It seems that internally MySQL stores timestamp values with all timezone and DST/non-DST information intact. But all operations (e.g. TIMEDIFF(), FROM_UNIXTIME() and UPDATE TABLE SET timestamp=timestamp+0) destroy this important information and operate only now()'s timezone ignoring the appropriate original source time zone.
You seem to have a fundamental misunderstanding of the TIMESTAMP type. No timezone or DST information is stored in a TIMESTAMP column. TIMESTAMPs are stored internally as seconds since epoch UTC, according to the manual, but they are always translated to DATETIMES in the current timezone when retrieved. The operations you mention, TIMEDIFF(), FROM_UNIXTIME() and UPDATE TABLE SET timestamp=timestamp+0, all behave as expected, once you understand how they, and timestamps, work
I just find it impossibly difficult to believe I'm the first one finding this, so now I'm curious. Googling has come up short.
I very much hope I'm mistaken and that I don't have to change our code, database format and contents to store # secs since 1970 GMT in an int everywhere and avoid timestamps altogether because they "don't work"...
That may be exactly what you should do, because that seems to be precisely what you've said you want. You want rows to be stamped with seconds since epoch so you can preserve order across the end of DST. Every operation you've attempted seems to be based on the assumption that timestamps are just seconds since epoch, but they aren't.
********************
Evidence for claim
********************
Essentially we have a log table using a timestamp column as a sort key. For our logic to work we need to be able to:
* Sort reliably, because the order of entries is very important (we use the "last" log entry to determine "current" state). Entries that are made during 2am-3am DST *must* come before log entries that are made 2am-3am non-DST just as they follow chronologically in real life.
Timestamps have 1 second resolution. Unless you are guaranteed never to get 2 log entries in the same second, you cannot use a timestamp to sort reliably. Don't you have an auto_increment primary key on which to sort?
This works initially, but gets broken by a "mysqldump db | mysql new_db" cycle. Timestamp values in mysqldump output don't contain DST timezone information and hence of course it is lost during restore. There is no option to mysqldump to maintain this info in output. How does one backup a database then? (Other than cp or mysqlhotcopy which fails if host != localhost)
Right, that's a problem, though I think not quite for the reason you describe. MySQL doesn't handle timezone as part of a datetime anywhere. Timezone is handled separately. Hence, we can't dump the timestamps' timezone/DST status on a row by row basis. The problem is that the internal seconds since epoch is converted to a datetime in the first place. Two different internal values translate to the same external datetime, so it's not a 1 to 1 function. Having lost information on output, you can't get it back on input.
The question is whether or not this is a problem. That is, does this contradict the intended design of the timestamp column?
* Be able to do TIMEDIFF() math and "INSERT INTO TABLE ... VALUES(FROM_UNIXTIME(?))" maintaining the correct time. This is not possible.
You are expecting functions to do what they aren't designed to do. TIMEDIFF() operates on times and datetimes, not on unix timestamps. Hence, it does not operate directly on timestamp columns, but rather on the translated-to-datetime values. Do you see? Once you convert 1099185600 to a datetime, the info you want preserved is already lost. TIMEDIFF is not at fault. It simply operates on the datetimes it is given. The same goes for FROM_UNIXTIME(). It takes seconds since epoch and converts to a datetime. Again, that is not a 1-to-1 function, so the info you want preserved is lost, before you get to the INSERT part.
It seems that for all MySQL functions, the 2am-3am period (DST/non-DST) chosen depends on now(), not the value of ?, yeilding off-by-an-hour errors. As illustrated by this short example (more elaborate examples follow below):
No. Functions operate correctly on the values they are given.
susan:~# cat /etc/timezone Europe/Copenhagen
susan:~# mysql -e ' select 1099185600-1099181400; select TIMEDIFF(FROM_UNIXTIME(1099185600), FROM_UNIXTIME(1099181400));' +-----------------------+ | 1099185600-1099181400 | +-----------------------+ | 4200 | +-----------------------+ +----------------------------------------------------------------+ | TIMEDIFF(FROM_UNIXTIME(1099185600), FROM_UNIXTIME(1099181400)) | +----------------------------------------------------------------+ | 00:10:00 | +----------------------------------------------------------------+
But 4200 secs is 01:10:00, not 00:10:00!!!
Of course! You are not subtracting seconds, you are subtracting datetimes! Please try
SELECT FROM_UNIXTIME(1099185600), FROM_UNIXTIME(1099181400);
You will see the results (in your timezone) are two datetimes which are in fact 10 minutes apart. The lesson here is that if you have values measured in seconds since epoch and you want to find the difference between them, you must not convert to datetimes before subtracting. Instead, simply subtract the seconds. In the case of two timestamp columns, you can access the internal seconds-since-epoch representation with the UNIX_TIMESTAMP() function, so this would be
SELECT UNIX_TIMESTAMP(ts2) - UNIX_TIMESTAMP(ts1);
Before we:
1) File an enhancement request against mysqldump
2) File bugs against UNIX_TIMESTAMP() and TIMEDIFF() et. al.
3) change our database format and code to avoid the timestamp type altogether and use "secondsSince1970 int" instead (huge)
I'd love to hear that I'm mistaken and that these problems can be solved with current mysql code. I've tried 4.1.7 and 4.0.22.
*********************************
More examples
*********************************
I've created a table only with an ID and a timestamp. I've tried filling it in a loop in two ways (perl source code link below):
1) modifying system time and inserting into table using implied timestamp=now(). This works (order by timestamp - IDs are in order):
+----+---------------------+---------------------------+
| ID | timestamp | unix_timestamp(timestamp) |
+----+---------------------+---------------------------+
| 1 | 2004-10-31 01:45:00 | 1099179900 |
| 2 | 2004-10-31 02:15:00 | 1099181700 |
| 3 | 2004-10-31 02:45:00 | 1099183500 |
| 4 | 2004-10-31 02:15:00 | 1099185300 |
| 5 | 2004-10-31 02:45:00 | 1099187100 |
| 6 | 2004-10-31 03:15:00 | 1099188900 |
+----+---------------------+---------------------------+
But TIMEDIFF() on this otherwise correct data is erroneous: select TIMEDIFF(A.timestamp, B.timestamp) from table as A, table as B where A.ID=4 and B.ID=2; +------------------------------------+ | TIMEDIFF(A.timestamp, B.timestamp) | +------------------------------------+ | 00:00:00 | +------------------------------------+
This should have been a full hour!!!
No. As described above, A.timestamp and B.timestamp must be converted to datetimes before TIMEDIFF can operate on them, so the result is correct. The difference in time between them is
SELECT UNIX_TIMESTAMP(A.timestamp) - UNIX_TIMESTAMP(B.timestamp) FROM table as A, table as B WHERE A.ID=4 and B.ID=2;
or
SELECT TIMEDIFF(FROM_UNIXTIME(UNIX_TIMESTAMP(A.timestamp) - UNIX_TIMESTAMP(B.timestamp)), FROM_UNIXTIME(0)) FROM table as A, table as B WHERE A.ID=4 and B.ID=2;
An ever-so-slight modification (+0) on above table.timestamp makes it loose its timezone info:
update table set timestamp=timestamp+0
Try
SELECT ID, timestamp, timestamp+0 FROM table;
You will see
+----+---------------------+----------------+ | ID | timestamp | timestamp+0 | +----+---------------------+----------------+ | 1 | 2004-10-31 01:45:00 | 20041031014500 | | 2 | 2004-10-31 02:15:00 | 20041031021500 | | 4 | 2004-10-31 02:15:00 | 20041031021500 |
When used in numeric context, datetimes are converted to integers by removing the separators. Hence, your update converts the timestamp to a datetime, converts the datetime to an integer (+0), converts the integer to a datetime, then sets the timestamp to the new datetime. As always, converting to a datetime necessarily loses the info you want preserved, as this conversion is not 1-to-1. In general, you cannot simply add numbers to datetimes and expect to get meaningful results.
IDs are now out of order, when order by timestamp!!!!! +----+---------------------+---------------------------+ | ID | timestamp | unix_timestamp(timestamp) | +----+---------------------+---------------------------+ | 1 | 2004-10-31 01:45:00 | 1099179900 | | 2 | 2004-10-31 02:15:00 | 1099185300 | | 4 | 2004-10-31 02:15:00 | 1099185300 | | 3 | 2004-10-31 02:45:00 | 1099187100 | | 5 | 2004-10-31 02:45:00 | 1099187100 | | 6 | 2004-10-31 03:15:00 | 1099188900 | +----+---------------------+---------------------------+ ^ | Out of numerical order!!!!
These are in the correct order based on your updated timestamps.
2) keeping now() in a single time zone and specifying timestamp=FROM_UNIXTIME($val) during insert. Specifying time fails (order by timestamp):
- timestamps are duplicated - IDs are out of order!!!!!
+----+---------------------+---------------------------+
| ID | timestamp | unix_timestamp(timestamp) |
+----+---------------------+---------------------------+
| 1 | 2004-10-31 01:45:00 | 1099179900 |
| 2 | 2004-10-31 02:15:00 | 1099185300 |
| 4 | 2004-10-31 02:15:00 | 1099185300 |
| 3 | 2004-10-31 02:45:00 | 1099187100 |
| 5 | 2004-10-31 02:45:00 | 1099187100 |
| 6 | 2004-10-31 03:15:00 | 1099188900 |
+----+---------------------+---------------------------+
Yes, again because FROM_UNIXTIME() converts to a datetime.
"Extraordinary claims require extraordinary evidence." - Carl Sagan Sorry this is long but I couldn't make it much shorter...
************************** Source code to reproduce ************************** http://demo.capmon.dk/~pvm/mysqlTimestamps/timestamps.pl.txt Output of said code (WITH /etc/timezone == "Europe/Copenhagen"!!) http://demo.capmon.dk/~pvm/mysqlTimestamps/output.txt
I really hope someone has a good idea what to do now!
Peter
My conclusions:
1) A basic limitation of TIMESTAMP columns is that, despite the internal representation, they are effectively DATETIMEs. This may make them unsuitable for your application.
2) Because of the way they are handled, mysqldump cannot preserve timestamp values, as you have reported. This could be viewed as a problem (I would tend to agree that it is), but one could argue that this doesn't matter as long as the intention is that TIMESTAMPs are a kind of DATETIME, in which case the changed value is still "correct".
3) While we can retrieve the internal seconds-since-epoch representation of a timestamp column using UNIX_TIMESTAMP(timestamp_col), there is no way to set the internal representation. That is, you cannot set a timestamp column to be a certain number of seconds since epoch. Timestamp columns always take datetimes as input. Hence, even if you wrote a custom backup solution (or changed mysqldump) to retrieve the internal values from a timestamp column using UNIX_TIMESTAMP(), there is no way I can see to use those to restore the original values.
Changing this, then, requires more than a change to mysqldump. We would need a way to set the internal seconds-since-epoch representation of a timestamp first, then we could modify mysqldump to use it. The simplest solution would be for mysql to use the internal representation whenever a timestamp is used in numeric context.
In the meantime, I would think the simplest solution for you would be to keep your table as is, use the timestamp in the WHERE clause to choose the desired log entries, but use the ID to order the results to keep them sequential. Perhaps there's a reason that won't work for you, but I can't see it from your examples.
Michael
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]