On Nov 19, 2009 1:18am, Andrew Gierth <and...@tao11.riddles.org.uk> wrote:

Right, but including more data in a single type is the wrong approach,
since it complicates the semantics and interferes with normalization.
For example, if you have a type T which incorporates a timestamp and a
timezone, what semantics does the T = T operator have? What semantics
apply if the definitions of timezones change?

I dont get the thing about normalization, there's complete ortoghonality in my approach. And when you say "complicates the semantic" I'd say "enrich the semantics" (and even clarify it) so that it fits more neatly to the typical usage of dates and times in real life. For "datetimes with tz" the equality (and comparison) operator is not trivial, roughly in the same sense that date-time arithmetic is not trivial when one stops thinking of
datetimes as "physical time". So is life. Should the datetimes
'2010-07-27 9:30 Chile' and '2010-07-27 10:30 Argentine' (GMT+4 and GMT+3 respec) be considered equal? It's arguable; but the ambiguity (just a matter of adoption) reflects
reality. We can discuss it and adopt some consistent criteria.

What if you're storing
times of events at specific places; in that case you want to associate
the timezone with the _place_ not the event (so that if the timezone
rules change, moving the place from one timezone to another, you only
have to change the place, not all the events that refer to it).

I'm not sure I undestand you here. I'm claiming that timezone rules alterations (zic files changes) should always be supported by the db implementation, without needing of touching your data. And I believe that timestamps (ie physical times) are in practice almost never associated to timezone information. If you want to store "the instant of last solar eclipse" you normally store the timestamp, a timezone might only be useful for displaying (or as an adittional info, not really associated to the event) A border case would be "store the instant of the death of John Lennon". You might store the TZ here if you are interested in the civil time (so you can answer, for example, ¿how many rock stars died in morning/afternoon?). But then, again, you are here actually storing a civil date (local date-time plus TZ). The only problematic case i can envision is to intend to store a physical time in the future with TZ, but frankly it is difficult to think of this scenario (and even more difficult to think of needing to operate with that data as a whole; hence, in this case, to store the two fields separatadely makes sense).

I'm being dense, and this might be a lost cause, but anyway, perhaps some day in the future
this might be of some use:

I strongly believe that, if one could sample the real needings and usage of date-time types in applications in this world, and taking apart types DATE (very frequent, but rather straightforward), and TIME (not so relevant) and intervals (other issues here, much related to datetimes), the
overwhelming majority would fall ( conceptually) into these three types:

- TIMESTAMP (physical time - no TZ - no civil time implied)
- LOCAL DATETIME (civil time, no TZ)
- DATETIME (civil time with TZ => togheter with zic tables, implies a physical time)

And of these three -I'd bet- the first is (conceptually) the most common, by a wide margin.

As the name TIMESTAMP implies, it frequently records the moment of a event (in the DB corresponds frequently to the creation or alteration of a record, frequently via a "now()" default or such). Examples: the timestamp of messages in a mailing list, or issues in a bugtracker, or posts/articles in a blog/Cms. Sometimes it is modifiable by the user. Sometimes it is displayed (as a civil date, of course) according to some TZ implied somewhere else. It's normal that users with differnt TZ sees this event each with its own TZ; and one is not directly interested on obtaining (say) an "inherent" civil datetime for the event (for example one is not interested in asking what posts where generated at midnight
acording to the localtime of the user that created it).

The LOCAL DATETIME is only of use for civil date-times, when one is not directly interested in asociate events with real (physicial time) - this cannot be compared with a real time (it cant trigger alarms, eg) Or, more rarely, when the TZ is implied somehere else (in the application, not it the DB server!).

The DATETIME is equivalent to the compound type {LOCAL_DATETIME,TZ}. Here the "civil date-time" is again the primary concept one deals with, but in a given place in the world (TZ), so it implies also (with the assistance of a zic table) a real time. This type is, IMHO, less frequent than the others. The typical use is for calendars or schedulers.

One could, a propos Andrew's observation, consider a fourth type: TIMESTAMP WITH TZ. But it seems overkill: except for ZIC changes, the correspondence with DATETIME is univocal (BTW, this is why in the Jodatime API -which does not deal with persistence- this concepts are strictly equivalent). Given this nearly-equivalence, and that the needing of this type in real life is (IMO) almost null, I think that DATETIME is the one to survive.

(One could even propose a fifht type: a TIMESTAMP WITH GMT OFFSET (roughly the ANSI proposal), which would be equivalent to have a timestamp AND a local datetime; this is more easy to deal with than timezones, but (as was discussed here before) is too limited (does not allow artithmetic) and is not orthogonal with the real useful types.)

Regarding implementation:

TIMESTAMP is straightforward, more or less the same as today: stored as UTC, can be input/output in ISO 8601 format (the client/server can use the offset they like, internally it's translated to GMT+0)

LOCAL_DATETIME also is straightforward, also stored as UTC (as in GMT+0) . BUT
- input/output in ISO 8601 format should not allow/produce GMT offset
- the similarity of implementation should not leak upwards. This types are incompatible, cannot be compared, etc

DATETIME is the difficult one, of course.
- Equivalent to the pair {LOCAL_DATETIME,TX_id} (occupies more space)
- Requires some catalog table or something akin to codify consistently the timezones as numbers (included in pg_dump output?) - Requires new definitions for input/output (and deal with some ambiguities, particulary in DST transitions)
- Requires some semantic definitions (orderig, equality)
- Some arithmetics (which involve convertion to physic time) may be expensive, might require some aggresive caching of time (zic) calculations.

Global considerations:
- Backward compatibility?
- SQL spec compatibility? (is worsened?) (deprecate TIMESTAMP WITH TIMEZONE?)
- implement conversion functions - castings (how strict?)
- discuss/implement interval types/functions
- interfaces (JDBC...)

A bit of work, granted... (I might help)... but I bet that the most difficult work, by far, is to
reach an agreement :-)

Thanks for reading.

Hernán J. González

Reply via email to