[Format][Important] Needed clarification of timezone-less timestamps

Antoine Pitrou Mon, 14 Jun 2021 08:57:10 -0700


Hello,

In ARROW-13033, there was a disagreement as to how the specification about timezone-less timestamps should be interpreted.


Here is the wording in the Schema specification:

  /// * If the time zone is null or equal to an empty string, the data is "time
  ///   zone naive" and shall be displayed *as is* to the user, not localized
  ///   to the locale of the user. This data can be though of as UTC but
  ///   without having "UTC" as the time zone, it is not considered to be
  ///   localized to any time zone

My interpretation is that timestamp *values* are always expressed in UTC. The timezone is an optional piece of metadata that describes the context in which they were obtained, but it does not affect how the *values* should be interpreted.

Joris' interpretation is that timestamp *values* are expressed in an arbitrary "local time" that is unknown and unspecified. It is therefore difficult to interpret them exactly, since the timezone information is unavailable.

(I'll let Joris express his thoughts more accurately, but the gist of his opinion is that "can be thought of as UTC" is only an indication, not a prescription.)
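To make the two readings concrete, here is a minimal sketch using only the Python standard library. It assumes a timestamp column stores an int64 count of microseconds since the Unix epoch. The helper names `utc_instant` and `wall_clock`, the example value, and the fixed UTC offsets (standing in for real time zones, so no DST rules apply) are illustrative assumptions. They are not Arrow APIs.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_instant(value_us: int) -> datetime:
    """Reading A (hypothetical helper): the stored integer counts microseconds
    since the Unix epoch in UTC; timezone metadata would only affect display."""
    return EPOCH + timedelta(microseconds=value_us)


def wall_clock(value_us: int) -> datetime:
    """Reading B (hypothetical helper): the stored integer encodes a wall-clock
    reading in some unspecified local zone, so only a naive datetime results."""
    return utc_instant(value_us).replace(tzinfo=None)


value = 1_623_686_230_000_000  # example stored value, in microseconds

# Reading A: one well-defined instant, whatever zone it is displayed in.
instant = utc_instant(value)
pacific_display = instant.astimezone(timezone(timedelta(hours=-7)))
assert instant == pacific_display  # same instant, different rendering

# Reading B: turning the wall-clock value into an instant requires guessing
# the zone, and each guess yields a different instant.
naive = wall_clock(value)
guess_paris = naive.replace(tzinfo=timezone(timedelta(hours=2)))
guess_pacific = naive.replace(tzinfo=timezone(timedelta(hours=-7)))
print(guess_pacific - guess_paris)  # 9:00:00
```

Under reading A, the stored integer alone pins down the instant. Under reading B, the same integer maps to as many instants as there are plausible zones.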

To me, the problem with the "unknown local timezone" interpretation is that it renders the data essentially ambiguous and useless. The problem is very similar to that of string data without a well-known encoding. Python users know this well as the Python 2 encoding hell (to the point that it motivated the heavy and disruptive Python 3 transition).

(Note that the problem is even worse for timestamps. With strings, you can at least detect with a high degree of probability that an arbitrary binary string is *not* UTF-8-encoded. You cannot do so with timestamp values: any 64-bit timestamp may or may not be a UTC timestamp. Once you have lost that information, you cannot regain it.)
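The asymmetry can be sketched in a few lines of Python, assuming the stdlib `struct` module for reinterpreting 8 bytes as a signed 64-bit integer. The helper names and the sample bytes are hypothetical. Decoding arbitrary bytes as UTF-8 can fail and thereby reveal bad data, while reinterpreting them as an int64 timestamp always succeeds.

```python
import struct


def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical helper: return False when the bytes cannot be valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_int64(data: bytes) -> int:
    """Hypothetical helper: reinterpret 8 little-endian bytes as a signed
    64-bit integer. Every 8-byte pattern succeeds, so there is no check here
    that could flag a value as "not UTC"."""
    (value,) = struct.unpack("<q", data)
    return value


sample = bytes([0xFF, 0xFE, 0x00, 0x80, 0x12, 0x34, 0x56, 0x78])
print(looks_like_utf8(sample))  # False: byte 0xFF never occurs in UTF-8
print(decode_int64(sample))     # some integer, plausible under either reading
```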

In any case, I think this must be clarified, first on this mailing list, then by making the spec wording stronger and more prescriptive.


Regards

Antoine.
