On Mon, 14 Jun 2021 at 17:57, Antoine Pitrou <anto...@python.org> wrote:
>
> ...
>
> Joris' interpretation is that timestamp *values* are expressed in an
> arbitrary "local time" that is unknown and unspecified. It is therefore
> difficult to exactly interpret them, since the timezone information is
> unavailable.
>
> (I'll let Joris express his thoughts more accurately, but the gist of
> his opinion is that "can be thought of as UTC" is only an indication,
> not a prescription)

That's indeed correct. One clarification: you can interpret them as is,
and for many applications this is fine. Only when you want to interpret
them as an absolute point in time does the user need to supply a
timezone.

For the rest, Wes' responses already cover my viewpoint (as a pandas
maintainer, I of course have a similar perspective on this, coming at it
from the pandas implementation he wrote).

An additional source that IMO explains the "local semantics" of naive
timestamps well, and especially the "can be thought of as UTC without
being UTC" aspect, is the Parquet format docs:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#local-semantics-timestamps-not-normalized-to-utc
(it's of course about Parquet and not Arrow, but the explanation is
relevant for the Arrow spec as well).

Joris
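
P.S. A minimal sketch of the distinction above, using Python's standard
datetime module rather than Arrow or pandas. The date and the two UTC
offsets are arbitrary assumptions picked for illustration; nothing here
is prescribed by the Arrow spec.

```python
from datetime import datetime, timedelta, timezone

# A naive ("local semantics") timestamp: wall-clock fields only, no zone.
# The value itself is a hypothetical example.
naive = datetime(2021, 6, 14, 17, 57)

# Interpreting it "as is" needs no timezone: field access and arithmetic
# on the wall-clock reading work fine.
later = naive + timedelta(hours=2)

# Interpreting it as an absolute point in time requires the user to
# supply a zone. Both offsets below are illustrative assumptions.
as_utc = naive.replace(tzinfo=timezone.utc)
as_plus2 = naive.replace(tzinfo=timezone(timedelta(hours=2)))

# Same wall-clock reading, two different instants.
gap = as_utc - as_plus2
print(later.isoformat())
print(as_utc.isoformat(), as_plus2.isoformat(), gap)
```

The naive value can be "thought of as UTC" in the sense that attaching
UTC gives a valid instant, but attaching any other zone is equally
valid and yields a different instant.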