On Mon, 14 Jun 2021 at 17:57, Antoine Pitrou <anto...@python.org> wrote:
>
> ...
>
> Joris' interpretation is that timestamp *values* are expressed in an
> arbitrary "local time" that is unknown and unspecified. It is therefore
> difficult to exactly interpret them, since the timezone information is
> unavailable.
>
> (I'll let Joris express his thoughts more accurately, but the gist of
> his opinion is that "can be thought of as UTC" is only an indication,
> not a prescription)

That's indeed correct. One clarification: you can interpret them as
is, and for many applications this is fine. It's only when you want to
interpret them as an absolute point in time that the user needs to
supply a timezone to interpret them.
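A minimal sketch of the two readings, using only Python's standard library. It assumes the value is stored as microseconds since 1970-01-01 00:00:00 of an unspecified local clock. The helper names `to_wall_clock` and `to_instant` are hypothetical, and a fixed +02:00 offset stands in for a named zone so that no timezone database is needed:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: read a naive timestamp "as is". Adding to a naive epoch
# yields a naive datetime, i.e. a wall-clock reading, not an absolute instant.
def to_wall_clock(value_us: int) -> datetime:
    return datetime(1970, 1, 1) + timedelta(microseconds=value_us)

# Hypothetical helper: only when the user supplies a timezone does the same
# value become an absolute point in time.
def to_instant(value_us: int, tz: timezone) -> datetime:
    return to_wall_clock(value_us).replace(tzinfo=tz)

stored = 1_623_693_420_000_000  # wall clock 2021-06-14 17:57:00

wall = to_wall_clock(stored)                               # fine for many uses
as_utc = to_instant(stored, timezone.utc)                  # user says: UTC
as_plus2 = to_instant(stored, timezone(timedelta(hours=2)))  # user says: +02:00

print(wall, wall.tzinfo)   # 2021-06-14 17:57:00 None
print(as_utc - as_plus2)   # 2:00:00, same stored value, different instants
```

The stored value alone does not pin down which instant is meant; the choice of timezone is information the user has to bring.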

For the rest, Wes' responses already cover my viewpoint (as a pandas
maintainer, I naturally share a similar perspective, since I look at
this through the lens of the pandas implementation he wrote).

An additional source that explains the "local semantics" of naive
timestamps well IMO, and especially explains the "can be thought of as
UTC without being UTC" aspect, is the parquet format docs:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#local-semantics-timestamps-not-normalized-to-utc
(it's of course about Parquet and not Arrow, but the explanation is
relevant for the Arrow spec as well).

Joris
