Arrow's decision was not to permit storage of timestamps with
"localized" representation (which is distinct from UTC internal
representation with a different time zone set). The problem really
comes down to the interpretation of "time zone naive" timestamps on
different systems: in my opinion, operations should not yield different
results depending on the particular locale of the system where the
operations are being run.

date +%s on my Linux system returns 1622748048, which is 19:21 UTC. If you
encounter 1622748048 without any given time zone, and want to
interpret 1622748048 as CDT (US/Central where I live), then Arrow is
asking you to localize that timestamp to the UTC representation of
19:21 CDT, which is 5 hours later (CDT is UTC-5), so you need to add 5
hours of seconds (18000) to the timestamp to adjust it to UTC.
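
As a rough illustration of that localization step, here is a minimal
Python sketch. It assumes CDT can be modeled as a fixed UTC-5 offset
(a real application would more likely use a tz database zone such as
US/Central), and the variable names are only for illustration. With
that offset the shift works out to 5 hours.

```python
from datetime import datetime, timedelta, timezone

EPOCH_SECONDS = 1622748048

# Assumption: model CDT as a fixed UTC-5 offset instead of a tz database zone.
CDT = timezone(timedelta(hours=-5), "CDT")

# Read the raw value as a tz-naive wall-clock reading (2021-06-03 19:20:48).
wall_clock = datetime.fromtimestamp(EPOCH_SECONDS, tz=timezone.utc).replace(tzinfo=None)

# Declare that the wall-clock reading was taken in CDT, then get the UTC instant.
localized = wall_clock.replace(tzinfo=CDT)
utc_seconds = int(localized.timestamp())

# Number of whole hours that had to be added to reach the UTC representation.
shift_hours = (utc_seconds - EPOCH_SECONDS) // 3600

print(wall_clock.isoformat(), "CDT ->", utc_seconds, f"(+{shift_hours}h)")
```

The stored value after localization is the UTC instant, and the zone
is carried as metadata rather than baked into the integer.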

In some systems, if you encounter 1622748048 without time zone
indicated, the behavior of timestamp_day() or timestamp_hour() will
depend on the system locale. We are recommending that the behavior of
these functions should consistently have the UTC interpretation of the
value rather than using the system locale. This is what Python does
with "tz-naive" datetime.datetime objects: if you access
datetime.hour on a timezone-less datetime.datetime, it will return the
same result no matter where in the world you are.
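
To make the recommended behavior concrete, here is a small Python
sketch. The timestamp_hour() and timestamp_day() helpers below are
hypothetical stand-ins for the functions mentioned above, not Arrow's
actual API; they always read the value with the UTC interpretation and
never consult the system locale.

```python
from datetime import datetime, timedelta

# tz-naive epoch; arithmetic on it never consults the system time zone.
EPOCH = datetime(1970, 1, 1)

def timestamp_hour(seconds: int) -> int:
    """Hypothetical helper: hour of day under the UTC interpretation."""
    return (EPOCH + timedelta(seconds=seconds)).hour

def timestamp_day(seconds: int) -> int:
    """Hypothetical helper: day of month under the UTC interpretation."""
    return (EPOCH + timedelta(seconds=seconds)).day

# A tz-naive datetime built from the same value gives the same fields anywhere.
naive = EPOCH + timedelta(seconds=1622748048)

print(naive.isoformat(), timestamp_hour(1622748048), timestamp_day(1622748048))
```

By contrast, datetime.fromtimestamp(1622748048) with no tz argument
converts using the machine's local time zone, which is the kind of
locale-dependent behavior being argued against here.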

On Thu, Jun 3, 2021 at 1:19 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
> It seems that Arrow’s timestamp type can either have no time zone or be UTC. 
> I think that is a flawed design, because it doesn’t catch user errors.
>
> Suppose you want to find the number of milliseconds between two timestamps. 
> If the first has a timezone and the second is implicitly UTC, then you can 
> convert them both to instants and subtract. But if the first has a timezone 
> and the second has no time zone, you must supply a time zone for the second. 
> So, the subtraction function will have a different signature.
>
> There are many similar operations, where a time zone needs to be supplied, or 
> where you cannot safely mix timestamps with different time zones.
>
> Julian
>
>
> > On Jun 3, 2021, at 11:07 AM, Adam Hooper <a...@adamhooper.com> wrote:
> >
> > On Thu, Jun 3, 2021 at 2:02 PM Adam Hooper <a...@adamhooper.com> wrote:
> >
> >> I understand isAdjustedToUTC=true to mean "timestamp", and
> >> isAdjustedToUTC=false to mean, "int64 and I hope somebody attached some
> >> docs because
> >> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#local-semantics-timestamps-not-normalized-to-utc
> >> lists a whole slew of potential meanings and without extra metadata I'll
> >> never be able to figure out what this column means."
> >>
> >
> > Correcting myself here: Parquet isAdjustedToUTC=false does have just one
> > meaning. It means encoding a "(year, month, day, hour, minute, second,
> > microsecond)" tuple as a single integer.
> >
> > Adam
> >
> > --
> > Adam Hooper
> > +1-514-882-9694
> > http://adamhooper.com
>
