Sorry, I definitely did NOT mean "Python functions treat a naive
timestamp as if it were a UTC timestamp."

I am referring to the relationship between the behavior of attribute
accessors like "hour" or "day" and the representation of the data.
For a given timestamp ordinal value with no time zone,
datetime.datetime.hour returns the same thing wherever you are in the
world. I do not mean that tz-naive data is intended to be cast
implicitly to UTC in operations with tz-aware data; that's crazy talk
=) Operations between tz-naive and tz-aware data are not permitted
without explicit casts / localization.
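
To make the distinction concrete, here is a minimal sketch using only the
standard library. The variable names are illustrative, and fixed UTC is
used instead of a named zone so that it runs without a tz database:

```python
from datetime import datetime, timezone

# A tz-naive value: calendar fields only, no offset attached.
naive = datetime(2021, 6, 4, 9, 9, 18)

# The accessor reads the stored field directly. The system time zone is
# never consulted, so the result is the same on any machine.
naive_hour = naive.hour

# Explicit localization: declare that the fields are UTC wall-clock time.
aware = naive.replace(tzinfo=timezone.utc)

# Mixing naive and aware values is rejected rather than implicitly cast.
try:
    naive - aware
    mixed_allowed = True
except TypeError:
    mixed_allowed = False

# Once both sides carry a time zone, the subtraction is well defined.
delta = naive.replace(tzinfo=timezone.utc) - aware
```

The accessor behaves "like UTC" only in the sense that it never shifts the
stored fields; the naive value still has to be localized explicitly before
it can be combined with tz-aware data.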

On Fri, Jun 4, 2021 at 2:18 PM Weston Pace <weston.p...@gmail.com> wrote:
>
> > We are recommending that the behavior of
> > these functions should consistently have the UTC interpretation of the
> > value rather than using the system locale. This is what Python does
> > with "tz-naive" datetime.datetime objects
>
> This is not quite true, although perhaps my reading is incorrect.  I
> read that as "Python functions treat a naive timestamp as if it were a
> UTC timestamp."  Python does not treat a naive timestamp the same as a
> UTC timestamp.  And I think this is the heart of what Julian's point
> is (which I agree with).  For example, consider this snippet:
>
> >>> import datetime
> >>> import pytz
> >>> x = datetime.datetime.now()
> >>> y = pytz.utc.localize(x)
> >>> x - y
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: can't subtract offset-naive and offset-aware datetimes
>
> x is not assumed to be UTC (If it were I would get
> datetime.timedelta(0) instead of an exception).  Another example:
>
> >>> x.isoformat()
> '2021-06-04T09:09:18.304640'
> >>> y.isoformat()
> '2021-06-04T09:09:18.304640+00:00'
>
> On Fri, Jun 4, 2021 at 7:46 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >
> > The lesson there is: library software shouldn’t use anything from its 
> > environment (time zone, locale, encoding, endianness). Functions that use 
> > time zone should always have a time zone parameter.
> >
> > Once you take that step, the functions that work with zoneless timestamps 
> > start to look different to functions that work with local timestamps, and 
> > you start to realize that they should be separate data types.
> >
> > > On Jun 3, 2021, at 12:26 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > Arrow's decision was not to permit storage of timestamps with
> > > "localized" representation (which is distinct from UTC internal
> > > representation with a different time zone set). The problem really
> > > comes down to the interpretation of "time zone naive" timestamps on
> > > different systems: operations in my opinion should not yield different
> > > results depending on the particular locale of the system where the
> > > operations are being run.
> > >
> > > date on my Linux system returns 1622748048, which is 19:21 UTC. If you
> > > encounter 1622748048 without any given time zone, and want to
> > > interpret 1622748048 as CDT (US/Central where I live), then Arrow is
> > > asking you to localize that timestamp to the UTC representation of
> > > > 19:21 CDT, which is 5 hours later (CDT is UTC-5), so you need to add
> > > > 5 hours of seconds to the timestamp to adjust it to UTC.
> > >
> > > In some systems, if you encounter 1622748048 without time zone
> > > indicated, the behavior of timestamp_day() or timestamp_hour() will
> > > depend on the system locale. We are recommending that the behavior of
> > > these functions should consistently have the UTC interpretation of the
> > > value rather than using the system locale. This is what Python does
> > > > with "tz-naive" datetime.datetime objects: if you access
> > > datetime.hour on a timezone-less datetime.datetime, it will return the
> > > same result no matter where in the world you are.
> > >
> > > On Thu, Jun 3, 2021 at 1:19 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > >>
> > >> It seems that Arrow’s timestamp type can either have no time zone or be 
> > > >> UTC. I think that is a flawed design, because it doesn’t catch user errors.
> > >>
> > >> Suppose you want to find the number of milliseconds between two 
> > >> timestamps. If the first has a timezone and the second is implicitly 
> > >> UTC, then you can convert them both to instants and subtract. But if the 
> > >> first has a timezone and the second has no time zone, you must supply a 
> > >> time zone for the second. So, the subtraction function will have a 
> > >> different signature.
> > >>
> > >> There are many similar operations, where a time zone needs to be 
> > >> supplied, or where you cannot safely mix timestamps with different time 
> > >> zones.
> > >>
> > >> Julian
> > >>
> > >>
> > >>> On Jun 3, 2021, at 11:07 AM, Adam Hooper <a...@adamhooper.com> wrote:
> > >>>
> > >>> On Thu, Jun 3, 2021 at 2:02 PM Adam Hooper <a...@adamhooper.com> wrote:
> > >>>
> > >>>> I understand isAdjustedToUTC=true to mean "timestamp", and
> > >>>> isAdjustedToUTC=false to mean, "int64 and I hope somebody attached some
> > >>>> docs because
> > >>>> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#local-semantics-timestamps-not-normalized-to-utc
> > > >>>> lists a whole slew of potential meanings and without extra metadata
> > > >>>> I'll never be able to figure out what this column means."
> > >>>>
> > >>>
> > >>> Correcting myself here: Parquet isAdjustedToUTC=false does have just one
> > >>> meaning. It means encoding a "(year, month, day, hour, minute, second,
> > >>> microsecond)" tuple as a single integer.
> > >>>
> > >>> Adam
> > >>>
> > >>> --
> > >>> Adam Hooper
> > >>> +1-514-882-9694
> > >>> http://adamhooper.com
> > >>
> >
