> I find the whole notion of a "timezone naive timestamp" to be nearly meaningless
From the perspective of, say, the dateutil parser, what would you do with
"2020-11-06 07:48"? If you assume it's UTC you'll be wrong in this case. If
you assume it is in your local timezone, you'll be wrong in Europe.
Timezone-naive datetimes are an abstraction for exactly this case.

>>> t0 = pd.Timestamp.now()

You can use `pd.Timestamp.now("UTC")`.

See also:
https://mail.python.org/archives/list/datetime-...@python.org/thread/PT4JWJLYBE5R2QASVBPZLHH37ULJQR43/
https://github.com/pandas-dev/pandas/issues/22451

On Fri, Nov 6, 2020 at 2:48 AM Noam Yorav-Raphael <noamr...@gmail.com> wrote:

> Hi,
>
> I actually arrived at this by first trying to use pandas.Timestamp and
> getting very frustrated about it. With pandas, I get:
>
> >>> pd.Timestamp.now()
> Timestamp('2020-11-06 09:45:24.249851')
>
> I find the whole notion of a "timezone naive timestamp" to be nearly
> meaningless. A timestamp should mean a moment in time (as the current numpy
> documentation defines very well). A "naive timestamp" doesn't mean
> anything. It's exactly like a "unit naive length". I can have a Length type
> which just takes a number, and be very happy that it works both if my "unit
> zone" is inches or centimeters. So "Length(3)" will mean 3 cm in most of
> the world and 3 inches in the US. But then, if I get "Length(3)" from
> someone, I can't be sure what length it refers to.
>
> So currently, this happens with pandas timestamps:
>
> >>> os.environ['TZ'] = 'UTC'; time.tzset()
> ... t0 = pd.Timestamp.now()
> ... time.sleep(1)
> ... os.environ['TZ'] = 'EST-5'; time.tzset()
> ... t1 = pd.Timestamp.now()
> ... t1 - t0
> Timedelta('0 days 05:00:01.001583')
>
> This is not just theoretical - I actually need to work with data from
> several devices, each in its own time zone. And I need to know that I won't
> get such meaningless results.
>
> And you can even get something like this:
>
> >>> t0 = pd.Timestamp.now()
> ... time.sleep(10)
> ... t1 = pd.Timestamp.now()
> ... t1 - t0
> Timedelta('0 days 01:00:10.001583')
>
> if the first measurement happened to be in winter time and the second
> measurement happened to be in daylight saving time.
>
> The solution is simple, and is what datetime64 used to do before the
> change - have a type that just represents a moment in time. It's not "in
> UTC" - it just stores the number of seconds that passed since an agreed
> moment in time (which is usually 1970-01-01 02:00+0200, which is more
> commonly referred to as 1970-01-01 00:00Z - it's the exact same moment).
>
> I think it would make things clearer if I'll mention that there are
> operations that are not dealing with timestamps. For example, it's
> meaningless to ask what is the year of a timestamp - it may depend on the
> time zone. These are always *human* related questions, that depend on
> certain human conventions. We can call them "calendar questions". For these
> types of questions, a type that includes both a timestamp and a timezone
> offset (in minutes from UTC) can be useful. Some questions even require
> full timezone information, meaning a function that defines what's the
> timezone offset for each moment. However, I don't think numpy should deal
> with those calendar issues. As a very simple example, even for
> "timestamp+offset" types, it's not clear how to compare them - should
> values with the same timestamp and different offsets be considered equal or
> not? And in virtually all of my data analysis, this calendar aspect has
> nothing to do with the questions I'm trying to answer.
>
> I have a suggestion. Instead of changing datetime64 (which I consider to
> be ill-defined, but never mind), add a new type called "timestamp64". It
> will have the exact same behavior as datetime64 had before the change,
> except that its only allowed units will be seconds, milliseconds,
> microseconds and nanoseconds. Removing the longer units will make it clear
> that it doesn't deal with calendar and dates. Also, all the business day
> functionality will not be applicable to timestamp64. In order to get
> calendar information (such as the year) from timestamp64, you will have to
> manually convert it to python's datetime (or to np.datetime64) with an
> explicit timezone (utc, local, an offset, or a timezone object).
>
> What do you think?
>
> Thanks,
> Noam
>
> On Fri, Nov 6, 2020 at 1:45 AM Stephan Hoyer <sho...@gmail.com> wrote:
>
>> I can try to dig up the old discussions, but datetime64 used to implement
>> both (1) and (3), and this was updated in a very intentional way.
>> Datetime64 now works like Python's own time-zone naive datetime.datetime
>> objects. The documentation referencing "Z" should be updated -- datetime64
>> can be in any timezone you like.
>>
>> Timezone aware datetime objects are certainly useful, but NumPy's
>> datetime64 was restricted to UTC. The consensus was that it was worse to
>> have UTC-only rather than timezone-naive-only. NumPy's datetime64 is often
>> used for data analysis purposes, for which automatic conversion to the
>> local timezone of the computer running the analysis is often
>> counter-productive.
>>
>> If you care about timezone conversions, I would highly recommend looking
>> into pandas's Timestamp class for this purpose. In the future, this would
>> be a good use-case for a new custom NumPy dtype. (The existing
>> np.datetime64 code cannot easily handle multiple timezones.)
>>
>> On Thu, Nov 5, 2020 at 1:04 PM Eric Wieser <wieser.eric+nu...@gmail.com>
>> wrote:
>>
>>> Without weighing in yet on how I feel about the deprecation, you can see
>>> some discussion about why this was originally deprecated in the PR that
>>> introduced the warning:
>>>
>>> https://github.com/numpy/numpy/pull/6453
>>>
>>> Eric
>>>
>>> On Thu, Nov 5, 2020, 20:13 Noam Yorav-Raphael <noamr...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I suggest removing the deprecation warning when constructing a
>>>> datetime64 with a timezone. For example, this is the current behavior:
>>>>
>>>> >>> np.datetime64('2020-11-05 16:00+0200')
>>>> <stdin>:1: DeprecationWarning: parsing timezone aware datetimes is
>>>> deprecated; this will raise an error in the future
>>>> numpy.datetime64('2020-11-05T14:00')
>>>>
>>>> I suggest removing the deprecation warning because I find this to be a
>>>> useful behavior, and because it is a correct behavior. The manual says:
>>>> "The datetime object represents a single moment in time... Datetimes are
>>>> always stored based on POSIX time, with an epoch of 1970-01-01T00:00Z."
>>>> So 2020-11-05T16:00+0200 is indeed the moment in time represented by
>>>> np.datetime64('2020-11-05T14:00').
>>>>
>>>> I just used this to restrict my data set to records created after a
>>>> certain moment. It was easier for me to write the moment in my local time
>>>> and add "+0200" than to figure out the moment representation in UTC.
>>>>
>>>> So this is my simple suggestion: remove the deprecation warning.
>>>>
>>>>
>>>> Beyond that, I have 3 ideas for changing the repr of datetime64 that I
>>>> would like to discuss.
>>>>
>>>> 1. Add "Z" at the end, for example,
>>>> numpy.datetime64('2020-11-05T14:00Z'). This will make it clear to which
>>>> moment it refers. I think this is significant - I had to dig quite a bit to
>>>> realize that datetime64('2020-11-05T14:00') means 14:00 UTC.
>>>>
>>>> 2. Replace the 'T' with a space. I just find it much easier to read
>>>> '2020-11-05 14:00Z' than '2020-11-05T14:00Z'. The long sequence of
>>>> characters makes it hard for my brain to parse.
>>>>
>>>> 3. This will require discussion, but will be very convenient: have the
>>>> repr display the time using the environment time zone, including a time
>>>> offset. So, in my specific time zone (+0200), I will have:
>>>>
>>>> repr(np.datetime64('2020-11-05 14:00Z')) ==
>>>> "numpy.datetime64('2020-11-05T16:00+0200')"
>>>>
>>>> I'm sure the pros and cons of having an environment-dependent repr
>>>> should be discussed. But I will list some pros:
>>>> 1. It's very convenient - it's immediately obvious to me to which
>>>> moment 2020-11-05 16:00+0200 refers.
>>>> 2. It's well defined - I may collect timestamps from machines with
>>>> different time zones, and I will be able to know to which exact moment each
>>>> timestamp refers.
>>>> 3. It's very simple - I could compare any two timestamps, I don't have
>>>> to worry about time zones.
>>>>
>>>> I would be happy to hear your thoughts.
>>>>
>>>> Thanks,
>>>> Noam
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
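
To make the `pd.Timestamp.now("UTC")` suggestion concrete, here is a minimal sketch of working with timezone-aware values so that offsets cannot leak into a difference, plus one possible way to turn an offset-bearing string into the naive UTC value that np.datetime64 stores. The sample timestamps and the helper name `to_utc_datetime64` are assumptions made up for illustration; they are not from the thread or from either library.

```python
from datetime import datetime, timezone

import numpy as np
import pandas as pd

# The same moment written with two different offsets, the way two devices
# in different time zones might report it (values are illustrative).
t_utc = pd.Timestamp("2020-11-06 07:48", tz="UTC")
t_plus2 = pd.Timestamp("2020-11-06 09:48+02:00")

# Timezone-aware timestamps compare and subtract as moments in time,
# so the differing offsets do not show up in the result.
same_moment = t_utc == t_plus2
elapsed = t_plus2 - t_utc


def to_utc_datetime64(text: str) -> np.datetime64:
    """Hypothetical helper: parse an ISO 8601 string that carries an explicit
    offset and return a naive datetime64 holding the equivalent UTC time."""
    aware = datetime.fromisoformat(text)
    if aware.tzinfo is None:
        raise ValueError("expected an explicit UTC offset such as +02:00")
    # Convert to UTC first, then drop tzinfo so NumPy receives a naive value.
    return np.datetime64(aware.astimezone(timezone.utc).replace(tzinfo=None))


cutoff = to_utc_datetime64("2020-11-05 16:00+02:00")
print(same_moment, elapsed, cutoff)
```

The helper does the offset arithmetic in Python's datetime before NumPy sees the value, so it should not depend on the deprecated offset parsing in np.datetime64. Whether that is preferable to staying with pandas's tz-aware Timestamp throughout probably depends on the use case.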
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion