Re: [Numpy-discussion] start of an array (tensor) and dataframe API standardization initiative

2020-11-12 Thread Matti Picus



On 11/10/20 8:19 PM, Ralf Gommers wrote:

Hi all,

I'd like to share an update on this topic. The draft array API 
standard is now ready for wider review:


- Blog post: https://data-apis.org/blog/array_api_standard_release 

- Array API standard document: 
https://data-apis.github.io/array-api/latest/

- Repo: https://github.com/data-apis/array-api/

It would be great if people - and in particular, NumPy maintainers - 
could have a look at it and see if that looks sensible from a NumPy 
perspective and whether the goals and benefits of adopting it are 
described clearly enough and are compelling.




I think it is compelling for a first version. The test suite and 
benchmark suite will be valuable tools. I hope future versions 
standardize complex numbers as a dtype. I realize there is a limit to 
the breadth of the scope of functions to be covered. Is there a page 
that lists them in one place? For instance, I tried to look up what the 
standard has to say on issue https://github.com/numpy/numpy/issues/17760 
about using bincount on uint64 arrays. It took me a while to figure out 
that bincount was not in the API (although unique(..., return_counts) is).
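
As a rough illustration of the alternative mentioned above, the sketch below counts occurrences in a uint64 array with `np.unique(..., return_counts=True)` and compares the result with `np.bincount` applied to an explicit int64 copy. The sample values are made up, and the cast to int64 is an assumption that only holds when every value fits in a signed 64-bit integer.

```python
import numpy as np

# Made-up sample of unsigned 64-bit values, for illustration only.
values = np.array([3, 1, 3, 0, 1, 3], dtype=np.uint64)

# np.unique accepts uint64 input and returns the sorted distinct values
# together with the number of times each one occurs.
distinct, counts = np.unique(values, return_counts=True)

# For comparison: bincount on an explicit int64 copy (safe here because
# all sample values are tiny). Index i holds the count of the value i.
dense_counts = np.bincount(values.astype(np.int64))

for value, count in zip(distinct.tolist(), counts.tolist()):
    print(value, count, int(dense_counts[value]))
```

Unlike `bincount`, the `unique` route returns counts only for values that actually occur, so it does not allocate one slot per integer up to the maximum.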



Matti

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: add the timestamp64 type (Noam Yorav-Raphael)

2020-11-12 Thread Stefano Miccoli


On 11 Nov 2020, at 18:00, numpy-discussion-requ...@python.org wrote:

I propose to add a new type called "timestamp64". It will be a pure timestamp, 
meaning that it represents a moment in time (as seconds/ms/us/ns since the 
epoch), without any timezone information.

Sorry, but I really don't see the usefulness of another time-stamping format 
based on POSIX time. Indeed, POSIX time is based on a naive approximation of UTC 
and is ambiguous across leap seconds. Quoting from Wikipedia:


The Unix time number 1483228800 is thus ambiguous: it can refer either to start 
of the leap second (2016-12-31 23:59:60) or the end of it, one second later 
(2017-01-01 00:00:00). In the theoretical case when a negative leap second 
occurs, no ambiguity is caused, but instead there is a range of Unix time 
numbers that do not refer to any point in UTC time at all.

Precision time stamping is quite a complex task: you can use UTC, TAI, GPS, 
just to mention the most used timescales. And how do you deal with timestamps 
in the past, when timekeeping was based on earth rotation, and not atomic 
clocks ticking at (approximately) 1 SI-second frequency?

In my opinion time-stamping should be application dependent, and I doubt that 
the new “timestamp64” could be beneficial to the numpy community.

Best regards,

Stefano


Re: [Numpy-discussion] Proposal: add the timestamp64 type (Noam Yorav-Raphael)

2020-11-12 Thread Matti Picus



On 11/12/20 6:04 PM, Stefano Miccoli wrote:



On 11 Nov 2020, at 18:00, numpy-discussion-requ...@python.org wrote:


I propose to add a new type called "timestamp64". It will be a pure 
timestamp, meaning that it represents a moment in time (as 
seconds/ms/us/ns since the epoch), without any timezone information.


Sorry, but I really don't see the usefulness of another time-stamping 
format based on POSIX time. Indeed, POSIX time is based on a naive 
approximation of UTC and is ambiguous across leap seconds. Quoting 
from Wikipedia:


...



In a one-on-one discussion with Noam in a pre-community call (which, 
ironically, we had time for because we both messed up the meeting 
time-zone change) we reached the conclusion that the request is to 
clarify whether NumPy's datetime64 represents TAI time [0] or POSIX 
time, with a preference for TAI time. The documentation mentions POSIX 
time [1]. As Stefano points out, POSIX (or Unix) time and TAI time 
differ by some seconds (37 s at present). In practice NumPy simply 
stores an int64 value to represent the datetime64, and relies on others 
to convert it. The leap seconds might be getting lost in the conversions. 
So it might make sense to clarify exactly how those conversions deal 
with leap seconds and to choose which one we mean when we use 
datetime64. Noam, please correct me if I am mistaken.



Matti


[0] https://en.wikipedia.org/wiki/International_Atomic_Time

[1] 
https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units




Re: [Numpy-discussion] Proposal: add the timestamp64 type (Noam Yorav-Raphael)

2020-11-12 Thread Noam Yorav-Raphael
Hi Matti and Stefano,

My understanding is that datetime64 was decided to be neither TAI nor POSIX
time, but rather to represent an abstract calendar point, like
datetime.datetime without a specified timezone. This can usually be
converted into POSIX time given a timezone (although in the "repeated" hour
between DST and winter time there will be ambiguity!). If all users agree
that a datetime64 represents the time in UTC, it is the same as
POSIX time.

I would like to have a type that is defined to be equivalent to POSIX time.
I don't agree with Stefano: I think that POSIX time is very useful (as
its ubiquity shows), and I think that a type that is defined to
be POSIX time would also be very useful. I think that POSIX time is well
suited for the vast majority of use cases. Indeed, there are use cases
where you should take leap seconds into account, but those are rare. In
practice, a leap second would be presented by the OS as a second that
actually takes more than a second. This happens all the time even
without leap seconds: when your computer automatically syncs with NTP, it
adjusts the time continuously, so applications will not experience "time
bumps". If you want to make sure that the intervals you measure are
correct, you should use something like time.monotonic().
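
A minimal sketch of that distinction, using only the standard library: `time.time()` supplies a POSIX timestamp for recording when something happened, while `time.monotonic()` is used for measuring how long it took. The helper name `timed` is hypothetical and exists only for this illustration.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds).

    The elapsed time comes from a monotonic clock, which is not affected
    by adjustments of the system (wall) clock.
    """
    start = time.monotonic()
    result = func(*args)
    elapsed = time.monotonic() - start
    return result, elapsed


# POSIX timestamp: suitable for "what happened before what, and roughly when".
event_stamp = time.time()

# Monotonic clock: suitable for the length of an interval.
total, elapsed = timed(sum, range(1_000_000))
print(f"event at {event_stamp:.0f}, work took {elapsed:.6f} s")
```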

So, most users are not interested in very precise time measurements, but
rather in knowing what happened before what, and roughly when. For this,
POSIX time is great: it's very simple, and it does the job. In some cases you
need to take leap seconds into account, but in those cases just using the
computer clock will not give you the precision you need no matter what, so
you'll need specialized software anyway.

I think that POSIX time is great, and since it's very easy to make wrong
decisions that seem to work until you discover they don't (such as
discovering too late that local time won't work when you are not sure of
the time zone, or when you switch from DST to winter time), a sane and
simple default is important.

Cheers,
Noam







Re: [Numpy-discussion] Proposal: add the timestamp64 type (Noam Yorav-Raphael)

2020-11-12 Thread Daniele Nicolodi
On 12/11/2020 17:40, Matti Picus wrote:
> In a one-on-one discussion with Noam in a pre-community call (which, 
> ironically, we had time for because we both messed up the meeting 
> time-zone change) we reached the conclusion that the request is to 
> clarify whether NumPy's datetime64 represents TAI time [0] or POSIX 
> time, with a preference for TAI time. The documentation mentions POSIX 
> time [1]. As Stefano points out, POSIX (or Unix) time and TAI time 
> differ by some seconds (37 s at present). In practice NumPy simply 
> stores an int64 value to represent the datetime64, and relies on others 
> to convert it. The leap seconds might be getting lost in the conversions. 
> So it might make sense to clarify exactly how those conversions deal 
> with leap seconds and to choose which one we mean when we use 
> datetime64. Noam, please correct me if I am mistaken.

Unix time is a representation of the UTC timescale that counts one-second
intervals starting from a defined epoch. It deals with leap seconds by
either skipping one interval (this has never happened so far) or repeating
an interval, so that two moments in time that are separated by one second
on the UTC timescale (for example 2016-12-31 23:59:59 and 2016-12-31
23:59:60) are represented in the same way, and thus the conversion from
Unix time to UTC is ambiguous during this one second. This has happened 27
times since 1972 (the TAI - UTC offset is now 37 seconds).
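
The collision can be sketched with the standard library. `calendar.timegm` performs plain field arithmetic (days * 86400 + hours * 3600 + minutes * 60 + seconds) and does not reject a seconds field of 60. Which neighbouring second ends up sharing the number depends on the convention in use; under this plain arithmetic, 23:59:60 coincides with the following 00:00:00. The variable names are illustrative only.

```python
import calendar

# UTC calendar fields: (year, month, day, hour, minute, second).
leap_second = (2016, 12, 31, 23, 59, 60)    # the inserted leap second
next_midnight = (2017, 1, 1, 0, 0, 0)       # one SI second later

# timegm does not range-check the seconds field, so 60 is accepted and
# simply carried over into the next minute by the arithmetic.
unix_leap = calendar.timegm(leap_second)
unix_midnight = calendar.timegm(next_midnight)

# Two distinct UTC seconds map to a single Unix time number.
print(unix_leap == unix_midnight)
```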

This comes with the nice property that minutes, hours and days always
have the same duration (in Unix time), thus converting from the Unix
time representation to a date and time and vice versa is fairly easy.

The drawbacks are, as seen above, an ambiguity on leap seconds and the
fact that the trivial computation of time intervals does not take
leap seconds into account and thus may be short by a few seconds (any time
interval across 2016-12-31 23:59:59 is off by at least one second if
computed by simply subtracting Unix times).
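
A small sketch of that shortfall with datetime64, assuming NumPy is installed. The naive difference across the end of 2016 is one second, although a leap second (23:59:60) was inserted between the two instants; the correction constant is supplied by hand here, since datetime64 itself carries no leap-second table.

```python
import numpy as np

before = np.datetime64("2016-12-31T23:59:59")
after = np.datetime64("2017-01-01T00:00:00")

# datetime64 arithmetic treats every day as exactly 86400 seconds long.
naive_seconds = int((after - before) / np.timedelta64(1, "s"))

# One leap second was inserted between the two instants; this constant is
# entered manually for the illustration.
LEAP_SECONDS_BETWEEN = 1
elapsed_si_seconds = naive_seconds + LEAP_SECONDS_BETWEEN

print(naive_seconds, elapsed_si_seconds)
```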

I don't think these two drawbacks are important for NumPy (or any other
general-purpose library). As things stand, it is not even possible, in
Python, with or without NumPy, to create a datetime or datetime64 object
from the time "2016-12-31 23:59:60" (neither accepts the existence of a
minute with 61 seconds), thus the ambiguity issue is not an issue in
practice. The time interval issue may matter for some applications, but
the ones affected are aware of the issue and have means to deal with it
(the most common one being taking a day off on the days leap seconds are
introduced).
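
A quick check of that claim, assuming NumPy is installed. The helper `try_build` is hypothetical and only wraps the constructor calls so that a rejection is reported instead of raised.

```python
from datetime import datetime

import numpy as np


def try_build(label, builder):
    """Call builder() and report whether the leap-second label was accepted."""
    try:
        builder()
    except ValueError as exc:
        return f"{label}: rejected ({exc})"
    return f"{label}: accepted"


results = [
    try_build("datetime", lambda: datetime(2016, 12, 31, 23, 59, 60)),
    try_build("datetime64", lambda: np.datetime64("2016-12-31T23:59:60")),
]

for line in results:
    print(line)
```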

I think documenting that datetime64 is a representation of fixed time
intervals since a conventional epoch, neglecting leap seconds, is easy
to explain and implement and allows for easy interoperability with the
rest of the world.

What advantage would making datetime64 explicitly a representation of
TAI bring?

One disadvantage would be that `np.datetime64(datetime.now())` would be
harder to support, as we would be trying to match a point in time on the UTC
time scale to a point in time on the TAI time scale. This is trivial
for past times (we just need to apply the right offset) but it is
impossible to do correctly for dates in the future because we cannot
predict future leap second insertions. This would, for example, make
timestamp conversions not reproducible across announcements of leap
second insertions.
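
To make the asymmetry concrete, here is a rough sketch of a UTC-to-TAI shift driven by an offset table. The table is deliberately partial (only the three most recent TAI - UTC steps), and the names `TAI_MINUS_UTC` and `utc_to_tai` are hypothetical; this is not how NumPy or any particular library does it.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Partial table: (UTC instant from which the offset applies, TAI - UTC in s).
# A complete table would go back to 1972.
TAI_MINUS_UTC = [
    (datetime(2012, 7, 1), 35),
    (datetime(2015, 7, 1), 36),
    (datetime(2017, 1, 1), 37),
]


def utc_to_tai(utc):
    """Shift a naive UTC datetime onto the TAI scale using the table above."""
    starts = [start for start, _ in TAI_MINUS_UTC]
    index = bisect_right(starts, utc) - 1
    if index < 0:
        raise ValueError("instant predates the partial table")
    return utc + timedelta(seconds=TAI_MINUS_UTC[index][1])


# Past instant: the offset is known and will not change.
print(utc_to_tai(datetime(2016, 6, 1)))

# Future instant: the last known offset is applied, so the answer changes
# as soon as another leap second is announced and the table is extended.
print(utc_to_tai(datetime(2030, 1, 1)))
```

The second call is the reproducibility problem in miniature: the same input yields a different result once the table gains a new row.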

Cheers,
Dan


[Numpy-discussion] API, NEP: Inclusion of the experimental `like=` argument in NumPy 1.20 (we currently lean to yes)

2020-11-12 Thread Sebastian Berg
Hi all,

TL;DR: Should NumPy add a `like=` argument to array creation functions? This
is an extension of the `__array_function__` protocol that is useful when
working with array-like objects other than NumPy arrays.
Including it effectively means we preliminarily accept NEP 35.

Note that without any feedback here, the current default is to include
it in the upcoming NumPy 1.20 release.


Long Version:


Users who only work with NumPy arrays and no alternative array objects
are not affected by this (but will see a "useless" keyword argument).
However, Dask and CuPy asked for the addition of a `like=` keyword
argument to array creation functions (listed below at [1]) in the
proposed NEP 35:


https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html

This is an extension of the `__array_function__` protocol.  I will
refer to the well-written NEP for details.
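
For readers who want a feel for the dispatch without reading the NEP, here is a minimal sketch. It assumes NumPy 1.20 or later; `DuckArray` is a hypothetical stand-in for a real array library such as Dask or CuPy, and its `__array_function__` is far cruder than what those libraries implement.

```python
import numpy as np


class DuckArray:
    """Hypothetical minimal array-like wrapper used only for this sketch."""

    def __init__(self, data):
        self.data = np.array(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook when a DuckArray is passed as `like=`.
        # Drop `like` defensively in case it is still present, run the
        # NumPy implementation, and wrap the result in a DuckArray.
        kwargs = {key: val for key, val in kwargs.items() if key != "like"}
        return DuckArray(func(*args, **kwargs))


reference = DuckArray([0])

# The creation functions dispatch on the type of the `like=` argument.
created = np.asarray([1, 2, 3], like=reference)
zeros = np.zeros(2, like=reference)

print(type(created).__name__, type(zeros).__name__)
```

The point of the keyword is that code written against the NumPy namespace can create arrays of the same kind as an existing object without importing the other library explicitly.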

It is important to note that there are alternative ideas under
consideration [2].  This means there is a chance that the `like=`
argument will be superseded by a different solution.

My very personal angle currently is this:

* Dask/CuPy have shown that this is useful to them
* It does not seem like a big burden to me (aside from the new
API/documentation which users might get confused by)
* We could deprecate it again, even if it may be a slow process.

In the absence of a strong argument against it, and with no clarity on when
alternatives might become available, I am fine with accepting it into
NumPy [3].
However, I can certainly be swayed if anyone has concerns.


There are also currently a few small outstanding discussions listed at:

https://github.com/numpy/numpy/issues/17075

which are probably too technical for the decision whether or not to
include it, but if you are interested in using the feature, let's
discuss these as well!

Cheers,

Sebastian


[1] The functions which receive the keyword argument are:

* np.array, np.asarray, np.ascontiguousarray, etc.
* np.arange
* np.ones, np.zeros, np.empty, np.full
* np.fromfunction
* np.identity
* np.fromfile
* ... and a few I forgot ...

[2] E.g. https://numpy.org/neps/nep-0037-array-module.html

[3] There is the "middle ground": We could require an environment
variable to activate it. But we discussed it briefly at the community
meeting as well, and I think the consensus was there is probably no
good argument for that. (e.g. it would mean the argument doesn't show
up in the online documentation.)




Re: [Numpy-discussion] API, NEP: Inclusion of the experimental `like=` argument in NumPy 1.20 (we currently lean to yes)

2020-11-12 Thread Stefan van der Walt
On Thu, Nov 12, 2020, at 17:48, Sebastian Berg wrote:
> [3] There is the "middle ground": We could require an environment
> variable to activate it. But we discussed it briefly at the community
> meeting as well, and I think the consensus was there is probably no
> good argument for that. (e.g. it would mean the argument doesn't show
> up in the online documentation.)

As long as we don't follow this option (3), I think the changes are innocuous.  
If it helps some libraries out there, I see no reason not to include it.

Stéfan