Hi Pierre,

Thanks for pinging me. Put simply, that PR adds a new `like` kwarg that dispatches to downstream libraries via `__array_function__` when specified, and otherwise falls back to NumPy's default behavior. That introduces an extra check on the C side, but it should have minimal impact on use cases that don't use the `like` kwarg.
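For illustration only, here is a minimal sketch of that dispatch path. It is not code from the PR: `Marker` is a hypothetical class made up for this example, and the sketch assumes NumPy 1.20 or later, where array-creation functions such as `np.asarray` accept `like=`.

```python
import numpy as np


class Marker:
    """Hypothetical stand-in for a downstream array type (not a real library)."""

    def __array_function__(self, func, types, args, kwargs):
        # NumPy passes the public function and the caller's arguments here.
        # A real library would build and return its own array type; this
        # sketch only returns a sentinel to show that the call was routed.
        return "dispatched"


data = [1, 2, 3]

# Without `like`, NumPy's default implementation runs and returns an ndarray.
plain = np.asarray(data)

# With `like`, the call is handed to Marker.__array_function__ instead.
routed = np.asarray(data, like=Marker())

print(type(plain).__name__, routed)
```

The first call never touches `Marker`, which is the "default behavior" path mentioned above; only the second call goes through the protocol.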
Is there a simple reproducer with NumPy only? I assume your case with Pandas is much more complex (unfortunately I'm not very experienced with DataFrames), but curiously I see NumPy 1.20.1 being considerably faster for small arrays and mildly faster for large arrays (results in https://gist.github.com/pentschev/add38b5aee61da87b4b70a1c4649861f).

Best,
Peter

On Mon, Mar 15, 2021 at 12:29 PM PIERRE AUGIER <pierre.aug...@univ-grenoble-alpes.fr> wrote:
>
> ----- Original Message -----
> > From: "Juan Nunez-Iglesias" <j...@fastmail.com>
> > To: "numpy-discussion" <numpy-discussion@python.org>
> > Sent: Sunday, 14 March 2021 07:15:39
> > Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
>
> > Hi Pierre,
> >
> > If you’re able to compile NumPy locally and you have reliable benchmarks, you can write a script that tests the runtime of your benchmark and reports it as a test pass/fail. You can then use “git bisect run” to automatically find the commit that caused the issue. That will help narrow down the discussion before it gets completely derailed a second time.
> >
> > https://lwn.net/Articles/317154/
> >
> > Juan.
>
> Thanks a lot for this advice Juan! I wasn't able to use Git but with `hg bisect` I managed to find that the first "bad" commit is
>
> https://github.com/numpy/numpy/commit/4cd6e4b336fbc68d88c0e9bc45a435ce7b721f1f
> ENH: implement NEP-35's `like=` argument (gh-16935)
>
> From the point of view of my benchmark, this commit changes the behavior of arr.copy() (the resulting arrays do not give the same performance). This makes sense because it is indeed about array creation.
>
> I haven't yet studied this commit in detail (it is quite big and not simple), and I'm not sure I'll be able to understand it, in particular why it leads to such a performance regression!
>
> Cheers,
>
> Pierre
>
> > On 13 Mar 2021, at 10:34 am, PIERRE AUGIER <pierre.aug...@univ-grenoble-alpes.fr> wrote:
> >
> > Hi,
> >
> > I tried to compile Numpy with `pip install numpy==1.20.1 --no-binary numpy --force-reinstall` and I can reproduce the regression.
> >
> > Good news, I was able to reproduce the difference with only Numpy 1.20.1.
> >
> > Arrays prepared with (`df` is a Pandas dataframe)
> >
> > arr = df.values.copy()
> >
> > or
> >
> > arr = np.ascontiguousarray(df.values)
> >
> > lead to "slow" execution while arrays prepared with
> >
> > arr = np.copy(df.values)
> >
> > lead to faster execution.
> >
> > arr.copy() and np.copy(arr) do not give the same result, with arr obtained from a Pandas dataframe with arr = df.values. It's strange because type(df.values) gives <class 'numpy.ndarray'> so I would expect arr.copy() and np.copy(arr) to give exactly the same result.
> >
> > Note that I think I'm doing quite serious and reproducible benchmarks. I also checked that this regression is reproducible on another computer.
> >
> > Cheers,
> >
> > Pierre
> >
> > ----- Original Message -----
> > > From: "Sebastian Berg" <sebast...@sipsolutions.net>
> > > To: "numpy-discussion" <numpy-discussion@python.org>
> > > Sent: Friday, 12 March 2021 22:50:24
> > > Subject: Re: [Numpy-discussion] Looking for a difference between Numpy 0.19.5 and 0.20 explaining a perf regression with Pythran
> >
> > > On Fri, 2021-03-12 at 21:36 +0100, PIERRE AUGIER wrote:
> > > > Hi,
> > > >
> > > > I'm looking for a difference between Numpy 1.19.5 and 1.20 which could explain a performance regression (~15 %) with Pythran.
> > > >
> > > > I observe this regression with the script https://github.com/paugier/nbabel/blob/master/py/bench.py
> > > >
> > > > Pythran reimplements Numpy so it is not about Numpy code for computation. However, Pythran of course uses the native array contained in a Numpy array. I'm quite sure that something has changed between Numpy 1.19.5 and 1.20 (or between the corresponding wheels?) since I don't get the same performance with Numpy 1.20. I checked that the values in the arrays are the same and that the flags characterizing the arrays are also the same.
> > > >
> > > > Good news, I'm now able to obtain the performance difference just with Numpy 1.19.5. In this code, I load the data with Pandas and need to prepare contiguous Numpy arrays to give them to Pythran. With Numpy 1.19.5, if I use np.copy I get better performance than with np.ascontiguousarray. With Numpy 1.20, both functions create arrays giving the same performance with Pythran (again, worse than with Numpy 1.19.5).
> > > >
> > > > Note that this code is very efficient (more than 100 times faster than using Numpy), so I guess that things like alignment or memory location can lead to such a difference.
> > > >
> > > > More details in this issue https://github.com/serge-sans-paille/pythran/issues/1735
> > > >
> > > > Any help to understand what has changed would be greatly appreciated!
> > >
> > > If you want to really dig into this, it would be good to do profiling to find out where the differences are.
> > >
> > > Without that, I don't have much appetite to investigate personally. The reason is that fluctuations of ~30% (or even much more) when running the NumPy benchmarks are very common.
> > >
> > > I am not aware of an immediate change in NumPy, especially since you are talking about Pythran, and only the memory space or the interface code should matter.
> > > As to the interface code... I would expect it to be quite a bit faster, not slower.
> > > There was no change around data allocation, so at best what you are seeing is a different pattern in how the "small array cache" ends up being used.
> > >
> > > Unfortunately, getting stable benchmarks that reflect code changes exactly is tough... Here is a nice blog post from Victor Stinner where he had to go as far as using "profile guided compilation" to avoid fluctuations:
> > >
> > > https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html
> > >
> > > I somewhat hope that this is also the reason for the huge fluctuations we see in the NumPy benchmarks due to absolutely unrelated code changes.
> > > But I did not have the energy to try it (and a probably fixed bug in gcc makes it a bit harder right now).
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > > > Cheers,
> > > >
> > > > Pierre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion