Re: [Numpy-discussion] Add sliding_window_view method to numpy
On Fri, Nov 6, 2020 at 4:03 PM Zimmermann Klaus wrote:

> Hi,
>
> On 06/11/2020 15:58, Ralf Gommers wrote:
> > On Fri, Nov 6, 2020 at 9:51 AM Zimmermann Klaus
> > <klaus.zimmerm...@smhi.se> wrote:
> >
> > I have absolutely no problem keeping this out of the main namespace.
> >
> > In fact I'd like to point out that it was not my idea. Rather, it was
> > proposed by Bas van Beek in the comments [1,2] and received a little
> > more scrutiny from Eric Wieser in [3].
> >
> > Thanks, between two PRs with that many comments, I couldn't figure that
> > out - just saw the commit that made the change.
>
> Understandable, no worries.
>
> > On the subject matter, I am also curious about the potential for
> > confusion. What other behavior could one expect from a sliding window
> > view with this shape?
> >
> > As I said, I am completely fine with keeping this out of the main
> > namespace, but I agree with Sebastian's comment that
> > `np.lib.stride_tricks` is perhaps not the best namespace.
> >
> > I agree that that's not a great namespace. There are multiple issues with
> > namespaces; we basically have three good ones (fft, linalg, random) and
> > a bunch of other ones that range from questionable to terrible. See
> > https://github.com/numpy/numpy/blob/master/numpy/tests/test_public_api.py#L127
> > for details.
> >
> > This would be a good thing to work on - making the `numpy.lib` namespace
> > not bleed into `numpy` via `import *` is one thing to do there, and
> > there are many others. But given backwards compat constraints it's not easy.
>
> I understand cleaning up all the namespaces is a giant task, so far, far
> out of scope here. As said before, I also completely agree to keep it
> out of the main namespace (though I will still argue below :P).
>
> I was just wondering if, off the top of your head, an existing, better fit
> comes to mind?

Not really.
Outside of stride_tricks there's nothing that quite fits. This function is more in scope for something like scipy.signal.

Cheers,
Ralf

> > The reason from my point of view is that stride tricks is really a technical
> > (and slightly ominous) name that might throw off more application-oriented
> > programmers from finding and using this function. Thinking of my
> > scientist colleagues, I think those are exactly the kind of users that
> > could benefit from such a prototyping tool.
> >
> > That phrasing is one of a number of concerns. NumPy is normally not in
> > the business of providing things that are okay as a prototyping tool,
> > but are potentially extremely slow (as pointed out in the Notes section
> > of the docstring). A function like that would basically not be the right
> > tool for almost anything in, e.g., SciPy - it requires an iterative
> > algorithm. In NumPy we don't prefer performance at all costs, but in
> > general it's pretty decent rather than "Numba or Cython may gain you
> > 100x here".
>
> I still think that the performance concern is a bit overblown. Yes,
> applications with large windows can need more FLOPs by an equally large
> factor. But most such applications will use small to moderate windows.
> Furthermore, this view focuses only on FLOPs. In my current field of
> climate science (and many others), that is almost never the limiting
> factor. Memory demands are far more problematic and, incidentally, those
> are more likely to increase in other methods that require the storage of
> ancillary, temporary data.
>
> > Other issues include:
> >
> > 2) It is very specific to NumPy's memory model (as pointed out by you
> > and Sebastian) - just like the rest of stride_tricks
>
> Not wrong, but on the other hand, that memory model is not exotic. C,
> Fortran, and any number of other languages play very nicely with it,
> as do important downstream libraries like dask.
> > 3) It has "view" in the name, which doesn't quite make sense for the
> > main namespace (also connected to point 2 above).
>
> Ok.
>
> > 4) The cost of putting something in the main namespace for other
> > array/tensor libraries is large. Many other libraries, e.g. CuPy, Dask,
> > TensorFlow, PyTorch, JAX, MXNet, aim to reimplement part or all of the
> > main NumPy namespace as well as possible. This would trigger discussions
> > and likely many person-weeks of work for others.
>
> Agreed. Though I have to say that my whole motivation comes from
> corresponding issues in dask that were specifically waiting for (the
> older version of) this PR (see [1, 2,...]). But I understand that dask
> is effectively much closer to the numpy memory model than, say, CuPy, so
> don't take this to mean it should be in the main namespace.
>
> > 5) It's a useful function, but it's very much on the margins of NumPy's
> > scope. It could easily have gone into, for example, scipy.signal. At
> > this point the bar for fu
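For readers of the archive, a minimal sketch of the function under discussion. `sliding_window_view` lives in `np.lib.stride_tricks` (available since NumPy 1.20); the example values here are illustrative, not from the thread:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(10.0)

# Zero-copy view of shape (7, 4): each row is one window into x's memory.
windows = sliding_window_view(x, window_shape=4)

# A moving average falls out of a reduction over the window axis.
# This illustrates the performance caveat from the thread: the reduction
# touches each element of x `window_shape` times, so FLOPs (though not
# memory) scale with the window size.
moving_avg = windows.mean(axis=-1)
```

The view itself allocates nothing (`np.shares_memory(windows, x)` is true); the cost only appears once you compute over the expanded shape.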
[Numpy-discussion] Showing by examples how Python-Numpy can be efficient even for computationally intensive tasks
I changed the email subject because I'd like to focus less on CO2 (a very interesting subject, but not my focus here) and more on computing...

----- Original message -----
> From: "Andy Ray Terrel"
> To: "numpy-discussion"
> Sent: Tuesday, 24 November 2020 18:27:52
> Subject: Re: [Numpy-discussion] Comment published in Nature Astronomy about The ecological impact of computing with Python
>
> I think we, the community, do have to take it seriously. NumPy and the rest of
> the ecosystem is trying to raise money to hire developers. This sentiment,
> which is much wider than a single paper, is a prevalent roadblock.
>
> -- Andy

I agree. I don't know if it is a matter of scientific field, but I tend to hear more and more people explaining that they don't use Python because of performance. Or saying that they don't have performance problems because they don't use Python.

Some communities (I won't give names 🙂) communicate a lot on the bad performance of Python-Numpy. I am well aware that performance is in many cases not so important, but it is not a good thing to have such a bad reputation. I think we have to show what is doable with Python-Numpy code to get very good performance.

----- Original message -----
> From: "Sebastian Berg"
> Sent: Tuesday, 24 November 2020 18:25:02
> Subject: Re: [Numpy-discussion] Comment published in Nature Astronomy about The ecological impact of computing with Python
>
> > Is there already something planned to answer to Zwart (2020)?
>
> I don't think there is any need for a rebuttal. The author is right,
> you should not write the core of an N-Body simulation in Python
> :). I completely disagree with the focus on programming
> languages/tooling, quite honestly.

I'm not a fan of this focus either. But we have to realize that many people think like that and are sensitive to such arguments. Doing so badly in all the benchmark games does not help the scientific Python community (especially in the long term).
> A PhD who writes performance-critical code must get the education
> necessary to do it well. That may mean learning something beyond
> Python, but not replacing Python entirely.

I'm really not sure. Or at least it depends on the type of performance-critical code. I see many students and scientists who sometimes need to write a few functions that are not terribly inefficient. For many people, I don't see why they would need to learn and use another language.

I did my PhD (in turbulence) with Fortran (and Matlab) and I have really nothing against Fortran. However, I'm really happy that in my group we code nearly everything in Python (+ a bit of C++ for fun). For example, Fluidsim (https://foss.heptapod.net/fluiddyn/fluidsim) is ~100% Python and I know that it is very efficient (more efficient than many alternatives written with a lot of C++/Fortran). I realize that it wouldn't be possible for all kinds of code (and fluidsim uses fluidfft, written in C++ / Cython / Python), but being 100% Python has a lot of advantages (I won't list them here).

For an N-Body simulation, why not use Python? Using Python, you get a very readable, clear and efficient implementation (see https://github.com/paugier/nbabel), even faster than what you can get with straightforward C++/Fortran/Julia. IMHO, it is just what one needs for most PhDs in astronomy. Of course, for many things, one needs native languages! Have a look at the C++ code produced by Pythran, it's beautiful 🙂! But I don't think every scientist who writes critical code has to become an expert in C++ or Fortran (or Julia).

I also sometimes have to read and use C++ and Fortran codes written by scientists. Sometimes (often), I tend to think that they would be more productive with other tools to reach the same performance.
Perhaps it is only a matter of education and not of tooling, but using serious tools does not make you a serious developer, and reaching the level in C++/Fortran needed to write efficient, clean, readable and maintainable code is not so easy for a PhD student or scientist who has other things to do. Python-Numpy is so slow for some algorithms that many Python-Numpy users would benefit from knowing how to accelerate it.

Just an example, with some elapsed times (in s) for the N-Body problem (see https://github.com/paugier/nbabel#smaller-benchmarks-between-different-python-solutions):

| Transonic-Pythran | Transonic-Numba | High-level Numpy | PyPy OOP | PyPy lists |
|-------------------|-----------------|------------------|----------|------------|
| 0.48              | 3.91            | 686              | 87       | 15         |

For comparison, we have for this case `{"c++": 0.85, "Fortran": 0.62, "Julia": 2.57}`.

Note that just by adding `from transonic import jit` to the simple high-level Numpy code and then decorating the function `compute_accelerations` with `@jit`, the elapsed time decreases to 8 s (an 85x speedup!, with Pythran 0.9.8).

I conclude from these types of results that we need to tell Python users how to accelerate their Python-Numpy codes when they feel the need of it. I think acceleration tools should be mentioned on the Numpy website. I also think we should spend a bit of energy to play some benchmark games.
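To make the benchmark concrete, here is a minimal sketch of what such an acceleration kernel can look like. This is an illustrative O(N²) pairwise-gravity computation, not the exact code from the nbabel repository; to reproduce the speedup described above one would add `from transonic import jit` and decorate the function with `@jit`, which is omitted here so the sketch runs without transonic installed:

```python
import numpy as np

def compute_accelerations(masses, positions):
    """O(N^2) pairwise gravitational accelerations (units with G = 1).

    Illustrative sketch only -- decorate with transonic's @jit (Pythran
    backend) to get the kind of speedup quoted in the benchmark above.
    """
    n = positions.shape[0]
    accelerations = np.zeros_like(positions)
    for i in range(n):
        diff = positions - positions[i]        # vectors from body i to all bodies
        d3 = np.sum(diff**2, axis=1) ** 1.5    # |r_ij|^3 for each pair
        d3[i] = 1.0                            # placeholder to avoid 0/0
        coef = masses / d3
        coef[i] = 0.0                          # no self-interaction
        accelerations[i] = coef @ diff         # sum_j m_j * r_ij / |r_ij|^3
    return accelerations

# Two unit masses one unit apart attract each other with unit acceleration.
acc = compute_accelerations(np.ones(2),
                            np.array([[0., 0., 0.], [1., 0., 0.]]))
```

High-level NumPy like this is exactly the style that Pythran/Numba can compile well, because the loop body is already expressed in array operations.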
Re: [Numpy-discussion] Showing by examples how Python-Numpy can be efficient even for computationally intensive tasks
On Thu, Nov 26, 2020 at 9:15 PM PIERRE AUGIER <pierre.aug...@univ-grenoble-alpes.fr> wrote:

> I conclude from these types of results that we need to tell Python users
> how to accelerate their Python-Numpy codes when they feel the need of it. I
> think acceleration tools should be mentioned in Numpy website. I also think
> we should spend a bit of energy to play some benchmark games.

Good point, added an issue for it on the website repo:
https://github.com/numpy/numpy.org/issues/370

Cheers,
Ralf

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Added Rivest-Floyd selection algorithm as an option to numpy.partition
On Tue, Nov 24, 2020 at 12:56 PM Виктория Малясова <malyasova.viktor...@yandex.ru> wrote:

> Hello everyone!
>
> I've implemented the Rivest-Floyd selection algorithm as a second option
> for the partition method. I found it works about 1.5 times faster on average
> for big array sizes; here are average times for finding a median:
>
> | array length | introselect | rivest_floyd |
> |--------------|-------------|--------------|
> | 10           | 4.6e-05     | 4.4e-05     |
> | 100          | 5.5e-05     | 4.7e-05     |
> | 1000         | 6.9e-05     | 6.5e-05     |
> | 10000        | 3.1e-04     | 2.3e-04     |
> | 100000       | 2.9e-03     | 2.0e-03     |
> | 1000000      | 2.9e-02     | 2.0e-02     |
>
> I've created a pull request https://github.com/numpy/numpy/pull/17813 and
> implemented the reviewers' suggestions and fixes. Do you think this feature
> should be added? I am new to open source, sorry if I am doing anything wrong.

Hi Viktoriya, welcome! It looks like you're doing everything right, and the reviews so far are positive.

Cheers,
Ralf
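For readers unfamiliar with the function being extended: `np.partition` performs an O(n) selection rather than a full O(n log n) sort (currently via `kind='introselect'`; the PR proposes `rivest_floyd` as an alternative). The median benchmark above boils down to something like this sketch (`median_via_partition` is a name made up for illustration):

```python
import numpy as np

def median_via_partition(a):
    # np.partition places the k-th smallest element at index k and
    # everything smaller before it -- selection, not a full sort.
    a = np.asarray(a)
    n = a.size
    mid = n // 2
    part = np.partition(a, mid)
    if n % 2:
        return part[mid]
    # Even length: the other middle value is the max of the lower part.
    return 0.5 * (part[mid] + part[:mid].max())
```

This is essentially what `np.median` does internally, which is why a faster selection `kind` translates directly into faster medians.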
Re: [Numpy-discussion] Showing by examples how Python-Numpy can be efficient even for computationally intensive tasks
On Thu, 26 Nov 2020 22:14:40 +0100 (CET) PIERRE AUGIER wrote:

> I changed the email subject because I'd like to focus less on CO2 (a very
> interesting subject, but not my focus here) and more on computing...

Hi Pierre,

We may turn the problem around: one should focus more on the algorithm than on the programming language. I would like to share with you one example, where we published how to speed up a crystallographic computation written in Python:
https://onlinelibrary.wiley.com/iucr/doi/10.1107/S1600576719008471

One referee asked us to validate against equivalent C and Fortran code. The C code was as fast as Pythran or Cython, and Fortran was still faster (the std of the Fortran-compiled runtime was much smaller, which allows Fortran to be faster by 3 std!). But I consider the difference to be marginal at this level!

If one considers "Moore's law", i.e. the time needed for "performance" to double in different aspects of computing, one gets 18 to 24 months for the number of transistors in a processor, 18 years for compilers, and 2 years (on average) for the development of new algorithms. In this sense one should focus more on the algorithm used.

Table 1 of the article is especially interesting: pure Python is 10x slower than proper Numpy code, and parallel Pythran is 50x faster than Numpy (on the given computer), but using the proper algorithm, i.e. the FFT in this case, is 13000x faster!

So I believe that Python, with its expressivity, helps much in understanding the algorithm and hence in designing faster code.

Cheers,
Jerome
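Jerome's point, that the algorithmic gain dwarfs the language gain, can be illustrated with a toy example (not the crystallographic computation from the paper): computing an autocorrelation by direct O(n²) summation versus the O(n log n) FFT route via the Wiener-Khinchin theorem.

```python
import numpy as np

def autocorr_direct(x):
    # O(n^2): one dot product per lag k, sum_i x[i] * x[i+k]
    n = x.size
    return np.array([x[:n - k] @ x[k:] for k in range(n)])

def autocorr_fft(x):
    # O(n log n): zero-pad to 2n so circular correlation equals
    # linear correlation, then use |FFT|^2 (Wiener-Khinchin).
    n = x.size
    f = np.fft.rfft(x, 2 * n)
    return np.fft.irfft(f * f.conj())[:n]

x = np.random.default_rng(0).standard_normal(512)
# Same numbers, asymptotically very different cost.
```

Both functions return identical results up to floating-point error; only the operation count changes, and for large n that change is worth far more than switching languages.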