This always seems like such a ridiculous argument. If CO2 emissions are directly proportional to the time it takes for a program to run, then there's no real need to concern ourselves with them separately: people already have a direct reason to avoid slow programs, namely, that they are slow. If I have two codes that compute the same thing and one takes a week while the other takes a few minutes, I will obviously choose the one that takes a few minutes, and my decision will have nothing to do with ecological impact. The real problem with CO2 emissions is the class of cases where that agency is removed, where the people damaging the environment suffer no ill effects from doing so.
It would be more intellectually honest to try to determine why people choose Python, an apparently very slow language, for high-performance computing. If one spends even a moment thinking about this, and actually looks at what the real scientific Python community does, one realizes that simply having a fast compiled core beneath Python is enough for the majority of the performance. NumPy array expressions are fast because the core loops are fast, and those loops dominate the runtime for the majority of uses. And for the instances where that isn't fast enough, e.g., when writing a looping algorithm directly, there are multiple tools for writing fast Python or Python-like code, such as Numba, Cython, Pythran, PyPy, and so on, as the sketch below illustrates.
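To make that concrete, here is a minimal sketch: the same toy computation written as a pure-Python loop, as a NumPy array expression, and as a Numba-compiled loop. The function names and the polynomial are arbitrary placeholders of mine, not taken from any benchmark:

    import numpy as np
    from numba import njit  # pip install numba

    def poly_loop(x):
        # Pure-Python loop: every iteration and element access pays
        # interpreter overhead.
        out = [0.0] * len(x)
        for i in range(len(x)):
            out[i] = 3.0 * x[i]**2 + 2.0 * x[i] + 1.0
        return out

    def poly_numpy(x):
        # The same arithmetic as one array expression: each operation
        # runs as a compiled C loop over the whole array.
        return 3.0 * x**2 + 2.0 * x + 1.0

    @njit
    def poly_numba(x):
        # The explicit loop again, compiled to machine code by Numba
        # on first call.
        out = np.empty_like(x)
        for i in range(x.shape[0]):
            out[i] = 3.0 * x[i]**2 + 2.0 * x[i] + 1.0
        return out

    x = np.linspace(0.0, 1.0, 1_000_000)
    # poly_numpy(x) and poly_numba(x) compute the same values as
    # poly_loop(x); the compiled versions are typically orders of
    # magnitude faster, though exact timings depend on the machine.

The point is not the specific numbers but where the per-element loop executes: move it into compiled code and you keep writing Python without paying the interpreter for every element.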
Aaron Meurer

On Tue, Nov 24, 2020 at 8:57 AM PIERRE AUGIER
<pierre.aug...@univ-grenoble-alpes.fr> wrote:
>
> Hi,
>
> I recently took a bit of time to study the comment "The ecological
> impact of high-performance computing in astrophysics" published in
> Nature Astronomy (Zwart, 2020,
> https://www.nature.com/articles/s41550-020-1208-y,
> https://arxiv.org/pdf/2009.11295.pdf), where it is stated that "Best
> however, for the environment is to abandon Python for a more
> environmentally friendly (compiled) programming language."
>
> I wrote a simple Python-Numpy implementation of the problem used for
> this study (https://www.nbabel.org) and, accelerated by
> Transonic-Pythran, it's very efficient. Here are some numbers
> (elapsed times in s, smaller is better):
>
> | # particles | Py  | C++ | Fortran | Julia |
> |-------------|-----|-----|---------|-------|
> | 1024        |  29 |  55 |      41 |    45 |
> | 2048        | 123 | 231 |     166 |   173 |
>
> The code and a modified figure are here: https://github.com/paugier/nbabel
> (There is no check on the results for https://www.nbabel.org, so one
> still has to be very careful.)
>
> I think that the Numpy community should spend a bit of energy to show
> what can be done with the existing tools to get very high performance
> (and low CO2 production) with Python. This work could be the basis of
> a serious reply to the comment by Zwart (2020).
>
> Unfortunately, the Python solution in https://www.nbabel.org is very
> bad in terms of performance (and therefore CO2 production). The same
> is true for most of the Python solutions to the Computer Language
> Benchmarks Game in
> https://benchmarksgame-team.pages.debian.net/benchmarksgame/ (codes
> here: https://salsa.debian.org/benchmarksgame-team/benchmarksgame#what-else).
>
> We could try to fix this so that people see that in many cases it is
> not necessary to "abandon Python for a more environmentally friendly
> (compiled) programming language". One of the longest and hardest
> tasks would be to implement the different cases of the Computer
> Language Benchmarks Game in standard and modern Python-Numpy. Then,
> optimizing and accelerating such code should be doable, and we should
> be able to get very good performance at least for some cases. Good
> news for this project: (i) the first point can be done by anyone with
> good knowledge of Python-Numpy (many potential workers), (ii) for
> some cases there are already good Python implementations, and (iii)
> the work can easily be parallelized.
>
> It is not a criticism, but the (beautiful and very nice) new Numpy
> website https://numpy.org/ is not very convincing in terms of
> performance. It's written: "Performant. The core of NumPy is
> well-optimized C code. Enjoy the flexibility of Python with the speed
> of compiled code." It's true that the core of Numpy is well-optimized
> C code, but to seriously compete with C++, Fortran or Julia in terms
> of numerical performance, one needs other tools to move the
> compiled-interpreted boundary outside the hot loops. So it could be
> reasonable to mention such tools (in particular Numba, Pythran,
> Cython and Transonic).
>
> Is there already something planned to reply to Zwart (2020)?
>
> Any opinions or suggestions on this potential project?
>
> Pierre
>
> PS: Of course, alternative Python interpreters (PyPy, GraalPython,
> Pyjion, Pyston, etc.) could also be used, especially if HPy
> (https://github.com/hpyproject/hpy) is successful (C core of Numpy
> written in HPy, Cython able to produce HPy code, etc.). However, I
> tend to be a bit skeptical about the ability of such technologies to
> reach very high performance for low-level Numpy code (performance
> that can be reached by replacing whole Python functions with
> optimized compiled code). Of course, I hope I'm wrong! IMHO, it does
> not remove the need for a successful HPy!
>
> --
> Pierre Augier - CR CNRS   http://www.legi.grenoble-inp.fr
> LEGI (UMR 5519) Laboratoire des Ecoulements Geophysiques et Industriels
> BP53, 38041 Grenoble Cedex, France   tel:+33.4.56.52.86.16
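For anyone who hasn't seen the pattern Pierre mentions: below is a minimal illustrative sketch of the Transonic-Pythran approach, assuming transonic and pythran are installed. It is not the nbabel code (that is at https://github.com/paugier/nbabel); the kernel is just a placeholder O(N^2) direct-summation step written for illustration:

    import numpy as np
    from transonic import boost  # pip install transonic pythran

    @boost
    def accelerations(masses: "float[:]", positions: "float[:,:]"):
        # Gravitational accelerations by direct summation, written as
        # plain Python loops.  Transonic swaps this function for a
        # Pythran-compiled extension when one has been built, and
        # falls back to the pure-Python version otherwise.
        n = masses.shape[0]
        accs = np.zeros_like(positions)
        for i in range(n):
            for j in range(n):
                if i != j:
                    d = positions[j] - positions[i]
                    accs[i] += masses[j] * d / np.sum(d**2) ** 1.5
        return accs

The hot loops stay readable Python, but the compiled-interpreted boundary moves outside them, which is exactly the point Pierre makes above about the NumPy website's "Performant" claim. His actual implementation and figures are in the repository linked above.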