Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
Hey, Ondřej 2012/1/21 Ondřej Čertík ondrej.cer...@gmail.com: I read the Mandelbrot code using NumPy at this page: http://mentat.za.net/numpy/intro/intro.html I wrote this as a tutorial for beginners, so the emphasis is on simplicity. Do you have any suggestions on how to improve the code without obfuscating the tutorial? Stéfan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
On Tue, Jan 24, 2012 at 4:33 PM, Mark Wiebe mwwi...@gmail.com wrote: 2012/1/21 Ondřej Čertík ondrej.cer...@gmail.com snip Let me know if you figure out something. I think the mask thing is quite slow, but the problem is that it needs to be there, to catch overflows (and it is there in Fortran as well, see the where statement, which does the same thing). Maybe there is some other way to write the same thing in NumPy? In the current master, you can replace z[mask] *= z[mask] z[mask] += c[mask] with np.multiply(z, z, out=z, where=mask) np.add(z, c, out=z, where=mask) I am getting: Traceback (most recent call last): File b.py, line 19, in module np.multiply(z, z, out=z, where=mask) TypeError: 'where' is an invalid keyword to ufunc 'multiply' I assume it is a new feature in numpy? The performance of this alternate syntax is still not great, but it is significantly faster than what it replaces. For a particular choice of mask, I get In [40]: timeit z[mask] *= z[mask] 10 loops, best of 3: 29.1 ms per loop In [41]: timeit np.multiply(z, z, out=z, where=mask) 100 loops, best of 3: 4.2 ms per loop That looks like 7x faster to me. If it works for you, can you run the mandelbrot example with and without your patch? That way we'll know the actual speedup. Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
2012/1/21 Ondřej Čertík ondrej.cer...@gmail.com snip Let me know if you figure out something. I think the mask thing is quite slow, but the problem is that it needs to be there, to catch overflows (and it is there in Fortran as well, see the where statement, which does the same thing). Maybe there is some other way to write the same thing in NumPy? In the current master, you can replace z[mask] *= z[mask] z[mask] += c[mask] with np.multiply(z, z, out=z, where=mask) np.add(z, c, out=z, where=mask) The performance of this alternate syntax is still not great, but it is significantly faster than what it replaces. For a particular choice of mask, I get In [40]: timeit z[mask] *= z[mask] 10 loops, best of 3: 29.1 ms per loop In [41]: timeit np.multiply(z, z, out=z, where=mask) 100 loops, best of 3: 4.2 ms per loop -Mark Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: On 01/23/2012 05:35 AM, Jonathan Rocher wrote: Hi all, I was reading this while learning about Pytables in more details and the origin of its efficiency. This sounds like a problem where out of core computation using pytables would shine since the dataset doesn't fit into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course C/Cythonizing the problem would be another good way... Well, since the data certainly fits in RAM, one would use numexpr directly (which is what pytables also uses). Personally I feel this debate is asking the wrong question. It is not uncommon for NumPy code to be 16x slower than C or Fortran. But that is not really interesting. This is what I think matters: - Is the NumPy code FAST ENOUGH? If not, then go ahead and optimize. If it's fast enough, then just leave it. In this case, it seems Python takes ~13 seconds compared to ~1 second for Fortran. Sure, those extra 12 seconds could be annoying. But how much coding time should we spend to avoid them? 15 minutes? An hour? Two hours? Taking the time spent optimizing into account, then perhaps Python is 'faster' anyway? It is common to ask what is fastest for the computer. But we should really be asking what is fastest for our selves. For example: I have a computation that will take a day in Fortran or a month in Python (estimated). And I am going to run this code several times (20 or so, I think). In this case, yes, coding the bottlenecks in Fortran matters to me. But 13 seconds versus 1 second? I find that hardly interesting. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
On Mon, Jan 23, 2012 at 12:23 PM, Sturla Molden stu...@molden.no wrote: Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: On 01/23/2012 05:35 AM, Jonathan Rocher wrote: Hi all, I was reading this while learning about Pytables in more details and the origin of its efficiency. This sounds like a problem where out of core computation using pytables would shine since the dataset doesn't fit into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course C/Cythonizing the problem would be another good way... Well, since the data certainly fits in RAM, one would use numexpr directly (which is what pytables also uses). Personally I feel this debate is asking the wrong question. It is not uncommon for NumPy code to be 16x slower than C or Fortran. But that is not really interesting. This is what I think matters: - Is the NumPy code FAST ENOUGH? If not, then go ahead and optimize. If it's fast enough, then just leave it. In this case, it seems Python takes ~13 seconds compared to ~1 second for Fortran. Sure, those extra 12 seconds could be annoying. But how much coding time should we spend to avoid them? 15 minutes? An hour? Two hours? Taking the time spent optimizing into account, then perhaps Python is 'faster' anyway? It is common to ask what is fastest for the computer. But we should really be asking what is fastest for our selves. For example: I have a computation that will take a day in Fortran or a month in Python (estimated). And I am going to run this code several times (20 or so, I think). In this case, yes, coding the bottlenecks in Fortran matters to me. But 13 seconds versus 1 second? I find that hardly interesting. Sturla I would think that interactive zooming would be quite nice (illuminating) and for that 13 secs would not be tolerable Well... it's not at the top of my priority list ... ;-) -Sebastian Haase ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
On 01/23/2012 12:23 PM, Sturla Molden wrote: Den 23.01.2012 10:04, skrev Dag Sverre Seljebotn: On 01/23/2012 05:35 AM, Jonathan Rocher wrote: Hi all, I was reading this while learning about Pytables in more details and the origin of its efficiency. This sounds like a problem where out of core computation using pytables would shine since the dataset doesn't fit into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course C/Cythonizing the problem would be another good way... Well, since the data certainly fits in RAM, one would use numexpr directly (which is what pytables also uses). Personally I feel this debate is asking the wrong question. It is not uncommon for NumPy code to be 16x slower than C or Fortran. But that is not really interesting. This is what I think matters: - Is the NumPy code FAST ENOUGH? If not, then go ahead and optimize. If it's fast enough, then just leave it. In this case, it seems Python takes ~13 seconds compared to ~1 second for Fortran. Sure, those extra 12 seconds could be annoying. But how much coding time should we spend to avoid them? 15 minutes? An hour? Two hours? Taking the time spent optimizing into account, then perhaps Python is 'faster' anyway? It is common to ask what is fastest for the computer. But we should really be asking what is fastest for our selves. For example: I have a computation that will take a day in Fortran or a month in Python (estimated). And I am going to run this code several times (20 or so, I think). In this case, yes, coding the bottlenecks in Fortran matters to me. But 13 seconds versus 1 second? I find that hardly interesting. You, me, Ondrej, and many more are happy to learn 4 languages and use them where they are most appropriate. But most scientists only want to learn and use one tool. And most scientists have both problems where performance doesn't matter, and problems where it does. So as long as examples like this exists, many people will prefer Fortran for *all* their tasks. (Of course, that's why I got involved in Cython...) Dag Sverre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
Den 23.01.2012 13:09, skrev Sebastian Haase: I would think that interactive zooming would be quite nice (illuminating) and for that 13 secs would not be tolerable Well... it's not at the top of my priority list ... ;-) Sure, that comes under the 'fast enough' issue. But even Fortran might be too slow here? For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader (which would be a text string in Python): madelbrot_fragment_shader = uniform sampler1D tex; uniform vec2 center; uniform float scale; uniform int iter; void main() { vec2 z, c; c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x; c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; int i; z = c; for(i=0; iiter; i++) { float x = (z.x * z.x - z.y * z.y) + c.x; float y = (z.y * z.x + z.x * z.y) + c.y; if((x * x + y * y) 4.0) break; z.x = x; z.y = y; } gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); } The rest is just boiler-plate OpenGL... Sources: http://nuclear.mutantstargoat.com/articles/sdr_fract/ http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
On 01/23/12 13:51, Sturla Molden wrote: Den 23.01.2012 13:09, skrev Sebastian Haase: I would think that interactive zooming would be quite nice (illuminating) and for that 13 secs would not be tolerable Well... it's not at the top of my priority list ... ;-) Sure, that comes under the 'fast enough' issue. But even Fortran might be too slow here? For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader (which would be a text string in Python): madelbrot_fragment_shader = uniform sampler1D tex; uniform vec2 center; uniform float scale; uniform int iter; void main() { vec2 z, c; c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x; c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; int i; z = c; for(i=0; iiter; i++) { float x = (z.x * z.x - z.y * z.y) + c.x; float y = (z.y * z.x + z.x * z.y) + c.y; if((x * x + y * y) 4.0) break; z.x = x; z.y = y; } gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); } The rest is just boiler-plate OpenGL... Sources: http://nuclear.mutantstargoat.com/articles/sdr_fract/ http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos had interactive, extremely fast a fluid fractal zooming more than 10 (or 15?) years ago (- on a laughable hardware by today's standards). r. [1] http://wmi.math.u-szeged.hu/xaos/doku.php ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
I'd like to add http://git.tiker.net/pyopencl.git/blob/HEAD:/examples/demo_mandelbrot.py to the discussion, since I use pyopencl (http://mathema.tician.de/software/pyopencl) with great success in my daily scientific computing. Install with pip. PyOpenCL does understand numpy arrays. You write a kernel (small c-program) directly into a python triple quoted strings and get a pythonic way to program GPU and core i5 and i7 CPUs with python Exception if something goes wrong. Whenever I hit a speed bottleneck that I cannot solve with pure numpy, I code a little part of the computation for GPU. The compilation is done just in time when you run the python code. Especially for the mandelbrot this may be a _huge_ gain in speed since its embarrassingly parallel. Samuel On 23.01.2012, at 14:02, Robert Cimrman wrote: On 01/23/12 13:51, Sturla Molden wrote: Den 23.01.2012 13:09, skrev Sebastian Haase: I would think that interactive zooming would be quite nice (illuminating) and for that 13 secs would not be tolerable Well... it's not at the top of my priority list ... ;-) Sure, that comes under the 'fast enough' issue. But even Fortran might be too slow here? For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader (which would be a text string in Python): madelbrot_fragment_shader = uniform sampler1D tex; uniform vec2 center; uniform float scale; uniform int iter; void main() { vec2 z, c; c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x; c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y; int i; z = c; for(i=0; iiter; i++) { float x = (z.x * z.x - z.y * z.y) + c.x; float y = (z.y * z.x + z.x * z.y) + c.y; if((x * x + y * y) 4.0) break; z.x = x; z.y = y; } gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0); } The rest is just boiler-plate OpenGL... Sources: http://nuclear.mutantstargoat.com/articles/sdr_fract/ http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos had interactive, extremely fast a fluid fractal zooming more than 10 (or 15?) years ago (- on a laughable hardware by today's standards). r. [1] http://wmi.math.u-szeged.hu/xaos/doku.php ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
How does the algorithm and timing compare to this one: http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47fr=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f The author of original version is Dan Goodman # FAST FRACTALS WITH PYTHON AND NUMPY -Sebastian Haase 2012/1/22 Ondřej Čertík ondrej.cer...@gmail.com Hi, I read the Mandelbrot code using NumPy at this page: http://mentat.za.net/numpy/intro/intro.html but when I run it, it gives me integer overflows. As such, I have fixed the code, so that it doesn't overflow here: https://gist.github.com/1655320 and I have also written an equivalent Fortran program. You can compare both source codes to see that that it is pretty much one-to-one translation. The main idea in the above gist is to take an algorithm written in NumPy, and translate it directly to Fortran, without any special optimizations. So the above is my first try in Fortran. You can plot the result using this simple script (you can also just click on this gist to see the image there): https://gist.github.com/1655377 Here are my timings: Python Fortran Speedup Calculation 12.749 00.784 16.3x Saving 01.904 01.456 1.3x Total 14.653 02.240 6.5x I save the matrices to disk in an ascii format, so it's quite slow in both cases. The pure computation is however 16x faster in Fortran (in gfortran, I didn't even try Intel Fortran, that will probably be even faster). As such, I wonder how the NumPy version could be sped up? I have compiled NumPy with Lapack+Blas from source. Would anyone be willing to run the NumPy version? Just copy+paste should do it. If you want to run the Fortran version, the above gist uses some of my other modules that I use in my other programs, my goal was to see how much more complicated the Fortran code gets, compared to NumPy. As such, I put here https://gist.github.com/1655350 a single file with all the dependencies, just compile it like this: gfortran -fPIC -O3 -march=native -ffast-math -funroll-loops mandelbrot.f90 and run: $ ./a.out Iteration 1 Iteration 2 ... Iteration 100 Saving... Times: Calculation: 0.748045999 Saving: 1.36408502 Total: 2.11213102 Let me know if you figure out something. I think the mask thing is quite slow, but the problem is that it needs to be there, to catch overflows (and it is there in Fortran as well, see the where statement, which does the same thing). Maybe there is some other way to write the same thing in NumPy? Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
On 01/22/2012 04:55 AM, Ondřej Čertík wrote: Hi, I read the Mandelbrot code using NumPy at this page: http://mentat.za.net/numpy/intro/intro.html but when I run it, it gives me integer overflows. As such, I have fixed the code, so that it doesn't overflow here: https://gist.github.com/1655320 and I have also written an equivalent Fortran program. You can compare both source codes to see that that it is pretty much one-to-one translation. The main idea in the above gist is to take an algorithm written in NumPy, and translate it directly to Fortran, without any special optimizations. So the above is my first try in Fortran. You can plot the result using this simple script (you can also just click on this gist to see the image there): https://gist.github.com/1655377 Here are my timings: Python Fortran Speedup Calculation 12.749 00.784 16.3x Saving 01.904 01.456 1.3x Total 14.653 02.240 6.5x I save the matrices to disk in an ascii format, so it's quite slow in both cases. The pure computation is however 16x faster in Fortran (in gfortran, I didn't even try Intel Fortran, that will probably be even faster). As such, I wonder how the NumPy version could be sped up? I have compiled NumPy with Lapack+Blas from source. This is a pretty well known weakness with NumPy. In the Python code at least, each of c and z are about 15 MB, and the mask about 1 MB. So that doesn't fit in CPU cache, and so each and every statement you do in the loop transfer that data in and out of CPU cache the memory bus. There's no quick fix -- you can try to reduce the working set so that it fits in CPU cache, but then the Python overhead often comes into play. Solutions include numexpr and Theano -- and as often as not, Cython and Fortran. It's a good example, thanks!, Dag Sverre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase seb.ha...@gmail.com wrote: How does the algorithm and timing compare to this one: http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47fr=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f The author of original version is Dan Goodman # FAST FRACTALS WITH PYTHON AND NUMPY Thanks Sebastian. This one is much faster 2.7s on my laptop with the same dimensions/iterations. It uses a better datastructures -- it only keeps track of points that still need to be iterated --- very clever. If I have time, I'll try to provide an equivalent Fortran version too, for comparison. Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
2012/1/22 Ondřej Čertík ondrej.cer...@gmail.com: If I have time, I'll try to provide an equivalent Fortran version too, for comparison. Ondrej here is a Cython example: http://wiki.cython.org/examples/mandelbrot I haven't looked to see if it's the same algorithm, but it may be instructive, none the less. -Chris -- -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
Hi all, I was reading this while learning about Pytables in more details and the origin of its efficiency. This sounds like a problem where out of core computation using pytables would shine since the dataset doesn't fit into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course C/Cythonizing the problem would be another good way... HTH, Jonathan 2012/1/22 Ondřej Čertík ondrej.cer...@gmail.com On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase seb.ha...@gmail.com wrote: How does the algorithm and timing compare to this one: http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47fr=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f The author of original version is Dan Goodman # FAST FRACTALS WITH PYTHON AND NUMPY Thanks Sebastian. This one is much faster 2.7s on my laptop with the same dimensions/iterations. It uses a better datastructures -- it only keeps track of points that still need to be iterated --- very clever. If I have time, I'll try to provide an equivalent Fortran version too, for comparison. Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jroc...@enthought.com 1-512-536-1057 http://www.enthought.com ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
Hi, I read the Mandelbrot code using NumPy at this page: http://mentat.za.net/numpy/intro/intro.html but when I run it, it gives me integer overflows. As such, I have fixed the code, so that it doesn't overflow here: https://gist.github.com/1655320 and I have also written an equivalent Fortran program. You can compare both source codes to see that that it is pretty much one-to-one translation. The main idea in the above gist is to take an algorithm written in NumPy, and translate it directly to Fortran, without any special optimizations. So the above is my first try in Fortran. You can plot the result using this simple script (you can also just click on this gist to see the image there): https://gist.github.com/1655377 Here are my timings: Python Fortran Speedup Calculation 12.749 00.784 16.3x Saving 01.904 01.456 1.3x Total 14.653 02.240 6.5x I save the matrices to disk in an ascii format, so it's quite slow in both cases. The pure computation is however 16x faster in Fortran (in gfortran, I didn't even try Intel Fortran, that will probably be even faster). As such, I wonder how the NumPy version could be sped up? I have compiled NumPy with Lapack+Blas from source. Would anyone be willing to run the NumPy version? Just copy+paste should do it. If you want to run the Fortran version, the above gist uses some of my other modules that I use in my other programs, my goal was to see how much more complicated the Fortran code gets, compared to NumPy. As such, I put here https://gist.github.com/1655350 a single file with all the dependencies, just compile it like this: gfortran -fPIC -O3 -march=native -ffast-math -funroll-loops mandelbrot.f90 and run: $ ./a.out Iteration 1 Iteration 2 ... Iteration 100 Saving... Times: Calculation: 0.748045999 Saving: 1.36408502 Total: 2.11213102 Let me know if you figure out something. I think the mask thing is quite slow, but the problem is that it needs to be there, to catch overflows (and it is there in Fortran as well, see the where statement, which does the same thing). Maybe there is some other way to write the same thing in NumPy? Ondrej ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion