Re: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-11-16 Thread Jérémie Roquet
2011/11/16 Ian Ozsvald :
> From memory the 'native' flag made a difference (I think it allows use
> of SSE?).

It depends on the machine, of course, but yes, on most machines it
enables SSE. Just compare the output of

$ < /dev/null g++ -E -v - |& grep cc1

and

$ < /dev/null g++ -march=native -E -v - |& grep cc1

The following flags are added for me:
-march=core2 -mcx16 -msahf --param l1-cache-size=32 --param
l1-cache-line-size=64 --param l2-cache-size=4096 -mtune=core2

-mcx16 and -msahf enable some additional instructions (see
http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html).
-march=core2 enables Intel's 64-bit extensions, MMX, SSE, SSE2, SSE3 and
SSSE3; -mtune=core2 only tunes the generated code for that CPU.
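
The same comparison can be scripted. This is only a minimal sketch, assuming g++ is on the PATH; the helper names cc1_flags and added_flags are made up for illustration, and the flags reported will differ from machine to machine:

```python
import subprocess


def cc1_flags(extra_args=()):
    """Return the command line g++ passes to cc1plus (requires g++ on the PATH)."""
    cmd = ["g++", *extra_args, "-E", "-v", "-"]
    proc = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                          capture_output=True, text=True)
    # g++ -v prints the cc1plus invocation on stderr
    for line in proc.stderr.splitlines():
        if "cc1" in line:
            return line.split()
    return []


def added_flags(baseline, native):
    """Flags in `native` that are missing from `baseline`; '--param X' stays paired."""
    def group(tokens):
        grouped, i = [], 0
        while i < len(tokens):
            if tokens[i] == "--param" and i + 1 < len(tokens):
                grouped.append(tokens[i] + " " + tokens[i + 1])
                i += 2
            else:
                grouped.append(tokens[i])
                i += 1
        return grouped

    seen = set(group(baseline))
    return [flag for flag in group(native) if flag not in seen]


# Usage, on a machine with g++ installed:
#     print(added_flags(cc1_flags(), cc1_flags(["-march=native"])))
```

Keeping the comparison in a separate function means it can be tried on saved output from another machine without invoking the compiler.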

Best regards,

-- 
Jérémie
___
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-11-16 Thread Ian Ozsvald
From memory the 'native' flag made a difference (I think it allows use
of SSE?). I guess that is something I'll normalise for a future v0.3
release of my handbook :-)
Cheers,
Ian.

-- 
Ian Ozsvald (A.I. researcher)
i...@ianozsvald.com

http://IanOzsvald.com
http://MorConsulting.com/
http://StrongSteam.com/
http://SocialTiesApp.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald


Re: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-11-15 Thread Jérémie Roquet
Hi,

2011/11/15 Armin Rigo :
> On Tue, Nov 15, 2011 at 15:54, Ian Ozsvald  wrote:
>> ShedSkin (from memory)
>> requests fast math and a few other things in the generated Makefile.
>
> Ah, it is cheating that way.  Indeed, I didn't try to play with gcc
> options; I just used -O2 (or -O3, which made no difference).

FYI, here is the default FLAGS file for shedskin:

CC=g++
CCFLAGS=-O2 -march=native -Wno-deprecated $(CPPFLAGS)
LFLAGS=-lgc -lpcre $(LDFLAGS)

But of course you can change the compiler and play with its flags to
improve performance.

Best regards,

-- 
Jérémie


Re: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-11-15 Thread Armin Rigo
Hi,

On Tue, Nov 15, 2011 at 15:54, Ian Ozsvald  wrote:
> ShedSkin (from memory)
> requests fast math and a few other things in the generated Makefile.

Ah, it is cheating that way.  Indeed, I didn't try to play with gcc
options; I just used -O2 (or -O3, which made no difference).

The C source code is completely obvious and surprise-less.  You can
see it here (it outputs the raw data to stdout, so you have to pipe it
to a converter program to display the result):
http://paste.pocoo.org/show/508215/


A bientôt,

Armin.


Re: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-11-15 Thread Ian Ozsvald
Hi Antonio! Apologies for the slow reply, this got filed into a subfolder.

The numbers are interesting, and I'm also interested in the C version.
I'm hoping that my tutorial will be accepted for PyCon next March (the
talks are announced in two weeks); assuming I get to talk again, I'll
update my tutorial. Adding more for PyPy and having a C equivalent
will be very useful.

Given that the C version should be very similar to the ShedSkin
version, maybe it just comes down to compiler differences? On my
Macbook (where I originally wrote the talk) I think the differences in
speed came from two versions of gcc (Cython seemed to prefer one,
ShedSkin the other, I ran out of time trying to unify that test). Do
you definitely use the same optimisation flags? ShedSkin (from memory)
requests fast math and a few other things in the generated Makefile.

Ian.

-- 
Ian Ozsvald (A.I. researcher)
i...@ianozsvald.com

http://IanOzsvald.com
http://MorConsulting.com/
http://StrongSteam.com/
http://SocialTiesApp.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald


Re: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-11-07 Thread Antonio Cuni

Hello Ian,

On 25/07/11 11:00, Ian Ozsvald wrote:

Dear all, I've published v0.2 of my High Performance Python tutorial
write-up from the session I ran at EuroPython:
http://ianozsvald.com/2011/07/25/high-performance-python-tutorial-v0-2-from-europython-2011/


Today Armin and I looked further into the performance of the Mandelbrot
algorithm that you wrote for your tutorial.  What we found is very
interesting :-).


We compared three versions of the code:

- a (slightly modified) pure python one on PyPy
- the Cython one using calculate_z.pyx_2_bettermath
- the shedskin one, using shedskin2.py

The PyPy version looks like this:

def calculate_z_serial_purepython(q, maxiter, z):
    """Pure python with complex datatype, iterating over list of q and z"""
    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if (zi.real*zi.real + zi.imag*zi.imag) > 4.0:
                output[i] = iteration
                break
    return output

i.e., it is exactly the same as pure_python_2.py, but we avoid using abs(zi),
so it is comparable with the Cython and ShedSkin versions.
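
The two escape tests should agree, since squaring both sides of abs(zi) > 2.0 gives the sum-of-squares form without the square root. A small sketch of that equivalence, assuming the original test in pure_python_2.py is abs(zi) > 2.0 (the function names here are made up for illustration):

```python
def escaped_abs(z):
    """Escape test using abs(), which computes a square root."""
    return abs(z) > 2.0


def escaped_squared(z):
    """Same test with both sides squared, so no square root is needed."""
    return (z.real * z.real + z.imag * z.imag) > 4.0


# A few points inside and outside the radius-2 circle.
samples = [0j, 1 + 1j, 3 + 0j, -1.5 + 1.5j, 0.3 - 0.2j]
for z in samples:
    assert escaped_abs(z) == escaped_squared(z)
```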


First, we ran the programs passing "1000 1000" as arguments, and
these are the results:


PyPy: 1.95 secs
Cython: 0.58 secs
Shedskin: 0.42 secs

so, PyPy is ~4.5x slower than Shedskin.

However, we realized that using the default values for x1,x2,y1,y2, the
innermost loop runs very few iterations most of the time, and this is one case
in which PyPy suffers most, because it needs to go through a bridge to continue
the execution, and at the moment bridges are slower than loops.


So, we changed the values of x1,x2,y1,y2 to compute a different region, in 
which the innermost loop runs more frequently.  We used these values:
x1, x2, y1, y2 = 0.37865401-0.02, 0.37865401+0.02, 0.669227668-0.02, 
0.669227668+0.02
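
To see how many iterations the inner loop actually runs in a given region, something like the following could be used. It is a rough sketch only: the function names are hypothetical, z is assumed to start at zero, the grid layout need not match the tutorial's, and the grid size and maxiter are arbitrary small values chosen to keep the run short.

```python
from collections import Counter


def escape_iteration(q, maxiter):
    """Iteration at which z = z*z + q first has |z|^2 > 4, or None if it never does."""
    z = 0j  # assumption: z starts at zero
    for iteration in range(maxiter):
        z = z * z + q
        if (z.real * z.real + z.imag * z.imag) > 4.0:
            return iteration
    return None


def iteration_histogram(x1, x2, y1, y2, width, height, maxiter):
    """Count how many grid points leave the inner loop at each iteration."""
    counts = Counter()
    for row in range(height):
        y = y1 + (y2 - y1) * row / height
        for col in range(width):
            x = x1 + (x2 - x1) * col / width
            counts[escape_iteration(complex(x, y), maxiter)] += 1
    return counts


# The region given above, sampled on a 50x50 grid.
zoomed = iteration_histogram(0.37865401 - 0.02, 0.37865401 + 0.02,
                             0.669227668 - 0.02, 0.669227668 + 0.02,
                             width=50, height=50, maxiter=300)
print(zoomed.most_common(5))
```

Running the same function over the default region would show whether most points really do leave the loop after very few iterations.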


and since all programs compute this image faster, we used "3000 3000" as
arguments from the command line.  These are the results:


PyPy: 0.89 secs
Cython: 1.76 secs
Shedskin: 0.26 secs

So, in this case, PyPy is ~2x faster than Cython and ~3.5x slower than Shedskin.
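
The ratios quoted for both runs can be checked directly from the timings. A quick sketch, assuming the second set of numbers is in seconds like the first:

```python
# Timings copied from the two runs above (second run assumed to be in seconds).
run_1000 = {"PyPy": 1.95, "Cython": 0.58, "Shedskin": 0.42}
run_3000 = {"PyPy": 0.89, "Cython": 1.76, "Shedskin": 0.26}


def ratio(timings, slower, faster):
    """How many times longer `slower` took than `faster`."""
    return timings[slower] / timings[faster]


print(round(ratio(run_1000, "PyPy", "Shedskin"), 2))   # 4.64
print(round(ratio(run_3000, "Cython", "PyPy"), 2))     # 1.98
print(round(ratio(run_3000, "PyPy", "Shedskin"), 2))   # 3.42
```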

In the meantime, Armin wrote a C version of it:
http://paste.pocoo.org/raw/504216/

which took 0.946 seconds to complete. This is in line with PyPy's result,
but we are still investigating why the ShedSkin version is so much faster.


ciao,
Anto


Re: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-07-25 Thread Ian Ozsvald
Ah! Ok, I've noted array.array (along with running bigger tests to
check for JIT warm-up overheads). Hopefully I'll get some more time in
a few weeks to play with variants.
Cheers!
i.

-- 
Ian Ozsvald (A.I. researcher, screencaster)
i...@ianozsvald.com

http://IanOzsvald.com
http://SocialTiesApp.com/
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald


Re: [pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-07-25 Thread Maciej Fijalkowski
On Mon, Jul 25, 2011 at 11:00 AM, Ian Ozsvald  wrote:
> Dear all, I've published v0.2 of my High Performance Python tutorial
> write-up from the session I ran at EuroPython:
> http://ianozsvald.com/2011/07/25/high-performance-python-tutorial-v0-2-from-europython-2011/
>
> Antonio - you asked earlier if the 'expanded math' version of the
> Mandelbrot solver (using doubles rather than complex numbers) would be
> faster - I've timed it and it is a bit faster with a nightly build of
> PyPy, but nowhere near as fast as ShedSkin's generated C output
> (details below).
>
> Maciej - thanks for pointing me at the numpy module. I've added a tiny
> section showing numpy in PyPy but I haven't converted the Mandelbrot
> solver to use it (even finishing v0.2 took longer than I'd thought).
> I'm hoping that some more exposure in the report might bring in more
> volunteers from outside.
>
> Here's a clip from the report in the PyPy section:
> "By running pypy pure_python.py 1000 1000 on my MacBook it takes 5.9
> seconds, running pypy pure_python_2.py 1000 1000 takes 4.9 seconds.
> (Ian - the only difference with pure_python_2.py is that local
> dereferences in the tight loop are moved outside the loop, causing
> fewer dereference operations)
>
> As an additional test (not shown in the graphs) I ran pypy
> shedskin2.py 1000 1000 which runs the expanded math version of the
> shedskin variant below (this replaces complex numbers with floats and
> expands abs to avoid the square root). The shedskin2.py result takes
> 3.2 seconds (which is still much slower than the 0.4s version compiled
> using shedskin)."
>
> The pure_python src is here:
> https://github.com/ianozsvald/EuroPython2011_HighPerformanceComputing/tree/master/mandelbrot/python
>
> shedskin2.py is available here:
> https://github.com/ianozsvald/EuroPython2011_HighPerformanceComputing/tree/master/mandelbrot/shedskin
>
> I haven't tested whether the warm-up periods for PyPy are significant,
> possibly they account for much of the difference between ShedSkin and
> PyPy? I want to revisit this but for the next few weeks I have to go
> back to other projects.

Most of it comes from the fact that you're using lists and not, say,
array.array (or a numpy array), so the storage is not optimized.
ShedSkin doesn't allow you to store different types in a list. We'll
make it fast one day even if you use a list, but indeed, using
array.array would make it much faster.
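
For illustration, a version of the inner calculation that keeps its data in array.array might look like the sketch below. The function name is made up, z is assumed to start at zero, and whether this is actually faster on a given PyPy build would need measuring.

```python
from array import array


def calculate_z_arrays(q_real, q_imag, maxiter):
    """Escape iterations with inputs and output held in typed arrays."""
    output = array("l", [0] * len(q_real))   # signed integers
    for i in range(len(q_real)):
        qr, qi = q_real[i], q_imag[i]
        zr = zi = 0.0                        # assumption: z starts at zero
        for iteration in range(maxiter):
            # (zr + zi*j)**2 + q, written out with plain floats
            zr, zi = zr * zr - zi * zi + qr, 2.0 * zr * zi + qi
            if zr * zr + zi * zi > 4.0:
                output[i] = iteration
                break
    return output


# array.array has no complex type code, so the real and imaginary parts
# are kept in two arrays of C doubles.
q_real = array("d", [0.0, 2.0, 3.0])
q_imag = array("d", [0.0, 0.0, 0.0])
print(calculate_z_arrays(q_real, q_imag, 100))   # array('l', [0, 1, 0])
```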

Cheers,
fijal


[pypy-dev] Updated 'High Performance Python' tutorial (the one from EuroPython 2011)

2011-07-25 Thread Ian Ozsvald
Dear all, I've published v0.2 of my High Performance Python tutorial
write-up from the session I ran at EuroPython:
http://ianozsvald.com/2011/07/25/high-performance-python-tutorial-v0-2-from-europython-2011/

Antonio - you asked earlier if the 'expanded math' version of the
Mandelbrot solver (using doubles rather than complex numbers) would be
faster - I've timed it and it is a bit faster with a nightly build of
PyPy, but nowhere near as fast as ShedSkin's generated C output
(details below).

Maciej - thanks for pointing me at the numpy module. I've added a tiny
section showing numpy in PyPy but I haven't converted the Mandelbrot
solver to use it (even finishing v0.2 took longer than I'd thought).
I'm hoping that some more exposure in the report might bring in more
volunteers from outside.

Here's a clip from the report in the PyPy section:
"By running pypy pure_python.py 1000 1000 on my MacBook it takes 5.9
seconds, running pypy pure_python_2.py 1000 1000 takes 4.9 seconds.
(Ian - the only difference with pure_python_2.py is that local
dereferences in the tight loop are moved outside the loop, causing
fewer dereference operations)

As an additional test (not shown in the graphs) I ran pypy
shedskin2.py 1000 1000 which runs the expanded math version of the
shedskin variant below (this replaces complex numbers with floats and
expands abs to avoid the square root). The shedskin2.py result takes
3.2 seconds (which is still much slower than the 0.4s version compiled
using shedskin)."
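
The difference between the two pure Python variants described in the clip can be sketched as follows. This is a reconstruction from the description, not the contents of the actual files, and the function names are made up:

```python
def calculate_z_indexed(q, maxiter, z):
    """Index into the lists on every pass through the inner loop."""
    z = list(z)  # work on a copy so the caller's list is left untouched
    output = [0] * len(q)
    for i in range(len(q)):
        for iteration in range(maxiter):
            z[i] = z[i] * z[i] + q[i]
            if abs(z[i]) > 2.0:
                output[i] = iteration
                break
    return output


def calculate_z_hoisted(q, maxiter, z):
    """Look up z[i] and q[i] once per point, then loop on local names."""
    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration
                break
    return output


# Both variants do the same arithmetic, so the results must match.
q = [0j, 2 + 0j, 3 + 0j, 0.5 + 0.5j]
z = [0j] * len(q)
assert calculate_z_indexed(q, 50, z) == calculate_z_hoisted(q, 50, z)
```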

The pure_python src is here:
https://github.com/ianozsvald/EuroPython2011_HighPerformanceComputing/tree/master/mandelbrot/python

shedskin2.py is available here:
https://github.com/ianozsvald/EuroPython2011_HighPerformanceComputing/tree/master/mandelbrot/shedskin

I haven't tested whether the warm-up periods for PyPy are significant,
possibly they account for much of the difference between ShedSkin and
PyPy? I want to revisit this but for the next few weeks I have to go
back to other projects.
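
One rough way to check for warm-up is to time the same call several times inside one process and compare the first run with the later ones. A minimal sketch; the workload function is a hypothetical stand-in for the Mandelbrot calculation:

```python
import time


def time_repeated(func, repeats=5):
    """Call func() several times in one process; return each wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def workload():
    """Stand-in for the real calculation (hypothetical, kept small)."""
    total = 0.0
    for i in range(200000):
        total += (i % 7) * 0.5
    return total


durations = time_repeated(workload)
print(["%.4f" % d for d in durations])
```

If the first duration is much larger than the rest under PyPy, JIT warm-up is a plausible part of the gap; if all runs take about the same time, it probably is not.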

I hope the report brings in some new folk for PyPy,
Ian.


-- 
Ian Ozsvald (A.I. researcher, screencaster)
i...@ianozsvald.com

http://IanOzsvald.com
http://SocialTiesApp.com/
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald