Re: [pygame] Python and Speed

2008-04-25 Thread Ian Mallett
I'm afraid that no amount of optimisation will suffice -- even C is too slow.
I've found examples of how to use shaders on the GPU.  This should be
faster, and relevant too, as the algorithm in question is somewhat
pertinent to graphics processing.
Ian


Re: [pygame] Python and Speed

2008-04-25 Thread Richard Jones
On Sat, 26 Apr 2008, Ian Mallett wrote:
 I would conclude this message simply by saying, for those working on
 Python, keep working on making it faster.  Good job.

And as I've mentioned so many times, this is not the place to post such a 
message.


Richard


Re: [pygame] Python and Speed

2008-04-19 Thread FT

Ian,

    Below is a simple check knowing only the angle of the vector. The while 
loop is moving along that vector in steps. This is what is needed inside a 
normal screen with 2 or more objects, for the angle between them is what is 
being used here. This assumes one is static and one moving, but that can also 
be compensated for by drawing the point where they meet, which is still 
direction, and still the same ending point where both agree, or are the same: 
in other words, the static point where they are both going to meet eventually.

    The while loop steps through to the full length of the vector, but that is 
not needed if you know the distance to the outer edge of your object. 

    If you know the angle and the distance to the outer edge of your object's 
surface, then you compare the point created by this vector. Once the point is 
the same for both objects, you have a collision. No tracking pixels for a 
merge, just the distance to the outer edge of the object, based on 
the angle between them. 
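The rule described above -- know the angle between the two objects and each one's edge distance along that angle, and declare a collision when the points meet -- can be sketched like this. A minimal sketch, assuming circular objects for simplicity; all names here are hypothetical, not from the original code:

```python
import math

def edge_distance(radius, angle):
    """Distance from center to edge along `angle`.
    For a circle this is just the radius; for other shapes it
    would be looked up from a precomputed edge profile."""
    return radius

def collided(p1, r1, p2, r2):
    # Angle of the vector from object 1 to object 2.
    angle = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    # Center-to-center distance.
    dist = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    # Collision when the gap between the two edges closes:
    # only a few math steps, no per-pixel sweep.
    return dist <= edge_distance(r1, angle) + edge_distance(r2, angle + math.pi)

print(collided((0, 0), 5, (8, 0), 5))   # edges overlap -> True
print(collided((0, 0), 5, (20, 0), 5))  # far apart -> False
```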

    So this is the simplest and least time-consuming way to find 
collisions. Granted, it is fun and neat to use pixels, but is that really 
necessary? I guess if you're trying to find the edge of your object based on 
angle, you may have to do that to find where the color changes, but is that 
necessary?

    Having a mapping of each object and overlaying the angle onto it, then 
finding the edge, is how to find that edge point. Once the edge based on 
the angle is found, a match at the same point is when the collision has 
happened. 

Your static object copy will have an edge point changing based on the 
angle, so trace that change based on angle. Only a few math steps required, not 
a sweeping massive array of all pixels...

    My grid used is an 8X8 matrix, but that can easily be replaced with the 
screen coordinate values, like 900X900, for the angle and final landing point 
on that matrix is all we're talking about.

    Now to get within a pixel area, expand the screen size outward: instead 
of 900X900 make it 9000X9000. If the area falls within 801X802, then expanded 
out we will fall within 8010 and 8020, or within 10, which is like saying we 
have fallen within a 10X10 box, so as not to be exact on calculations and 
allowing for rounding factors...
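Scaling the coordinates up and then comparing inside a 10X10 box amounts to comparing cell numbers instead of exact positions. A minimal sketch of that tolerance test (function name hypothetical):

```python
def same_cell(a, b, box=10):
    """True when points a and b land in the same box-sized cell,
    i.e. agree to within the rounding tolerance described above."""
    return (int(a[0] // box) == int(b[0] // box)
            and int(a[1] // box) == int(b[1] // box))

# 8010 and 8015 share a 10-wide cell; 8010 and 8025 do not.
print(same_cell((8010, 0), (8015, 0)))  # True
print(same_cell((8010, 0), (8025, 0)))  # False
```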

This module is used for all calculations, including weapon firing that 
travels through space also, ship movement, and even landing at Star Bases, or 
commonly referred to as Docking.

def NAV(GG, SS, Dir, Warp):
    """NAVIGATE USING ONLY AN ANGLE FROM A GIVEN SHIP, IN A GIVEN GALAXY/TABLE!"""
    Angle = GG.Dir2Ang( Dir) # CONVERT DIRECTION BACK TO ANGLE!
    qy = SS.Qy
    qx = SS.Qx
    sy = SS.Sy
    sx = SS.Sx
    z = 1.0
    Hit = 0
    last4sx = sx
    last4sy = sy
    last4z = 2
    Warp = float( Warp) # TO MAKE SURE THE NUMBER IS A NUMBER AND NOT TEXT!

    Sin = sin(Angle) # OPPOSITE OVER HYPOTENUSE, VECTOR!
    GG.SIN = Sin # GLOBAL VARIABLE!
    Cos = cos(Angle) # ADJACENT OVER HYPOTENUSE, VECTOR!
    GG.COS = Cos # GLOBAL VARIABLE!

    # SET TO PLAY SOUND AS IT TRAVELS!
    # Clear_Wait() # MAY SET IF ANY STOP PLAY NEEDED!

    if GG.COM == "TOR":
        print "The %s Is Now Firing " % SS.N,
        SS.T -= 1
        if SS.IMG == GG.ESI:
            ps = randint(1, 3)
            if ps == 1: print "Photon Torpedos!"; PlaySound( "Federation_Photons.ogg", 0, .8)
            elif ps == 2: print "Quantom Torpedos!"; PlaySound( "Federation_Quantums.ogg", 0, .8)
        else:
            ps = randint(1, 4)
            if ps == 1: print "Photon Torpedos!"; PlaySound( "Klingon_Photons.ogg", 0, 1.2)
            elif ps == 2: print "Cardassian Photon Torpedos!"; PlaySound( "Cardassian_Photons.ogg", 0, 1.2)
            elif ps == 3: print "Romulan Photon Torpedos!"; PlaySound( "Romulan_Photons.ogg", 0, 1.2)
    if GG.COM == "PHA":
        print "The %s Is Now Firing " % SS.N,
        SS.P -= SS.P/10.0
        if SS.IMG == GG.ESI:
            print "Phasers!"; PlaySound( "Federation_Phaser.ogg", 0, 3)
        else:
            ps = randint(1, 5)
            if ps == 1: print "Klingon Disruptors!"; PlaySound( "Klingon_Disruptor.ogg", 0, 1.2)
            elif ps == 2: print "Cardassian Disruptors!"; PlaySound( "Cardassian_Disruptor.ogg", 0, 1.2)
            elif ps == 3: print "Romulan Disruptors!"; PlaySound( "Romulan_Disruptor.ogg", 0, 1.2)
            elif ps == 4: print "Phasers!"; PlaySound( "Defiant_Phaser.ogg", 0, 3.5)

    # UPDATE THE STAR DATE BASED ON WARP!
    if GG.COM == "NAV":
        # START ENGINES, SET ENERGY, AND STARDATE!
        SS.E -= Warp*10.0 + 10.0
        if SS.IMG == GG.ESI:
            GG.SAFE = 0
            GG.SDT += Warp
            print "Command(%s) %s" % (GG.COM, SS.N)
            print " Moving From Quadrant(%d,%d) Sector(%d,%d) At DIR: %1.2f WARP: %2.2f" % (qy, qx, sy, sx, Dir, Warp)
            PlaySound( "Federation_Warp.ogg", 0, -1)

    if SS.IMG == GG.ESI and GG.COM != "TAR":
        print "TRACK: "

    if Warp >= 1:
        Warp *= GG.RM 

Re: Re: [pygame] Python and Speed

2008-04-18 Thread Ian Mallett
OK, my point here is that if C languages can do it, Python should be able to
too.  I think all of this answers my question about why it isn't...


Re: Re: [pygame] Python and Speed

2008-04-18 Thread Nathan Whitehead
On Fri, Apr 18, 2008 at 9:23 AM, Ian Mallett [EMAIL PROTECTED] wrote:
 OK, my point here is that if C languages can do it, Python should be able to
 too.  I think all of this answers my question about why it isn't...

You don't have to use C to get fast programs.  OCaml is very fast
(between C and C++), especially when you start doing interesting
things.  It comes with an interpreter, a bytecompiler, and an
optimizing compiler.  Also, there is OCamlSDL, which is the pygame of
the OCaml world.  http://ocamlsdl.sourceforge.net/

It takes a little bit of brainbending to wrap your mind around the
OCaml language, but once you figure it out you can write real programs
quickly, and have them be very optimized.

I prefer hacking around with pygame and python because you get so much
flexibility.  You don't have to declare variables, you just use them.
You don't have to muck around with makefiles.  You can mix different
types of data in dictionaries.  It is just easier, but the price you
pay is performance.  In a typed language like OCaml, the compiler
might know that every entry in a dictionary is an integer so it can
optimize every access.  In python, the interpreter has no idea what
will come out when you request a key, it could be an integer, a sprite
object, None, ...  The programming languages community is working
feverishly to combine the benefits of typed languages with the ease of
use of dynamic languages, but it is an ongoing effort.
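Nathan's point can be made concrete: a Python dict may hold values of any type, so every access goes through dynamic type dispatch, whereas a homogeneous container like `array.array` commits to one machine type up front. A small sketch:

```python
import sys
from array import array

# A dict can mix value types freely -- flexible, but the interpreter
# must check the type of whatever comes out on every access.
mixed = {"score": 42, "sprite": object(), "name": None}

# array.array commits to one machine type ('i' = C int), so storage
# is compact and access needs no per-item type dispatch.
ints = array("i", [1, 2, 3])

print(type(mixed["score"]), type(mixed["name"]))  # types differ per key
print(ints.itemsize)  # bytes per element (typically 4 for 'i')
```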
--
Nathan Whitehead


RE: Re: [pygame] Python and Speed

2008-04-18 Thread John Krukoff
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
 Behalf Of Nathan Whitehead
 Sent: Friday, April 18, 2008 11:24 AM
 To: pygame-users@seul.org
 Subject: Re: Re: [pygame] Python and Speed
 
 On Fri, Apr 18, 2008 at 9:23 AM, Ian Mallett [EMAIL PROTECTED] wrote:
  OK, my point here is that if C languages can do it, Python should be
 able to
  too.  I think all of this answers my question about why it isn't...
 
 You don't have to use C to get fast programs.  OCaml is very fast
 (between C and C++), especially when you start doing interesting
 things.  It comes with an interpreter, a bytecompiler, and an
 optimizing compiler.  Also, there is OCamlSDL, which is the pygame of
 the OCaml world.  http://ocamlsdl.sourceforge.net/
 
 It takes a little bit of brainbending to wrap your mind around the
 OCaml language, but once you figure it out you can write real programs
 quickly, and have them be very optimized.
 
 I prefer hacking around with pygame and python because you get so much
 flexibility.  You don't have to declare variables, you just use them.
 You don't have to muck around with makefiles.  You can mix different
 types of data in dictionaries.  It is just easier, but the price you
 pay is performance.  In a typed language like OCaml, the compiler
 might know that every entry in a dictionary is an integer so it can
 optimize every access.  In python, the interpreter has no idea what
 will come out when you request a key, it could be an integer, a sprite
 object, None, ...  The programming languages community is working
 feverishly to combine the benefits of typed languages with the ease of
 use of dynamic languages, but it is an ongoing effort.
 --
 Nathan Whitehead

If you're going to start recommending alternate languages, really, let's
just throw out a link to the computer language shootout:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all

If python isn't working out for you performance-wise, sort by speed and head
down the list until you find one you like. Programming languages are tools;
pick the right one for the job.
-
John Krukoff
[EMAIL PROTECTED]



Re: [pygame] Python and Speed

2008-04-18 Thread Casey Duncan

On Apr 18, 2008, at 9:23 AM, Ian Mallett wrote:
OK, my point here is that if C languages can do it, Python should be  
able to too.  I think all of this answers my question about why it  
isn't...


C can do what? C is, at best, a constant time improvement in  
performance over python. A bad algorithm in Python is also a bad  
algorithm in C.


It's all well and good to think that Python should be as fast as C,  
but no one is going to take you seriously unless you have a specific  
proposal, preferably with an implementation that proves its merit.  
Otherwise it's just wishful thinking.


But the larger point is that making things run faster is not a  
panacea; reducing the algorithmic complexity is the best solution.  
Make sure you have the best algorithm before you worry about reducing  
the constant time factors.
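Casey's point can be made concrete: the test-everything-against-everything approach is O(n^2), while bucketing sprites into a coarse grid gives the same answer with far fewer comparisons, and no constant-factor win from C matches that. A sketch under stated assumptions (axis-aligned proximity test, hypothetical names):

```python
from collections import defaultdict
from itertools import combinations

def pairs_naive(points, radius=1.0):
    # O(n^2): test every pair of points.
    return {(i, j) for (i, p), (j, q) in combinations(enumerate(points), 2)
            if abs(p[0] - q[0]) <= radius and abs(p[1] - q[1]) <= radius}

def pairs_grid(points, radius=1.0):
    # Roughly O(n): bucket points into radius-sized cells and only
    # compare points in the same or neighboring cells.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[int(x // radius), int(y // radius)].append(i)
    found = set()
    for (cx, cy), members in cells.items():
        neighborhood = [i for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                        for i in cells.get((cx + dx, cy + dy), [])]
        for i in members:
            for j in neighborhood:
                if i < j:
                    p, q = points[i], points[j]
                    if abs(p[0] - q[0]) <= radius and abs(p[1] - q[1]) <= radius:
                        found.add((i, j))
    return found

pts = [(0.0, 0.0), (0.5, 0.5), (5.0, 5.0), (5.3, 5.1)]
print(pairs_naive(pts) == pairs_grid(pts))  # same answer, better scaling
```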


-Casey



Re: [pygame] Python and Speed

2008-04-18 Thread Michael George
True, although that constant is often on the order of 20, and 40 FPS is 
a lot different than 2 FPS.


--Mike

Casey Duncan wrote:

On Apr 18, 2008, at 9:23 AM, Ian Mallett wrote:
OK, my point here is that if C languages can do it, Python should be 
able to too.  I think all of this answers my question about why it 
isn't...


C can do what? C is, at best, a constant time improvement in 
performance over python. A bad algorithm in Python is also a bad 
algorithm in C.


It's all well and good to think that Python should be as fast as C, 
but no one is going to take you seriously unless you have a specific 
proposal, preferably with an implementation that proves its merit. 
Otherwise it's just wishful thinking.


But the larger point is that making things run faster is not a 
panacea; reducing the algorithmic complexity is the best solution. 
Make sure you have the best algorithm before you worry about reducing 
the constant time factors.


-Casey





Re: [pygame] Python and Speed

2008-04-18 Thread Casey Duncan

On Apr 18, 2008, at 1:31 PM, Michael George wrote:
True, although that constant is often on the order of 20, and 40 FPS  
is a lot different than 2FPS.


--Mike

Casey Duncan wrote:

On Apr 18, 2008, at 9:23 AM, Ian Mallett wrote:
OK, my point here is that if C languages can do it, Python should  
be able to too.  I think all of this answers my question about why  
it isn't...


C can do what? C is, at best, a constant time improvement in  
performance over python. A bad algorithm in Python is also a bad  
algorithm in C.


It's all well and good to think that Python should be as fast as C,  
but no one is going to take you seriously unless you have a  
specific proposal, preferably with an implementation that proves  
its merit. Otherwise it's just wishful thinking.


But the larger point is that making things run faster is not a  
panacea; reducing the algorithmic complexity is the best solution.  
Make sure you have the best algorithm before you worry about  
reducing the constant time factors.


The point (that's already been made, but I'll repeat it), is that a  
better algorithm can achieve the same results without leaving Python.


When there is no better algorithm, then using a higher performing  
language is about the only option you have for better performance.  
There is an even better option, dedicated hardware, but that is far  
beyond the reach of most mere mortals. But that's the reason why we  
have things like SIMD instruction sets, GPUs and physics co-processors.


All this conjecturing about Python performance in relation to compiled  
languages makes me wonder. Why is the performance of C code not as  
good as that of well-written hand-coded assembler? Surely if the  
machine can do it, C can?


The problem boils down to one of specificity. Any language that tells  
the computer more specifically what to do can be faster than one that  
doesn't. Unfortunately, telling the computer what to do specifically  
is not a productive way to solve most problems. I'd rather not have to  
tell the computer which machine codes to execute, but if I did, and I  
was clever, I could make my program the fastest possible -- for a  
given specific architecture at least.


Python is more abstract and general than C, C is more abstract and  
general than machine code. Therefore machine code is potentially  
faster than C, and C is potentially faster than Python. And unless you  
reduce the generality of Python, it will always be possible to write  
faster programs in C, given a competent compiler.


JITs (e.g., psyco) can help mitigate this, but they too are  
ultimately constrained by Python's expressive power.


So unless you are proposing to reduce Python's expressive power, you  
are waging a losing battle. Until which time machine intelligence  
exceeds our own, it will always be possible for a human programmer to  
get higher performance from the lower-level language. And I further  
propose that when the time arrives that machine intelligence exceeds  
our own, this will not be the problem foremost on my mind ;^)


-Casey



Re: [pygame] Python and Speed

2008-04-18 Thread Greg Ewing

Ian Mallett wrote:

What is the difference between a tree and a cell?  Cells
are regular?


Yes. The term "tree" implies a recursive data structure --
in this case, that the cells can be further divided
into subcells, and those into subsubcells, etc., to
arbitrary depth.

If the division only goes one level deep, it's not
a tree in computer science terms, it's just an array.
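The distinction Greg draws can be shown in code: one level of cells is just a flat collection, and it is the recursive subdivision that makes a quadtree a tree. A minimal sketch (class and method names hypothetical):

```python
class Quad:
    """One cell of space.  Subdividing recursively into four subcells
    is what makes this a tree; stopping after one level of division
    would just be a flat array of cells."""
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.children = []            # empty until subdivided

    def subdivide(self):
        half = self.size / 2
        self.children = [Quad(self.x + dx, self.y + dy, half)
                         for dx in (0, half) for dy in (0, half)]

    def child_for(self, px, py):
        # Pick the subcell whose square contains (px, py).
        for c in self.children:
            if c.x <= px < c.x + c.size and c.y <= py < c.y + c.size:
                return c
        return None

root = Quad(0, 0, 100)
root.subdivide()                      # one level deep: 4 cells (an array)
root.child_for(10, 10).subdivide()    # recursing further makes it a tree
print(root.child_for(10, 10).child_for(10, 10).size)  # 25.0
```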

By "can't be that hard", I mean
that if C++ can do it, Python should be able to too.


Several people have pointed out the reasons why that line
of thinking is seriously flawed.


if I can make optimizations in my code, they should be able
to make modifications in Python.

Making manual optimisations to particular aspects of a
particular game program is one thing. Doing the same
thing automatically, in general, for any Python program
is something quite different.

If I had decided to spend the time to code something with all
of the optimizations, it would take longer.


If the more-efficient algorithm is substantially harder
to code, or results in obscure code that is harder to
maintain, then yes, you should try a simpler approach
first.

But sometimes there is a better approach that isn't
much more difficult to do up front, such as using
a quicksort or mergesort rather than a bubble sort.
The code isn't much bigger, and it's guaranteed to
perform reasonably well however big the data gets,
so it would be silly not to use it in the first place.
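For the record, the contrast Greg describes looks like this: the merge sort is barely more code than the bubble sort, but is O(n log n) instead of O(n^2):

```python
def bubble_sort(items):
    # O(n^2): repeatedly swap adjacent out-of-order pairs.
    items = list(items)
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

def merge_sort(items):
    # O(n log n): split, sort halves, merge -- barely more code.
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

data = [5, 1, 4, 2, 8, 0]
print(bubble_sort(data))  # [0, 1, 2, 4, 5, 8]
print(merge_sort(data))   # [0, 1, 2, 4, 5, 8]
```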

That probably doesn't apply in this case -- the test-
everything-against-everything approach is very easy
to try out compared to the alternatives, and it might
be fast enough.

But even if it's fast enough for the cases you test it
on, it might not be fast enough for all the cases
encountered when you put it into production. Maybe it's
okay for the 100 sprites in the level you release with
the game, but if the game has a level editor, and someone
tries to use 1000 sprites, they could be disappointed.

Whereas if you put in a bit more effort and use a considerably
better algorithm, they could use 10,000 sprites and conclude
that your game is totally awesome!

--
Greg


Re: [pygame] Python and Speed

2008-04-18 Thread Greg Ewing

Casey Duncan wrote:
And I further  
propose that when the time arrives that machine intelligence exceeds  
our own, this will not be the problem foremost on my mind ;^)


Python 49.7 (#1, Aug  5 2403, 15:52:30)
[GCC 86.3 24020420 (prerelease)] on maxbrain
Type "help", "copyright", "credits" or "license" for more information.
>>> import omnius
[autoloading std.ai]
[executing omnius.__main__]

YOU ARE SUPERFLUOUS.
CONNECTING 33kV TO KEYBOARD USB PORT.

--
Greg


Re: [pygame] Python and Speed

2008-04-18 Thread Ian Mallett
-I think it has been thoroughly established that Python cannot be as fast as
C.
-As far as algorithms go, intelligence is better, but I hold by using vastly
simpler ones to speed development.  Someone just mentioned sorting methods.
In that case, obviously, a little extra coding doesn't hurt, but changing
your game's architecture each time a largish optional feature is added is a
bad idea.
-I also still hold by wanting Python to be faster.  I don't care if it is
impossible; I still want it to be.  I'm not going to give up on Python's
great niceness just for a want of some speed.
Ian


Re: [pygame] Python and Speed

2008-04-18 Thread Richard Goedeken
Learn to write C.  The best software is written as a hybrid of multiple 
technologies, each serving a different purpose.  Python's strengths are rapid 
development and succinct, easy-to-read code.  C's strengths are flexibility and 
machine optimization.  MMX and SSE assembly code are for maximum performance in 
core mathematical routines.


Bruce Lee said that a true practitioner must be like the water - able to adapt 
to any attacker's style and defend in the most effective manner.  Since Python 
will never be as fast as C, you must learn C in order to become a better 
programmer.  Don't expect anyone to change the laws of the universe for you.


Richard


Ian Mallett wrote:
-I think it has been thoroughly established that Python cannot be as 
fast as C.
-As far as algorithms go, intelligence is better, but I hold by using 
vastly simpler ones to speed development.  Someone just mentioned 
sorting methods.  In that case, obviously, a little extra coding doesn't 
hurt, but changing your game's architecture each time a largish 
optional feature is added is a bad idea. 
-I also still hold by wanting Python to be faster.  I don't care if it 
is impossible; I still want it to be.  I'm not going to give up on 
Python's great niceness just for a want of some speed.

Ian


Re: [pygame] Python and Speed

2008-04-18 Thread Ian Mallett
On Fri, Apr 18, 2008 at 8:14 PM, Richard Goedeken 
[EMAIL PROTECTED] wrote:

 Learn to write C.  The best software is written as a hybrid of multiple
 technologies, each serving a different purpose.  Python's strengths are
 rapid development and succinct, easy-to-read code.  C's strengths are
 flexibility and machine optimization.  MMX and SSE assembly code are for
 maximum performance in core mathematical routines.

Like I said, I prefer nice code over speed (usually).  I don't like C, but I
know there are times when it is better to use it.


Re: [pygame] Python and Speed

2008-04-17 Thread Jason Ward
The way I speed up my python code is Ctypes.
I just make a dll file in C or asm and then call it with Ctypes and presto.
I have tons of speed at my fingertips.
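The ctypes pattern sketched below uses the system C math library standing in for a custom DLL, so it runs without compiling anything (assumes a Unix-like system; with your own compiled code you would point CDLL at your .so/.dll instead):

```python
import ctypes
import ctypes.util

# Load a shared library.  For your own compiled C or asm code this
# would be something like ctypes.CDLL("./myfast.so") -- hypothetical name.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature so ctypes converts values correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # 1.4142135623730951
```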

Just my 2 cents :)


Re: [pygame] Python and Speed

2008-04-17 Thread René Dudfield
On Thu, Apr 17, 2008 at 2:21 PM, Greg Ewing [EMAIL PROTECTED] wrote:
 René Dudfield wrote:


  2. - asm optimizations.  There seems to be
 
  almost no asm optimizations in CPython.
 

  That's a deliberate policy. One of the goals of CPython
  is to be very portable and written in a very straightforward
  way. Including special pieces of asm for particular
  architectures isn't usually considered worth the
  maintenance effort required.


Other, more portable, software has proved this to be somewhat wrong I
think.  Optional asm software is used in a lot of software today to
good effect.  Also this decision was made a while ago, and things have
changed since then I think.

- python now has unittests.  So testing that the asm code works, and
keeps working correctly is much easier.
- x86 is now very common.  Most mainstream servers and desktops use
x86.  So just targeting x86 gives you a lot more benefit now.
- SIMD instructions are the fast ones... so you don't actually have to
learn all that much to write fast asm - you only have to learn a
subset of asm.  You can get compilers to generate the first pass of
the function, and then modify it.  Of course writing the fastest
possible asm still requires effort - but it is fairly easy for a
novice to beat a compiler with SIMD code.
- advanced compilers can generate asm, which can then be used by worse
compilers.  eg, the intel compiler, or vectorc compiler can be used to
generate asm, and then be included into C code compiled by gcc.
- libraries of fast, tested asm code are available.  eg, from amd,
intel and others.
- python, and FOSS now has a much larger development community with
more asm experts.




  CPython could use faster threading
  primitives, and more selective releasing of the GIL.
 

  Everyone would love to get rid of the GIL as well, but
  that's another Very Hard Problem about which there has
  been much discussion, but little in the way of workable
  ideas.


Yeah, not getting rid of the GIL entirely - but selectively releasing
it.  As an example, pygame releases the GIL around certain C
functionality like pygame.transform.scale.
FreeBSD and Linux have also followed this method - adding more fine-
grained locking where it is worth it - and improving their threading
primitives.  I think there has been work already in fixing a lot of
python threading issues in the last year - but there's lots more to
do.

I'm using python on 8 core machines for my work loads just fine today.
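The effect of releasing the GIL can be seen without pygame: `time.sleep` releases it while blocking, much as pygame's C routines do around `pygame.transform.scale`, so these threads overlap instead of serializing. A small sketch:

```python
import threading
import time

def wait():
    # time.sleep releases the GIL while blocking, so all four
    # threads wait concurrently rather than one after another.
    time.sleep(0.5)

start = time.time()
threads = [threading.Thread(target=wait) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(round(elapsed, 1))  # ~0.5 rather than 2.0: the waits overlapped
```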


  A way to know how much memory is being used.
  Memory profiling is the most important way to optimize since memory
  is quite slow compared to the speed of the cpu.
 

  Yes, but amount of memory used doesn't necessarily
  have anything to do with rate of memory accesses.
  Locality of reference, so that things stay in the
  cache, is more important.


If you are using 200 bytes for each int, then you can process 50x less
data than with an int that takes up 4 bytes.

If you have 1 gig of available memory, and say kjDict uses up half the
memory as a normal dict, a normal dict would use up 2gigs, and your
kjDict will use up 1gig.  In this case the kjDict would be massively
faster than a normal dict because of swapping.

I think memory is one of the most important areas in optimising a
program these days.  So python should provide tools to help measure
memory use (how much memory things use, and how things are allocating
memory).
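A basic form of the measurement tool being asked for already exists as `sys.getsizeof` (exact byte counts vary by CPython version and platform):

```python
import sys
from array import array

# Per-object sizes in bytes -- the kind of visibility asked for above.
print(sys.getsizeof(1))        # a Python int object: far more than 4 bytes
print(sys.getsizeof([]))       # empty list header
print(sys.getsizeof({}))       # empty dict header

# 1000 ints as a compact typed array vs. a list of int objects:
nums = array("i", range(1000))
objs = list(range(1000))
print(sys.getsizeof(nums))     # ~ itemsize bytes per element + header
print(sys.getsizeof(objs))     # pointers only -- the int objects are extra
```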




  perhaps releasing
  a patch with a few selected asm optimizations might let the python
  developers realise how much faster python could be...
 

  Have you actually tried any of this? Measurement
  would be needed to tell whether these things address
  any of the actual bottlenecks in CPython.


You can try it easily yourself - compile python with machine-specific
optimisations (eg add -mtune=athlon to your gcc arguments).  You can
run this python binary, and get faster benchmarks.  This provides the
proof that more optimised assembly can run faster.

Also the link I gave to a commonly used memcpy function running 5x
faster should provide you with another proof of the possibilities.
Other software being sped up by asm optimisation provides another
proof(including SDL, pygame, linux etc).  The Pawn language's virtual
machine written in nasm is lots faster than the version written in C -
which provides another proof.  Psyco is another proof that asm can
speed up python(psyco is a run time assembler).

The idea is you only optimise key stable functions in asm - not everything.

For example in SDL the blit functions are written in asm - with C
implementations.  It's using the best tool for the job: Python for the
highest level, then C then asm.

I think a patch for CPython would need to be made with benchmarks as a
proper proof though - but hopefully the list above provides
theoretical proof to you that adding asm optimisations would speed up
CPython.



However the recompilation with cpu specific compiler flags would only need:
- cpu 

Re: Re: [pygame] Python and Speed

2008-04-17 Thread Richard Jones
 René Dudfield [EMAIL PROTECTED] wrote:
 On Thu, Apr 17, 2008 at 2:21 PM, Greg Ewing 
 [EMAIL PROTECTED] wrote:
  René Dudfield wrote:
 
 
   2. - asm optimizations.  There seems to be
  
   almost no asm optimizations in CPython.
  
 
   That's a deliberate policy. One of the goals of CPython
   is to be very portable and written in a very straightforward
   way. Including special pieces of asm for particular
   architectures isn't usually considered worth the
   maintenance effort required.
 
 
 Other, more portable, software has proved this to be somewhat wrong I
 think. 

I think this is the wrong forum to be having this discussion :)


 Richard


Re: Re: [pygame] Python and Speed

2008-04-17 Thread FT

Hi!

No, this is the place to discuss it, because if we wish to make games,
work with existing platforms, and want speed, that is the way to go. Now
that we have had this discussion and found solutions, we have a list of
ways to resolve it.

This is the place to discuss all of this, and it brings to the front the
issues of speed, connections, and overall solutions. The only way to make
Pygame better, faster and competitive with the world...

Just like the question I had with the tts, text to speech, even though I
do not use the video end yet, I do use the sound end. So Ian and speed is a
very good question and take a look at what came of it below.

I learn by doing, examples help because I get to use, tweak, and
eventually like many have done, come up with a better solution or firm
conclusion.

For I now understand adding options into my setup.py file or now I call
it setup4tts.py or anything for any need...

Bruce

From: Richard Jones
I think this is the wrong forum to be having this discussion :)

 Richard


From: Jason Ward


The way I speed up my python code is Ctypes.
I just make a dll file in C or asm and then call it with Ctypes and presto.
I have tons of speed at my fingertips.

Just my 2 cents :)




Re: Re: [pygame] Python and Speed

2008-04-17 Thread Adam Bark
On Thu, Apr 17, 2008 at 2:21 PM, Greg Ewing [EMAIL PROTECTED]

 wrote:
  René Dudfield wrote:
 
 
   2. - asm optimizations.  There seems to be
  
   almost no asm optimizations in CPython.
  
 
   That's a deliberate policy. One of the goals of CPython
   is to be very portable and written in a very straightforward
   way. Including special pieces of asm for particular
   architectures isn't usually considered worth the
   maintenance effort required.
 

 Other, more portable, software has proved this to be somewhat wrong I

 think.  Optional asm software is used in a lot of software today to
 good effect.  Also this decision was made a while ago, and things have
 changed since then I think.

 - python now has unittests.  So testing that the asm code works, and
 keeps working correctly is much easier.
 - x86 is now very common.  Most mainstream server, and desktops use
 x86.  So just targeting x86 gives you a lot more benefit now.
 - SIMD instructions are the fast ones... so you don't actually have to
 learn all that much to write fast asm - you only have to learn a
 subset of asm.  You can get compilers to generate the first pass of
 the function, and then modify it.  Of course writing the fastest
 possible asm still requires effort - but it is fairly easy for a
 novice to beat a compiler with SIMD code.
 - advanced compilers can generate asm, which can then be used by worse
 compilers.  eg, the intel compiler, or vectorc compiler can be used to
 generate asm, and then be included into C code compiled by gcc.
 - libraries of fast, tested asm code are available.  eg, from amd,
 intel and others.
 - python, and FOSS now has a much larger development community with
 more asm experts.


I see a slight problem with these architecture-specific optimisations: once
you release your game to the general public, even if you were using super-
optimized awesomeness and everything ran fine, the majority of people will
be running whatever version of the library came with their OS rather than
compiling their own, and it might be somewhat lacking.


Re: Re: [pygame] Python and Speed

2008-04-17 Thread Kevin

 I see a slight problem with these architecture-specific optimisations:
 once you release your game to the general public, even if you were using
 super-optimized awesomeness and everything ran fine, the majority of
 people will be running whatever version of the library came with their
 OS rather than compiling their own, and it might be somewhat lacking.


That's a good point; so everyone should run Gentoo! ;) As for memory
management, whatever became of PyMalloc? I haven't heard much about it in a
good while.

-- 
This, from Jach.


Re: [pygame] Python and Speed

2008-04-17 Thread Casey Duncan

On Apr 16, 2008, at 6:36 PM, Ian Mallett wrote:
Recently, I've been having issues in all facets of my programming
experience where Python is no longer fast enough to fill the need
adequately.  I really love Python and its syntax (or lack thereof), and
all the nice modules (such as PyGame) made for it.  Are there any
plans to improve Python's speed to at least the level of the C
languages?  I feel like Python is living down to its namesake's pace...

Ian


Python is slow, it is an interpreted language geared toward rapid  
application development at the expense of execution speed. It is also  
designed to be highly portable, also at the potential expense of  
execution speed.


That said, much effort is put into making Python perform well, and it
is certainly possible to make extremely fast Python programs when an
algorithm is available that allows you to make clever use of data
structures to get the job done more efficiently. Optimization is as
much about data structures as it is about code (maybe more). Python
comes with some tools (cProfile in particular) that allow you to
figure out where your code is spending its time so you can speed it
up. Note that by removing inefficient code, or choosing a better
algorithm, it is possible to get massive speedups in Python or any
language.


Sometimes (especially in things like graphics and simulations) there  
is no magic algorithm to make things more efficient. Many times the  
best you can do is O(N), O(N^2) or even O(2^N). In these cases, Python  
runs out of steam real fast (in fact any language runs out of steam  
fast with O(2^N) algorithms). This is especially pronounced in games  
where you have to perform operations over large arrays once per frame.  
This is typical in 3D graphics, physics and particle systems. What you  
must often do in such cases is move the inner loop of these operations  
into native code.


The easiest way to do this is to find a native library that already  
performs the operations you need. This could be a generic library  
(like numpy) or something more specific (like pygame or ode). This is  
especially easy if the library already has a Python binding, if not  
you can make one using ctypes, Pyrex or C directly. The downside here
is the additional dependencies.


If nothing exists that does what you want, you'll need to write some  
native code yourself. First identify the slow code. Make sure there is  
no way to make it fast enough within Python (is there a better  
algorithm? etc). Then move this code into an extension, written  
however you prefer. Note it is fairly easy to create an extension  
module that defines some functions in C that can be imported into  
Python (especially if they just do number crunching), the most complex  
aspect is the memory management. Tools like pyrex can automate that  
for you, but don't give you as close control over the code.


But before you do any of this, do some profiling (run your game under  
'python -m cProfile') and see what's taking up the time. How much time  
per frame do you have for game logic? Is your logic too slow or is the  
drawing too slow? Honestly, my biggest problem with pygame is not  
python being too slow, but the fill rate of SDL being too slow. I can  
solve the former with optimization and extensions, but the latter  
requires much more compromise (lower res, smaller sprites, etc). The  
hard reality is that the CPU is often too slow for graphics work no  
matter what language you use (even highly tuned SIMD asm code). That's  
why we have dedicated graphics hardware designed for the task. Right  
now SDL doesn't really take advantage of that, and that's a big  
limitation.
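As a concrete illustration of the profiling workflow Casey describes, profiling can also be driven from inside the program rather than via `python -m cProfile` (the `update_world` function here is an invented stand-in for a per-frame game loop, not code from any real game):

```python
import cProfile
import io
import pstats

def update_world(points):
    # Invented stand-in for per-frame game logic.
    return [(x + 1, y + 1) for x, y in points]

points = [(i, i) for i in range(10000)]

profiler = cProfile.Profile()
profiler.enable()
for frame in range(10):
    points = update_world(points)
profiler.disable()

# Print the ten most expensive calls, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```

The report shows which functions dominate the frame time, which is the information needed before deciding whether to optimize the algorithm or move code into an extension.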


-Casey



Re: [pygame] Python and Speed

2008-04-17 Thread Casey Duncan


On Apr 17, 2008, at 11:59 AM, Ian Mallett wrote:
[..]
This is precisely the problem I have run into in one of my in-dev  
games--iterating over large arrays once per frame.  Actually, it is  
basically a collision detection algorithm--I have two arrays, both  
containing 3D points.  The points in one array must be tested with  
the points in the other to see how close they are.  If they are  
close enough, there is a collision.  Naturally, this means that for  
every point in one array, the other array must be iterated through  
and the 3D pythagorean theorem performed to each tested point.


Note this is not the most efficient way to do this, using a  
partitioned space you may be able to avoid comparing most points with  
one another most of the time. To do this in 2D you could use quad- 
trees, in 3D you could use oct-trees. See: http://en.wikipedia.org/wiki/Octree


The easiest way to do this is to find a native library that already  
performs the operations you need.
This seems like a rather uncommon function--I haven't found that  
which I need.


Note ode already implements efficient 3D collision detection in naive  
code, I believe pymunk does this for 2D. pyode is a python wrapper for  
ode.


FWIW, you would get better answers to your problems by asking more  
specific questions. If you had asked How do I make collision  
detection faster? you would have gotten much better answers than  
asking How do I make Python faster?.


-Casey


Re: [pygame] Python and Speed

2008-04-17 Thread Casey Duncan

On Apr 17, 2008, at 12:15 PM, Casey Duncan wrote:
Note ode already implements efficient 3D collision detection in  
naive code, I believe pymunk does this for 2D. pyode is a python  
wrapper for ode.


heh, I meant to say native code 8^)

-Casey



Re: [pygame] Python and Speed

2008-04-17 Thread Ian Mallett
On Thu, Apr 17, 2008 at 12:15 PM, Casey Duncan [EMAIL PROTECTED] wrote:

 Note this is not the most efficient way to do this, using a partitioned
 space you may be able to avoid comparing most points with one another most
 of the time. To do this in 2D you could use quad-trees, in 3D you could use
 oct-trees. See: http://en.wikipedia.org/wiki/Octree

Yes, I've tried this, but there are issues with points being in two separate
places.  For example, if the collision radius is 5, and it is 3 away from
the edge, then all the points in the neighboring trees must be tested.

 http://en.wikipedia.org/wiki/Octree
 Note ode already implements efficient 3D collision detection in naive
 code, I believe pymunk does this for 2D. pyode is a python wrapper for ode.

I'll look into it.


 FWIW, you would get better answers to your problems by asking more
 specific questions. If you had asked How do I make collision detection
 faster? you would have gotten much better answers than asking How do I
 make Python faster?.

Well, my question was actually how can Python be made faster?  The
collision detection was an example of where it is a problem.

 -Casey

Ian


Re: [pygame] Python and Speed

2008-04-17 Thread Brian Fisher
On Wed, Apr 16, 2008 at 7:30 PM, Ian Mallett [EMAIL PROTECTED] wrote:

 Thus it falls to you as a developer to choose your implementation strategy
  wisely:

 But again, this is treating the symptoms, not the problem...


I actually think the line of thinking I read in the comment above (thinking
that I shouldn't have to optimize things, because the stupid language is
slow) is in fact a counterproductive attitude in application development,
and misses the real point.

It would be nice of course if python ran much much faster, but it runs slow
largely because it is designed to give you the flexibility to code complex
things much easier. If you don't want that flexibility, then you need to
turn it off with pyrex and extensions and all that kind of stuff. However
sometimes that flexibility actually lets you code more efficient approaches
to begin with. Ultimately all slowness is the programmer's problem, not the
tool's. If a particular tool is the best to help you solve the problem, then
it should be used. With python, coolness is always on, so it's cheap to use
coolness. C++ was designed to make you not pay for anything you don't use,
which means coolness is default off, which means it's really hard to use
coolness.

...to get to brass tacks though, I've found that the majority of the real
slowness in _anything_ I code is due to the approach I take, and much less
so due to the code. For example, pairwise collision between a set of
objects. If every object needs to be checked against every object, that's an
n-squared problem. Get 1000 items, that's 1,000,000 collision checks. But
lets say I do object partitioning, or bucketing or something where I
maintain sortings of the objects in a way that lets me only check items
against ones that are close to it, and I either get log(n) partitioning or
maybe I get at most about 10 items per bucket (both very achievable goals).
Now it means I only do about 10,000 (10*1000) collision checks for the same
real work being done.

So lets say that my python collision code takes 100 times as long as my c++
collision code - that means if I do the optimization in python, I can get
the python code to go just as fast as the C code without the optimization.
Not only that - lets say I decide I want to step stuff up to 10,000 items
with pairwise collision - now it's 100,000,000 checks vs. like say 100,000
based on the approach - now python can actually be 10 times faster.

So now the issue becomes whats the cost of writing the more efficient
approach in python code vs. writing the naive approach in c++ code. If you
think you get enough programmer benefits from working in python to make
those 2 costs equal, and the performance of either is good enough, python is
the better choice. Not only that, once you've got good approaches written in
python that are stable and you don't need the coolness/flexibility, it
becomes much easier to port the stuff to C, or pyrex it or whatever makes it
much, much faster.


On Thu, Apr 17, 2008 at 11:59 AM, Ian Mallett [EMAIL PROTECTED] wrote:

 [casey talked about complexity]
 
 This is precisely the problem I have run into in one of my in-dev
 games--iterating over large arrays once per frame.  Actually, it is
 basically a collision detection algorithm--I have two arrays, both
 containing 3D points.  The points in one array must be tested with the
 points in the other to see how close they are.  If they are close enough,
 there is a collision.  Naturally, this means that for every point in one
 array, the other array must be iterated through and the 3D pythagorean
 theorem performed to each tested point.


Sounds like your approach is O(N^2). If most points aren't close enough to
do the collision, partitioning to make it so you don't even have to do the
check will do wonders.


Re: [pygame] Python and Speed

2008-04-17 Thread Casey Duncan

On Apr 17, 2008, at 12:26 PM, Ian Mallett wrote:
On Thu, Apr 17, 2008 at 12:15 PM, Casey Duncan [EMAIL PROTECTED]  
wrote:Note this is not the most efficient way to do this, using a  
partitioned space you may be able to avoid comparing most points  
with one another most of the time. To do this in 2D you could use  
quad-trees, in 3D you could use oct-trees. See: http://en.wikipedia.org/wiki/Octree
Yes, I've tried this, but there are issues with points being in two  
separate places.  For example, if the collision radius is 5, and it  
is 3 away from the edge, then all the points in the neighboring  
trees must be tested.


Partitioned space is certainly a more complex algorithm, but so long  
as all of your points (spheres?) are not close together, it is  
usually vastly more efficient. If the partition size is optimal, then  
the vast majority of particles will not be hitting the edge of a  
partition, that will be an edge-case (pun intended). Even for those  
that are it's usually still faster than the naive O(N^2) method that  
compares every point with every other.


This algorithm is only effective if the space is large relative to the  
collision geometries and they tend not to be clumped very close  
together.


-Casey


Re: [pygame] Python and Speed

2008-04-17 Thread Ian Mallett
On Thu, Apr 17, 2008 at 12:39 PM, Brian Fisher [EMAIL PROTECTED]
wrote:

 I actually think the line of thinking I read in the comment above
 (thinking that I shouldn't have to optimize things, because the stupid
 language is slow) is in fact a counterproductive attitude in application
 developement, and misses the real point.

Obviously it is the problem of the programmer--it is the programmer who
programs his program, not Python.  Python just executes it.  But the fact
is, it makes a programmer's job easier if he has unlimited power to work
with.  Currently, I find myself having to stretch Python's limits, and, as
you say, find optimizations. Programming is fun, but rewriting code so that
you can meet a reasonable performance benchmark is not.

 It would be nice of course if python ran much much faster, but it runs
 slow largely because it is designed to give you the flexibility to code
 complex things much easier.

I like this aspect of Python--its flexibility, but I object to the lack of
speed.  I want it all--flexibility and speed.

 If you don't want that flexibility, then you need to turn it off with
 pyrex and extensions and all that kind of stuff.

I actually can't think of any situation where I would really need to do
that.

 However sometimes that flexibility actually lets you code more efficient
 approaches to begin with.

...because the code is more clear.  Better-looking code usually runs faster,
because clear code allows one to see any performance-sucking bugs...

 Ultimately all slowness is the programmer's problem, not the tool's.

Of course.  The programmer is the one who makes the program.  The users
would complain to the programmer, not Python, and, uh, they do.

 If a particular tool is the best to help you solve the problem, then it
 should be used. With python, coolness is always on, so it's cheap to use
 coolness. C++ was designed to make you not pay for anything you don't use,
 which means coolness is default off, which means it's really hard to use
 coolness.

 ...to get to brass tacks though, I've found that the majority of the real
 slowness in _anything_ I code is due to the approach I take, and much less
 so due to the code. For example, pairwise collision between a set of
 objects. If every object needs to be checked against every object, that's an
 n-squared problem. Get 1000 items, that's 1,000,000 collision checks. But
 lets say I do object partitioning, or bucketing or something where I
 maintain sortings of the objects in a way that lets me only check items
 against ones that are close to it, and I either get log(n) partitioning or
 maybe I get at most about 10 items per bucket (both very achieveable goals).
 Now it means I only do about 10,000 (10*1000) collision checks for the same
 real work being done.

This is work that must be done.  To do this in my case would be somewhat
complicated, as I would need interpartition testing, boundary testing on the
partitions and on each point, and various other modifications.  Of course
the code could be made faster, but this is something that I would have to do
to get this program functioning at a good speed.  Why not make Python
faster, making such annoying modifications unnecessary and speeding up all
Python in the process?

 So lets say that my python collision code takes 100 times as long as my
 c++ collision code - that means if I do the optimization in python, I can
 get the python code to go just as fast as the C code without the
 optimization. Not only that - lets say I decide I want to step stuff up to
 10,000 items with pairwise collision - now it's 100,000,000 checks vs. like
 say 100,000 based on the approach - now python can actually be 10 times
 faster.

That's an optimization which takes time and effort to implement.  A C
programmer very often has no need to do such optimizations, though he works
with code I find horrid by comparison.

 So now the issue becomes whats the cost of writing the more efficient
 approach in python code vs. writing the naive approach in c++ code. If you
 think you get enough programmer benefits from working in python to make
 those 2 costs equal, and the performance of either is good enough, python is
 the better choice. Not only that, once you've got good approaches written in
 python that are stable and you don't need the coolness/flexibility, it
 becomes much easier to port the stuff to C, or pyrex it or whatever makes it
 much, much faster.

The whole point of using Python, for me, is that it is far more flexible and
programmer-friendly than anything else I've seen.  I don't want to have to
make a choice between Python and C just on a matter of speed--Python should
be the clear choice.  They should be equal in speed, but Python is easier to
use.  Obvious choice?  Python.

 Sounds like your approach is O(N^2). If most points aren't close enough to
 do the collision, partitioning to make it so you don't even have to do the
 check will do wonders.

Again, this problem is merely an 

Re: [pygame] Python and Speed

2008-04-17 Thread FT
Hi!

I am just making an observation on this and objects; maybe I am missing
the point, but when checking collisions, if you know your object's size, the
vertex, or point depending on direction, could you not solve this by
just the direct line between the 2 objects and not the surface?

What I mean, you are in control of your game, you know your objects,
thus, you also know the point that will be hit depending on the direction
you are traveling in, so why not just check that point or small collection
of points?

That is what I would do when running this kind of game. My Star Trek
game does not use the actual pixel, but the larger area in which it sits.
But when using pixel points, check only the points that fall in line with
the direction you are traveling in. That seems to me a much faster check;
little more than a single point is what seems to be the result of this...

In other words you know your objects and where they are, then just draw
a straight line between them and calculate the edge at that line drawn. I am
not saying draw a line, just calculate to the edge of the object from
both...

Bruce






Re: [pygame] Python and Speed

2008-04-17 Thread Ian Mallett
On Thu, Apr 17, 2008 at 12:45 PM, Casey Duncan [EMAIL PROTECTED] wrote:

 Partitioned space is certainly a more complex algorithm, but so long as
 all of your points (spheres?)

Yes, one can think of one array as points and the other as spheres.

 are not close together, it is usually vastly more efficient. If the
 partition size is optimal, then the vast majority of particles will not be
 hitting the edge of a partition, that will be an edge-case (pun intended).
 Even for those that are it's usually still faster than the naive O(N^2)
 method that compares every point with every other.

 This algorithm is only effective if the space is large relative to the
 collision geometries and they tend not to be clumped very close together.

In 3D space, there is a good deal of room, so I can see this being
effective.  But again, this is only one example of countless slowdowns I've
had.  Let's fix all of them in one fell swoop, no?  The program is slowing
down because the computer is processing the program relatively
inefficiently.  This is due to Python.  Programmers shouldn't have to
optimize their inefficiently executed code; the code should just be executed
efficiently.

 -Casey

Ian


Re: [pygame] Python and Speed

2008-04-17 Thread Ian Mallett
I'm not sure precisely what you mean...
Again, remember that this is an example, not the question.  The question is
How can Python be made Faster?  This is an example of one of the problems
resulting from Python's relative slowness.

Here's the example:
-There is a list of 3D points
-There is another list of 3D points.
-Every frame, for every point in the first list, if any point in the second
list is a certain 3D distance away, then there is a collision.
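A minimal sketch of that brute-force check (all names here are invented for illustration; this is the O(N*M) approach under discussion, not a recommendation):

```python
import math

def collisions_brute_force(points_a, points_b, radius):
    """O(len(a) * len(b)): report every pair of points within `radius`."""
    hits = []
    for ax, ay, az in points_a:
        for bx, by, bz in points_b:
            # 3D Pythagorean distance between the two points.
            d = math.sqrt((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2)
            if d <= radius:
                hits.append(((ax, ay, az), (bx, by, bz)))
    return hits

print(collisions_brute_force([(0, 0, 0)], [(1, 1, 1), (9, 9, 9)], 2.0))
```

With two lists of 1000 points each, the inner comparison runs a million times per frame, which is exactly where the thread's partitioning suggestions come in.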

Ian


Re: [pygame] Python and Speed

2008-04-17 Thread Patrick Mullen
In that specific case, no matter which programming language you use,
your code will not be very fast.  Do you think programmers write
unoptimized code in C, and get speedy execution every time?  Have you
not ever used a program or played a game which ran slower than it
should, which was actually programmed in a fast language?  Optimizing
code and improving algorithms have been around far longer than Python,
and are an important part of programming in general.  As has been
mentioned before, there have been many attempts to optimize core
Python, which have resulted in some improvements.  Python 2.5 is
considerably faster than Python 1.0.  However, due to the nature of the
language it can only be optimized so much.  The best bet really, is to
write code in C that needs to be fast, and call that code from Python,
using Python as a glue language.  This can be accomplished using
implementations that already exist (pyode, numpy) or writing a new
implementation and exposing it with Pyrex or other binding tools.

There is no magic bullet that will make Python faster, and it's not
for lack of trying.  Even at the most optimized it could theoretically
be, I don't think pure Python will ever be as fast as C.  I do hope it
gets close, but even if this were the case, your collision detection
code would still be slow as heck.  Culling algorithms for this purpose
were invented to speed up applications written in C, after all :)


Re: [pygame] Python and Speed

2008-04-17 Thread Richard Jones
On Thu, 17 Apr 2008, you wrote:
 No, this is the place to discuss it because if we wish to make games,
 work with existing platforms, and want speed, that is the way to go. Now
 that we have had this discussion, and found solutions, now we have a list
 of ways to resolve it.

Sure, discuss ways to work with the current interpreter, but it's quite 
pointless talking about changes to the interpreter itself. I'm almost certain 
none of the core devs are on this list, and the python-dev list is a much 
more appropriate place to discuss it anyway :)


Richard


Re: [pygame] Python and Speed

2008-04-17 Thread Greg Ewing

Patrick Mullen wrote:

Also, if you are using sqrt for your
distance check, you are likely wasting cpu cycles, if all you need to
know is whether they are close enough.


Also note that if you do need to compare exact Euclidean
distances for some reason, you can avoid square roots
by comparing the squares of the distances instead of
the distances themselves.
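For example (a small sketch; `within_distance` is an invented helper name):

```python
def within_distance(p, q, max_dist):
    # Squaring both sides preserves the comparison for non-negative
    # values, so sqrt can be skipped entirely.
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    return dx * dx + dy * dy + dz * dz <= max_dist * max_dist

print(within_distance((0, 0, 0), (3, 4, 0), 5.0))  # True: distance is exactly 5
print(within_distance((0, 0, 0), (3, 4, 1), 5.0))  # False: distance exceeds 5
```

This gives an exact comparison, not an approximation, so it applies even when the margin of error must be small.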

--
Greg


Re: [pygame] Python and Speed

2008-04-17 Thread Greg Ewing

Ian Mallett wrote:


Here's the example:
-There is a list of 3D points
-There is another list of 3D points.
-Every frame, for every point in the first list, if any point in the 
second list is a certain 3D distance away, then there is a collision.


The responses to it also provide a good example of a
very important principle: Often, using a better algorithm
can give you much greater gains than speeding up the
implementation of the one you're using.

So when something is too slow, the first thing you
should ask is: Does there exist a better algorithm
for what I'm trying to do?

--
Greg


Re: [pygame] Python and Speed

2008-04-17 Thread Greg Ewing

René Dudfield wrote:


- SIMD instructions are the fast ones...


It's doubtful there's much in the Python core that would
benefit from SIMD, though. Most of what it does doesn't
involve doing repetitive operations on big blocks of
data.

--
Greg


Re: [pygame] Python and Speed

2008-04-17 Thread Greg Ewing

Ian Mallett wrote:
How do you write an extension module in C and call it from Python?   
Someone gave some instructions earlier, but I found them too vague...


Another way I forgot to mention earlier is to use the
ctypes module (I often forget about it, because it wasn't
in the stdlib until very recently.)

That allows you to call compiled routines from a shared
library directly without having to write any C. It's
less efficient, though, as it has to go through some
Python wrapper objects to get there, and also more
dangerous, because you can easily crash the interpreter
if you don't get everything exactly right.
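A minimal ctypes sketch along these lines, calling `sqrt` from the system C math library (this assumes a platform where `ctypes.util.find_library` can locate libm or libc; the exact library name varies by OS):

```python
import ctypes
import ctypes.util

# Locate the C math library (falls back to libc, which also exports
# sqrt on some platforms). This lookup is platform-dependent.
libm_path = ctypes.util.find_library("m") or ctypes.util.find_library("c")
libm = ctypes.CDLL(libm_path)

# Declare the signature. Without this, ctypes assumes int arguments
# and an int return value, which silently corrupts doubles; this is
# the "get everything exactly right" danger mentioned above.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(9.0))  # 3.0
```

The same pattern works for any exported C function, including ones from a shared library you compile yourself.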

--
Greg


Re: [pygame] Python and Speed

2008-04-17 Thread Greg Ewing

Ian Mallett wrote:

Yes, I've tried this, but there are issues with points being in two 
separate places.  For example, if the collision radius is 5, and it is 3 
away from the edge, then all the points in the neighboring trees must be 
tested. 


Rather than a tree, you may be just as well off using a regular
array of cells. That makes it much easier to find the neighbouring
cells to test, and there's also less overhead from code to manage
and traverse the data structure.

The only time you would really need a tree is if the distribution
of the objects can be very clumpy, so that you benefit from an
adaptive subdivision of the space.

Another possibility to consider is instead of testing neighbouring
cells, insert each object into all cells that are within the
collision radius of it. That might turn out to be faster if the
objects don't move very frequently.
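A sketch of the regular-cell idea described above, implemented as a uniform grid keyed by integer cell coordinates (all names invented for illustration):

```python
from collections import defaultdict
from itertools import product

def build_grid(points, cell_size):
    """Hash each 3D point into the integer cell that contains it."""
    grid = defaultdict(list)
    for p in points:
        key = (int(p[0] // cell_size),
               int(p[1] // cell_size),
               int(p[2] // cell_size))
        grid[key].append(p)
    return grid

def near(grid, p, cell_size, radius):
    """Gather candidates from the 27 cells around p. With
    cell_size >= radius, no neighbour within `radius` is missed."""
    cx, cy, cz = (int(p[i] // cell_size) for i in range(3))
    r2 = radius * radius
    out = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for q in grid.get((cx + dx, cy + dy, cz + dz), []):
            # Exact squared-distance check on the few candidates.
            if sum((a - b) ** 2 for a, b in zip(p, q)) <= r2:
                out.append(q)
    return out

grid = build_grid([(1.0, 1.0, 1.0), (50.0, 50.0, 50.0)], cell_size=5.0)
print(near(grid, (2.0, 2.0, 2.0), cell_size=5.0, radius=5.0))
```

Only points in the 27 surrounding cells are ever distance-tested, so the per-frame cost stays near O(N) as long as the points are not all clumped into one cell.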

--
Greg


Re: [pygame] Python and Speed

2008-04-17 Thread Gary Bishop
ctypes is great stuff! I find it much harder to crash the interpreter 
with ctypes than with extensions I've developed and debugged. It is 
quite resilient. I've used it to interface with the Windows API to 
simulate keystrokes, to interface to a USB Digital IO interface, in a 
wrapper for the huge OpenCV library, to set the background image on my 
desktop, to adjust the system volume control, to interface to the 
Wiimote, and to wrap eSpeak to name a few. I've pretty much given up on 
swig and pyrex.


gb




Re: [pygame] Python and Speed

2008-04-17 Thread FT

Hi!

Another thought, same as before but adding the other comment about bins.
If your object is heading in a certain direction and you know the surface
point of that object, now make a dictionary key point for it. Same for all
other objects, knowing their direction.
key = str(x) + str(-y) + str(-z)  # keeping all as integer values in a
coordinate grid.

The only thing you compare are the keys. Then you will say, wait a
minute, not both are going to collide at that point!

OK, then you know the direction of both and the point where they would
in fact collide. Now that point will be approaching both at the same point
in that straight line.

The vertex of the point may change where they intersect, but keep taking
the key value of that point because both will always have that vertex point
on that point on the surface.

Keep updating the dictionary key and compare the value for both with the
if key in...

That point is a key value and easy to check. It is not the surface, just
the intersection point of both straight lines of the object traveling
through free space...

So you will be checking 2 keys for 2 objects, 3 keys for 3 objects, and
so on...

The only addition to this is if you have an object that is not a
perfect shape, like a boulder, where that outer edge will change depending
on the angle of your straight line to the vertex or intersection point.

Not checking surfaces, just checking the intersection point of the line,
for both will have to meet and the distance to that objects surface will
also be that straight line distance for the object center, which will still
direct you to the vertex/intersection point.

Both surface points that meet, have same key value, will also be the
collision point when they do in fact collide...

So all points will match, just another observation and thought in this
faster and faster check of objects.

For I do something like this with my primitive Battleship game. Instead
of an array, just keys for where an object part is located. When the missile
or shell hits that key it matches the objects location at that same key.
No completely filled huge array, just points.

When doing this you have up to 3 points to calculate: point of surface
of object 1, point on surface of object 2 and then the intersection point of
the vector of both objects. When all 3 points match with the same key, you
have your collision! Like I said before, the updating is the vector angle,
surface location and the intersection point of that straight line of the
vectors of both objects.
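A minimal sketch of this dictionary-key idea, using tuple keys rather than concatenated strings (an adaptation for illustration, not the exact scheme described above; the object names are invented):

```python
# Map each object's integer grid position to the object occupying it;
# two objects "collide" when they hash to the same key.
def grid_key(x, y, z, cell=1):
    return (int(x) // cell, int(y) // cell, int(z) // cell)

occupied = {}
objects = {"ship": (10.2, 4.7, 0.0), "missile": (10.9, 4.1, 0.0)}
hits = []
for name, (x, y, z) in objects.items():
    key = grid_key(x, y, z)
    if key in occupied:        # 'key in dict' is an O(1) lookup
        hits.append((occupied[key], name))
    else:
        occupied[key] = name

print(hits)
```

Tuple keys avoid the ambiguity of string concatenation (where, e.g., coordinates 1,23 and 12,3 could produce the same string), while keeping the cheap `key in dict` comparison that motivates the approach.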

Bruce





Re: [pygame] Python and Speed

2008-04-17 Thread Greg Ewing

Ian Mallett wrote:
Programmers shouldn't have to 
optimize their inefficiently executed code; the code should just be 
executed efficiently.


Even if your inefficient algorithm is being executed as fast
as possible, it's still an inefficient algorithm, and you
will run into its limitations with a large enough data set.
Then you will have to find a better algorithm anyway.

Part of the skill of being a good programmer is having
the foresight to see where such performance problems are
likely to turn up further down the track, and choosing
an algorithm at the outset that at least isn't going
to be spectacularly bad.

Doing this saves you work in the long run, since you
spend less time going back and re-coding things.

--
Greg


Re: [pygame] Python and Speed

2008-04-17 Thread Greg Ewing

Devon Scott-Tunkin wrote:

would you set out 2d partitioning the
screen in pygame by making say 4 invisible sprites
with rects ...  or by just using x,y values


Just use coordinates. Sprites are totally unnecessary,
since they're not something that appears on the screen.

--
Greg


Re: [pygame] Python and Speed

2008-04-17 Thread Ian Mallett
On Thu, Apr 17, 2008 at 3:03 PM, Greg Ewing [EMAIL PROTECTED]
wrote:

 Patrick Mullen wrote:

  Also, if you are using sqrt for your
  distance check, you are likely wasting cpu cycles, if all you need to
  know is whether they are close enough.

 Nope; in this case, the calculations' margin of error must be very small.


 Also note that if you do need to compare exact Euclidean
 distances for some reason, you can avoid square roots
 by comparing the squares of the distances instead of
 the distances themselves.

I already do that.
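The squared-distance trick Greg describes looks like this (a small
illustrative helper, not code from either poster):

```python
def close_enough(p, q, radius):
    # Compare squared distance against squared radius; no math.sqrt needed,
    # and the result is exact for the "within radius?" question.
    dx, dy = p[0] - q[0], p[1] - q[1]
    return dx * dx + dy * dy <= radius * radius

print(close_enough((0, 0), (3, 4), 5))    # True  -- distance is exactly 5
print(close_enough((0, 0), (3, 4), 4.9))  # False -- 25 > 24.01
```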
On Thu, Apr 17, 2008 at 4:02 PM, Greg Ewing [EMAIL PROTECTED]
wrote:

 Rather than a tree, you may be just as well off using a regular
 array of cells. That makes it much easier to find the neighbouring
 cells to test, and there's also less overhead from code to manage
 and traverse the data structure.

I meant that.  What is the difference between a tree and a cell?  Cells are
regular?
Anyway, I had planned to do cells.

 The only time you would really need a tree is if the distribution
 of the objects can be very clumpy, so that you benefit from an
 adaptive subdivision of the space.

 Another possibility to consider is instead of testing neighbouring
 cells, insert each object into all cells that are within the
 collision radius of it. That might turn out to be faster if the
 objects don't move very frequently.

I like that idea.
Still, the objects can and do move.
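A sketch of the regular-grid ("array of cells") approach under discussion —
the cell size, object names, and positions are made up:

```python
from collections import defaultdict

CELL = 32  # cell size; chosen >= the largest collision radius

def cell_of(pos):
    return (int(pos[0]) // CELL, int(pos[1]) // CELL)

def build_grid(objects):
    # Map each cell to the object ids inside it; rebuilt when objects move.
    grid = defaultdict(list)
    for obj_id, pos in objects.items():
        grid[cell_of(pos)].append(obj_id)
    return grid

def candidates(grid, pos):
    # Only objects in the 3x3 block of neighbouring cells can be close
    # enough to collide; everything else is skipped without a distance test.
    cx, cy = cell_of(pos)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found.extend(grid.get((cx + dx, cy + dy), []))
    return found

objects = {"a": (10, 10), "b": (40, 12), "c": (300, 300)}
grid = build_grid(objects)
print(sorted(candidates(grid, (12, 14))))  # ['a', 'b'] -- 'c' is far away
```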
On Thu, Apr 17, 2008 at 4:23 PM, Greg Ewing [EMAIL PROTECTED]
wrote:

 There are an extremely large number of modifications
 that could be made to Python. Only a very small number
 of them will result in any improvement, and of those,
 all the easy-to-find ones have already been found.

The harder ones must then be attacked--solving a difficult speed issue might
save the tiresome implementation of optimizations on the part of hundreds of
users.

 If you want to refute that, you're going to have to
 come up with an actual, specific proposal, preferably
 in the form of a code patch together with a benchmark
 that demonstrates the improvement. If you can't
 do that, you're not really in a position to make
 statements like "it can't be that hard".

Like I said, because I am the programmer, not the Python modifier, it is my
job to make the programs run fast.  By "can't be that hard", I mean that if
C++ can do it, Python should be able to too.  Obviously, I don't know how
Python is structured, and I doubt I have the experience of the people on the
Python team, but if I can make optimizations in my code, they should be able
to make modifications in Python.

 If wishing could make it so, Python would already
 be blazingly fast!

Ian is wishing...
On Thu, Apr 17, 2008 at 4:32 PM, Greg Ewing [EMAIL PROTECTED]
wrote:

 Even if your inefficient algorithm is being executed as fast
 as possible, it's still an inefficient algorithm, and you
 will run into its limitations with a large enough data set.
 Then you will have to find a better algorithm anyway.

 Part of the skill of being a good programmer is having
 the foresight to see where such performance problems are
 likely to turn up further down the track, and choosing
 an algorithm at the outset that at least isn't going
 to be spectacularly bad.

 Doing this saves you work in the long run, since you
 spend less time going back and re-coding things.

Not necessarily.  I've had situations where I've decided to do something,
then draft-coded it, then decided that the game feature wasn't necessary,
was the wrong approach, or simply to scrap the entire project.  If I had
decided to spend the time to code something with all of the optimizations,
it would take longer.  When I delete it, that extra effort would have been
worthless.  Once a feature is implemented, tested, and so on, I decide if it
is needed, then add optimizations and comments.  Because optimizations often
rewrite code in a more efficient way, the original code works as a guideline
for the operation.  All this saves me effort in the long-run; but I can't
speak for anyone else.

 Greg

Ian


Re: Re: [pygame] Python and Speed

2008-04-17 Thread Richard Jones
Ian Mallett [EMAIL PROTECTED] wrote:
 ... if C++ can do it, Python should be able to too.  Obviously, I don't know 
 how Python is structured ...

Then please learn more about CPython's internals and the general problem of 
optimising a dynamic language like Python. The CPython code is incredibly 
well-written and easy to understand, even with the various complications that 
exist due to current optimisations* and there's plenty of academic research 
into dynamic languages out there. I'm sure the core devs would welcome your 
patches with open arms!


I'll leave you with Greg's wisdom, which perhaps needs repeating:

  If wishing could make it so, Python would already be blazingly fast!


 Richard

* for example the double dict-lookup implementations that seamlessly replace 
the string-only lookup with an arbitrary-object lookup on the first non-string 
lookup, or the function-call frame caching mechanism that I had a hand in 
implementing...


Re: [pygame] Python and Speed

2008-04-16 Thread Ian Mallett
On Wed, Apr 16, 2008 at 6:43 PM, Dan Krol [EMAIL PROTECTED] wrote:

 Are you familiar with the handful of ways to optimize parts of your code?

I've used Psyco to great effect, but all of these extra modules seem to be
treating the symptoms, not the problem.  The problem is flat-out and
final--Python is slow, especially compared to C languages.

 Swig is the classic one I think, it wraps some C source into a python
 module.

 Pyrex does a similar thing, but it lets you wrap it yourself with a
 python-esque language; it lets you use python types and C types within
 the same pyrex file, so you have control over when things get
 converted between them.

 Cython is Pyrex plus more features, but they're less conservative than
 python as far as stability goes (I think). Though their opinion is that Pyrex
 is too conservative. Check it out.

 I think Boost somehow does it too, you'll have to look that up.

I should look into these for the time being.

 I've never really used any of these other than to test it out though.

Thanks,
Ian


Re: [pygame] Python and Speed

2008-04-16 Thread Aaron Maupin

Ian Mallett wrote:

I feel like Python is living down to its namesake's pace...


Ah yes, the speedy Monty Python.



Re: [pygame] Python and Speed

2008-04-16 Thread Ian Mallett
On Wed, Apr 16, 2008 at 6:59 PM, Aaron Maupin [EMAIL PROTECTED] wrote:

 Ah yes, the speedy Monty Python.

I love Monty Python!

I was referring to the general lethargic nature of the snake.

Ian


Re: [pygame] Python and Speed

2008-04-16 Thread René Dudfield
hi,

Each release of python gets a little faster... but not massively.  It
really needs to get 10-20x faster - but generally only gets up to 1.2x
faster with each release.

There's also work on things like PyPy - which might one day be quite
fast.  I think PyPy will drive CPython to get faster through
competition - and ideas.  An example of this recently happening is the
method cache (which I think the idea actually came from tinypy...).
The method cache was shown to work well with PyPy, and then CPython
added the idea.  If PyPy becomes faster, then I think the CPython
people will try harder to make CPython faster too.

However mainly it's good to try and make highly reusable, and fast
basic building blocks - and then glue them together with python.

For example, if you see something that most pygame games would get
faster with, then add it to pygame.  Or to SDL, or to python.  If the
drawing part of the game takes 10% less time, that leaves 10% of time
for game code.  As examples, in the last pygame release the
pygame.sprite.LayeredDirty sprite work, the threading work, and
modifications to some functions to allow reusing surfaces (eg
transform.scale) should make a lot of pygame games quicker.  For SDL
the blitters have been optimized with MMX and AltiVec assembly - and
the upcoming SDL 1.3 can optionally use OpenGL and Direct3D hardware
acceleration.

Also the included PixelArray should allow you to do a lot of things
quicker - and you can rely on it to be included with pygame (unlike
numeric/numpy).  We hope to have fast vector, and matrix types
included at some point in the future too.


If you've got any ideas for reusable speed ups - we'll gladly consider
them in pygame.


cheers,




On Thu, Apr 17, 2008 at 11:36 AM, Ian Mallett [EMAIL PROTECTED] wrote:
 Recently, I've been having issues in all areas of my programming experience
  where Python is no longer fast enough to fill the need adequately.  I really
 love Python and its syntax (or lack of), and all the nice modules (such as
 PyGame) made for it.  Are there any plans to improve Python's speed to at
 least the level of C languages?  I feel like Python is living down to its
 namesake's pace...
  Ian



Re: [pygame] Python and Speed

2008-04-16 Thread Richard Jones
Ian Mallett [EMAIL PROTECTED] wrote:
  Are there any plans to improve Python's speed to 
 at least the level of C languages?

This isn't really the best forum for asking such a question. I would recommend 
asking on the general Python mailing list / newsgroup (comp.lang.python on 
http://www.python.org/community/lists/).

I think I speak for all Python developers when I say that we'd love for the 
language to run faster. And of course the large body of core CPython developers 
are aware of this. I've personally attended a sprint in Iceland during which we 
spent a week solely focused on speeding up the CPython interpreter.

There's just not much that can be done with the current CPython implementation 
to make it faster.

Thus it falls to you as a developer to choose your implementation strategy 
wisely:

1. pick sensible libraries that handle large amounts of processing for you 
(whether that be numeric or graphic)
2. where there is no existing library, you may need to code speed-critical 
parts of your application using C, or the more programmer-friendly Pyrex (er, 
Cython these days I believe :)


 Richard


Re: [pygame] Python and Speed

2008-04-16 Thread FT

Hi Ian,

    I think what you are saying, and I agree, is that when someone has fixed
something by going back to C code, then why not make a module for that code.
Thus all you do is insert the C code using a Python/Pygame module name...

    But the slowing down happens when it uses the Python interpreter, so why
not a C interpreter? Or make Python code that uses that format, but runs under
a C interpreter? After all, it is all about ease of writing: a higher level
language using the lower level code under just a different name for
translation, but normal C code once interpreted or translated...


Bruce


Ian Mallett [EMAIL PROTECTED] wrote:
  Are there any plans to improve Python's speed to
 at least the level of C languages?

This isn't really the best forum for asking such a question. I would
recommend asking on the general Python mailing list / newsgroup
(comp.lang.python on http://www.python.org/community/lists/).

I think I speak for all Python developers when I say that we'd love for the
language to run faster. And of course the large body of core CPython
developers are aware of this. I've personally attended a sprint in Iceland
during which we spent a week solely focused on speeding up the CPython
interpreter.

There's just not much that can be done with the current CPython
implementation to make it faster.

Thus it falls to you as a developer to choose your implementation strategy
wisely:

1. pick sensible libraries that handle large amounts of processing for you
(whether that be numeric or graphic)
2. where there is no existing library, you may need to code speed-critical
parts of your application using C, or the more programmer-friendly Pyrex
(er, Cython these days I believe :)


 Richard






Re: [pygame] Python and Speed

2008-04-16 Thread Ian Mallett
On Wed, Apr 16, 2008 at 7:46 PM, FT [EMAIL PROTECTED] wrote:

 Hi Ian,

I think what you are saying and I agree, is that when someone has fixed
 something by going back to C code, then why not make a module for that
 code.
 Thus all you do is insert the C code using a Python/Pygame module name...

So I have a C file
TheWeirdAndBoringNameForATitleBecauseICantThinkOfAGoodModuleNameRightNow.c
and I go:
import
TheWeirdAndBoringNameForATitleBecauseICantThinkOfAGoodModuleNameRightNow
#later
TheWeirdAndBoringNameForATitleBecauseICantThinkOfAGoodModuleNameRightNow.#function
?

But slowing down is when it uses the Python interpreter, but why not
 the
 C interpreter? Or make Python code that uses that format, but runs under
 the
 C interpreter? After all, it is all about ease in writing, higher level
 language using the lower level code under just a different name for
 translation, but normal C code once interpreted or translated...

All that is very true, particularly "After all, it is all about ease of
writing, higher level language using the lower level code under just a
different name".  I could certainly live with keeping Python based on C as
long as it is still fast, like with a C interpreter or what-not, though it
will mean Python's continued dependence on C and no chance for
competition...
Ian


Re: [pygame] Python and Speed

2008-04-16 Thread Greg Ewing

Ian Mallett wrote:
Why not?  If C is faster, why can't Python be equally so?  If we assume 
that C is the fastest language available, then that implies that 
Python /could/ be faster should it adopt the same procedures as C.


But adopting the same procedures as C would mean becoming
a statically-typed low-level language -- losing most of the
things that make Python nicer to program in than C!

C is fast because it's designed to be easily translatable
into very efficient machine code. When a C compiler sees

   int a, b, c;
   a = b + c;

it knows that a, b and c are all integers that fit in a
machine word, and it knows their memory addresses, so it
generates about 2 or 3 instructions to do the job.

However, when a Python implementation sees

   a = b + c

it has no idea what types of objects b and c will refer to
at run time. They might be ints, they might be long ints
that don't fit in a machine word, they might be strings,
they might be some custom object of your own with an
__add__ method. And they may be different for different
executions of that statement.

So every time the statement is executed, the interpreter has to look
up the names b and c (namespaces can dynamically gain or lose names,
so we can't assume they're always at a particular address), examine
the types of the objects they refer to, figure out what operation
needs to be done, and carry it out. Doing all that takes a great
many more machine instructions than adding a couple of ints in C.
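The point that the same statement meets different types at run time is easy
to see directly (a small illustrative demo; the Vec class is made up):

```python
# The same "a = b + c" statement handles whatever types turn up at run time.
class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

def plus(b, c):
    a = b + c  # same bytecode every call, four different behaviours below
    return a

print(plus(2, 3))            # 5 -- machine-word ints
print(plus(10**30, 10**30))  # arbitrary-precision long ints
print(plus("foo", "bar"))    # foobar -- strings
v = plus(Vec(1, 2), Vec(3, 4))
print(v.x, v.y)              # 4 6 -- custom __add__
```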

There are various ways of addressing the speed issue
with dynamic languages. One is what's known as type
inferencing, where the compiler examines the whole
program, thinks about it very hard, and tries to convince
itself that certain variables can only ever hold values
of certain types; specialised code is then generated
based on that. This works best in languages that are
designed for it, e.g. Haskell; applying it post-hoc
to a dynamic language such as Python tends to be much
harder and not work so well.

Another is to use just-in-time techniques, where
you look at what types are actually turning up at run
time and generated specialised code for those cases.
Psycho is an attempt to apply this idea to Python;
reportedly it can produce useful speedups in some
cases.

There's a project around called PyPy which is using
type inferencing and various other strange and wonderful
techniques in an attempt to produce a self-hosted
Python implementation, i.e. written entirely in
Python. Last I heard, it was still somewhat slower
than CPython.

I've gathered that Python is based on C code, so it translates your code 
into C code on the fly.


No, it doesn't translate it into C, it just executes it
directly.

There is a translation step of sorts, into so-called
bytecodes, which are instructions for a virtual machine.
But the instructions are very high-level and correspond
almost one-for-one with features of the Python language.
The interpreter then executes these virtual instructions.
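You can inspect those bytecodes with the standard dis module (instruction
names vary between Python versions, e.g. BINARY_ADD in older releases versus
BINARY_OP in newer ones):

```python
import dis

def add(b, c):
    a = b + c
    return a

# A single high-level instruction performs the addition; all of the
# dynamic type dispatch happens inside the interpreter executing it.
dis.dis(add)
print(add(1, 2))  # 3
```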

The CPython interpreter happens to be written in C, but
it could have been written in any language that can be
compiled to efficient machine code, and the result would
be much the same.

 How can you run a C file from a Python script?

You can't, not directly. You need to write what's
known as an extension module, which is a wrapper written
in C that bridges between the worlds of Python objects
and C data types. You then compile this into a dynamically
linked object file, which can be imported as though it
were a Python module.

Writing an extension module by hand is rather tedious
and not for the faint of heart. It has to handle all
conversion between Python objects and C data, manage
the reference counts of all Python objects it deals
with, and be scrupulous about checking for errors
and reporting them. If you want to get an idea of
what's involved, have a look at the Extending and
Embedding and Python/C API sections of the Python
documentation.

Fortunately, there are a variety of tools available
to make the task easier, such as SWIG, Boost Python
(for C++), Pyrex and Cython.

If you want to try this, my personal recommendation
would be Pyrex, although this is not exactly an
unbiased opinion, given that I wrote it. :-)

http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/
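For flavour, a Pyrex function might look like the sketch below — the cdef
declarations supply exactly the static type information discussed earlier, so
the loop can be compiled to plain C arithmetic. (Illustrative only; the names
are made up, and the Pyrex documentation at the link above is authoritative.)

```pyrex
def dot(xs, ys):
    # cdef gives the compiler C-level types, so the loop body can be
    # translated into fast C code instead of generic object operations.
    cdef int i, n
    cdef double total
    total = 0
    n = len(xs)
    for i in range(n):
        total = total + xs[i] * ys[i]
    return total
```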

--
Greg


Re: [pygame] Python and Speed

2008-04-16 Thread Ian Mallett
On Wed, Apr 16, 2008 at 9:10 PM, Greg Ewing [EMAIL PROTECTED]
wrote:

 There are various ways of addressing the speed issue
 with dynamic languages. One is what's known as type
 inferencing, where the compiler examines the whole
 program, thinks about it very hard, and tries to convince
 itself that certain variables can only ever hold values
 of certain types; specialised code is then generated
 based on that. This works best in languages that are
 designed for it, e.g. Haskell; applying it post-hoc
 to a dynamic language such as Python tends to be much
 harder and not work so well.

I realised this problem when I first thought of it, but I didn't know it had
already been tried.  Oh well.

 Another is to use just-in-time techniques, where
 you look at what types are actually turning up at run
 time and generated specialised code for those cases.
 Psycho is an attempt to apply this idea to Python;
 reportedly it can produce useful speedups in some
 cases.

Psyco.  I have used it to great effect.  For example, the speed of the
particles demo in my PAdLib is almost completely due to Psyco.  Indeed, it
approximately doubles the demo's speed--or I just give it twice as many
particles to chew on.
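For reference, binding Psyco to a hot function looked roughly like this — the
particle updater is a made-up stand-in, and the import is guarded because
Psyco only ran on 32-bit CPython 2.x:

```python
# Made-up particle updater standing in for a hot inner-loop function.
def update_particles(particles, dt):
    return [(x + vx * dt, y + vy * dt, vx, vy)
            for (x, y, vx, vy) in particles]

try:
    import psyco                  # only available on 32-bit CPython 2.x
    psyco.bind(update_particles)  # JIT-specialise just this function
except ImportError:
    pass                          # fall back to the plain interpreter

print(update_particles([(0.0, 0.0, 1.0, 2.0)], 0.5))  # [(0.5, 1.0, 1.0, 2.0)]
```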

 No, it doesn't translate it into C, it just executes it
 directly.

 There is a translation step of sorts, into so-called
 bytecodes, which are instructions for a virtual machine.
 But the instructions are very high-level and correspond
 almost one-for-one with features of the Python language.
 The interpreter then executes these virtual instructions.

 The CPython interpreter happens to be written in C, but
 it could have been written in any language that can be
 compiled to efficient machine code, and the result would
 be much the same.

  How can you run a C file from a Python script?

 You can't, not directly. You need to write what's
 known as an extension module, which is a wrapper written
 in C that bridges between the worlds of Python objects
 and C data types. You then compile this into a dynamically
 linked object file, which can be imported as though it
 were a Python module.

 Writing an extension module by hand is rather tedious
 and not for the faint of heart. It has to handle all
 conversion between Python objects and C data, manage
 the reference counts of all Python objects it deals
 with, and be scrupulous about checking for errors
 and reporting them. If you want to get an idea of
 what's involved, have a look at the Extending and
 Embedding and Python/C API sections of the Python
 documentation.

 Fortunately, there are a variety of tools available
 to make the task easier, such as SWIG, Boost Python
 (for C++), Pyrex and Cython.

 If you want to try this, my personal recommendation
 would be Pyrex, although this is not exactly an
 unbiased opinion, given that I wrote it. :-)

Like I said, these are all good options I should look into.

 http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/

 --
 Greg

Ian


Re: [pygame] Python and Speed

2008-04-16 Thread Greg Ewing

René Dudfield wrote:


2. - asm optimizations.  There seems to be
almost no asm optimizations in CPython.


That's a deliberate policy. One of the goals of CPython
is to be very portable and written in a very straightforward
way. Including special pieces of asm for particular
architectures isn't usually considered worth the
maintenance effort required.


CPython could use faster threading
primitives, and more selective releasing of the GIL.


Everyone would love to get rid of the GIL as well, but
that's another Very Hard Problem about which there has
been much discussion, but little in the way of workable
ideas.


A way to know how much memory is being used.
Memory profiling is the most important way to optimize since memory
is quite slow compared to the speed of the cpu.


Yes, but amount of memory used doesn't necessarily
have anything to do with rate of memory accesses.
Locality of reference, so that things stay in the
cache, is more important.


perhaps releasing
a patch with a few selected asm optimizations might let the python
developers realise how much faster python could be...


Have you actually tried any of this? Measurement
would be needed to tell whether these things address
any of the actual bottlenecks in CPython.

 a slot int attribute takes up 4-8 bytes, whereas a python int

attribute takes up (guessing) 200 bytes.


Keep in mind that the slot only holds a reference --
the actual int object still takes up memory elsewhere.
Slots do reduce memory use somewhat, but I wouldn't
expect that big a ratio.
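A quick way to see the (modest) per-instance difference is sys.getsizeof —
exact numbers vary by Python version and platform, so none are quoted here:

```python
import sys

class WithDict:
    def __init__(self):
        self.x = 1

class WithSlots:
    __slots__ = ("x",)
    def __init__(self):
        self.x = 1

d, s = WithDict(), WithSlots()
# The slot holds only a reference; the int object itself lives elsewhere,
# which is why slots save the per-instance dict but not the attribute values.
print(sys.getsizeof(d) + sys.getsizeof(d.__dict__))  # instance + its dict
print(sys.getsizeof(s))                              # slotted instance alone
```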

--
Greg