[pygame] Python - Pygame - PyOpenGL performance

2009-02-26 Thread Zack Schilling
I know the PyOpenGL mailing list might be a better place to ask this  
question, but I've had a lot of luck talking to the experienced people  
here so I figured I'd try it first.


I'm trying to migrate a game I created from using the Pygame / SDL  
software rendering to OpenGL. Before attempting the massive and  
complex conversion involved with moving the whole game, I decided to  
make a little test program while I learned OpenGL.


In this test, I set up OpenGL to work in 2D and began loading images  
into texture objects and drawing textured quads as sprites. I created  
a little glSprite class to handle the drawing and translation. At  
first its draw routine looked like this:


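# Immediate-mode draw: one textured quad per sprite.
# w and h are the sprite's width and height in pixels.
glPushMatrix()                                  # save the modelview matrix
glTranslate(self.positionx,self.positiony,0)    # move to the sprite's position
glBindTexture(GL_TEXTURE_2D, self.texture)
glBegin(GL_QUADS)
glTexCoord2f(0, 1)
glVertex2f(0, 0)
glTexCoord2f(1, 1)
glVertex2f(w, 0)
glTexCoord2f(1, 0)
glVertex2f(w, h)
glTexCoord2f(0, 0)
glVertex2f(0, h)
glEnd()
glPopMatrix()                                   # restore the modelview matrix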

Note: self.texture is a texture ID of a loaded OpenGL texture object.  
My sprite class keeps a dictionary cache and only loads the sprite's  
image into a texture if it needs to.


I'd get maybe 200 identical sprites (same texture) onscreen and my CPU  
would hit 100% load from Python execution. I looked into what could be  
causing this and found out that it's probably function call overhead.  
That's 14 external library function calls per sprite draw.
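A rough way to see the scale of that overhead, assuming nothing beyond the draw routine above: swap the GL module for a recording stub and count calls per frame. `RecordingGL` is invented for this illustration and is not part of PyOpenGL; it only counts calls and says nothing about how expensive each one is.

```python
class RecordingGL:
    """Hypothetical stand-in for the GL module: any gl* attribute
    returns a function that just increments a counter."""

    def __init__(self):
        self.calls = 0

    def __getattr__(self, name):
        # Only called for names not found normally, e.g. glBegin, glVertex2f.
        def fake_gl_function(*args):
            self.calls += 1
        return fake_gl_function


def draw_immediate(gl, x, y, w, h, texture):
    """Same call sequence as the immediate-mode draw routine above.
    GL constants are passed as strings; the stub ignores them."""
    gl.glPushMatrix()
    gl.glTranslate(x, y, 0)
    gl.glBindTexture("GL_TEXTURE_2D", texture)
    gl.glBegin("GL_QUADS")
    for (s, t), (vx, vy) in [((0, 1), (0, 0)), ((1, 1), (w, 0)),
                             ((1, 0), (w, h)), ((0, 0), (0, h))]:
        gl.glTexCoord2f(s, t)
        gl.glVertex2f(vx, vy)
    gl.glEnd()
    gl.glPopMatrix()


def calls_per_frame(num_sprites):
    gl = RecordingGL()
    for i in range(num_sprites):
        draw_immediate(gl, i, i, 32, 32, texture=1)
    return gl.calls


print(calls_per_frame(1))    # 14
print(calls_per_frame(200))  # 2800
```

At 60 frames per second, 200 sprites means 168,000 calls per second, each one crossing from Python into the GL library.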


The next thing I tried was to create a display list at each sprite's  
initialization. Then my code looked like this:

glPushMatrix()
glTranslate(self.positionx,self.positiony,0)
glCallList(self.displist)
glPopMatrix()

Well, that's nice, down to 4 calls per draw. I was able to push ~500  
sprites per frame using this method before the CPU tapped out. I need  
more speed than this. My game logic uses 30-40% of the CPU alone and  
I'd like to push at least 1000 sprites. What can I do? I've looked  
into passing sprites as a matrix with vertex arrays, but forming a  
proper vertex array with numpy can sometimes be more trouble than it's  
worth. Plus, I can't swap out textures easily mid-draw, so it makes  
things much more complex than the simple way I'm doing things now.
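One common way around the texture-swapping problem (a general technique, not something proposed in this message) is to group sprites by texture before building the arrays. Each group then becomes one vertex array with a single `glBindTexture` and one draw call. The `Sprite` record and its fields are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Sprite(NamedTuple):
    # Hypothetical minimal sprite record: position, size, GL texture id.
    x: float
    y: float
    w: float
    h: float
    texture: int


def build_batches(sprites):
    """Return {texture_id: flat list of x, y, s, t values}, so each
    texture needs only one bind and one draw call per frame."""
    batches = defaultdict(list)
    for sp in sprites:
        x0, y0, x1, y1 = sp.x, sp.y, sp.x + sp.w, sp.y + sp.h
        # Same corner order and texture coordinates as the quad above.
        batches[sp.texture].extend([
            x0, y0, 0.0, 1.0,
            x1, y0, 1.0, 1.0,
            x1, y1, 1.0, 0.0,
            x0, y1, 0.0, 0.0,
        ])
    return dict(batches)


sprites = [Sprite(0, 0, 32, 32, texture=1),
           Sprite(40, 0, 32, 32, texture=2),
           Sprite(80, 0, 16, 16, texture=1)]
batches = build_batches(sprites)
print(sorted(batches))        # [1, 2]
print(len(batches[1]) // 4)   # 8 vertices (2 quads) share texture 1
```

At draw time each batch would be handed to PyOpenGL's `glVertexPointer`/`glTexCoordPointer` and drawn with one `glDrawArrays(GL_QUADS, ...)`; that part needs a live GL context and is omitted here.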


Is there any design pattern I could follow that will get me more speed
without sending me off the deep end with complexity?


Thanks,

Zack


Re: [pygame] Python - Pygame - PyOpenGL performance

2009-02-26 Thread RB[0]
Well, most likely your main slowdown is looping through all those sprites
as it is. Try just looping through them and calling a dummy function
instead of the OpenGL one, and see what happens. Otherwise, that seems
quite odd to me; I haven't run into that before (are you using psyco,
perhaps?)
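The experiment suggested here can be run with the standard library's `timeit`. This is a minimal sketch, assuming a hypothetical `DummySprite` whose `draw` does a little arithmetic instead of issuing GL calls; if this loop is cheap, the cost lies in the GL calls rather than in the iteration itself.

```python
import timeit


class DummySprite:
    # Hypothetical stand-in for glSprite: draw() does no OpenGL work.
    def __init__(self, x, y):
        self.positionx = x
        self.positiony = y

    def draw(self):
        # Crunch a few numbers so the call is not completely empty.
        return self.positionx * 2 + self.positiony


def frame(sprites):
    for sprite in sprites:
        sprite.draw()


sprites = [DummySprite(i, i) for i in range(1000)]
# Average wall-clock time of one "frame" over 100 runs.
seconds = timeit.timeit(lambda: frame(sprites), number=100) / 100
print(f"{seconds * 1000:.3f} ms per frame for {len(sprites)} dummy sprites")
```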




Re: [pygame] Python - Pygame - PyOpenGL performance

2009-02-26 Thread RB[0]
Hmm, how are you loading your textures: GL_NEAREST, or are you using GL_LINEAR
or mipmapped filtering?
Otherwise, the only thing I can think of is that you might have a cruddy card
that is dumping some functionality onto your CPU instead of the GPU.

On Thu, Feb 26, 2009 at 1:17 PM, Zack Schilling zack.schill...@gmail.com wrote:

 That was the first thing I tried: a dummy draw function that crunched a few
 numbers instead of making the OpenGL calls. That worked fine and let me
 create thousands and thousands of sprites before the CPU tapped out.

 No, I'm not using psyco or any other performance enhancer.

 -Zack








Re: [pygame] Python - Pygame - PyOpenGL performance

2009-02-26 Thread Casey Duncan
Immediate mode calls (glVertex et al) are the very slowest way to use  
OpenGL. In fact they are deprecated in OpenGL 3.0 and will eventually  
be removed.


The display list is better as you discovered, but you still are making  
a few OpenGL state changes per sprite, which is likely slowing you  
down. Also there is some overhead for the display list call, which  
makes them sub-optimal for just drawing a single quad.



   glPushMatrix()
   glTranslate(self.positionx,self.positiony,0)
   glCallList(self.displist)
   glPopMatrix()


You really need to batch the quads up into a few vertex arrays or VBOs
to stream them to the card in one go. pyglet has a high-level Python
sprite API that automates this for you, FWIW.
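A minimal sketch of building such a batch with NumPy, assuming sprite positions and sizes are already available as arrays (the function name and layout are illustrative, not pyglet's or PyOpenGL's API). Each output row is x, y, s, t as float32, so a row is 16 bytes; the array could be passed to `glVertexPointer`/`glTexCoordPointer` with that stride, or uploaded to a VBO, and drawn with a single `glDrawArrays(GL_QUADS, 0, len(verts))`. The GL side needs a live context and is not shown.

```python
import numpy as np


def quads_from_rects(positions, sizes):
    """Build an (N*4, 4) float32 array of x, y, s, t rows for N sprites.

    positions: (N, 2) top-left corners; sizes: (N, 2) widths and heights.
    Corner order and texture coordinates follow the original draw routine.
    """
    positions = np.asarray(positions, dtype=np.float32)
    sizes = np.asarray(sizes, dtype=np.float32)
    # Unit-square corner offsets and their texture coordinates.
    corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)
    texcoords = np.array([[0, 1], [1, 1], [1, 0], [0, 0]], dtype=np.float32)
    n = len(positions)
    # (N, 1, 2) + (1, 4, 2) * (N, 1, 2) broadcasts to (N, 4, 2).
    xy = positions[:, None, :] + corners[None, :, :] * sizes[:, None, :]
    st = np.broadcast_to(texcoords, (n, 4, 2))
    return np.concatenate([xy, st], axis=2).reshape(n * 4, 4)


verts = quads_from_rects([[0, 0], [40, 0]], [[32, 32], [16, 16]])
print(verts.shape)        # (8, 4)
print(verts[6].tolist())  # [56.0, 16.0, 1.0, 0.0]
```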


-Casey





Re: [pygame] Python - Pygame - PyOpenGL performance

2009-02-26 Thread Zack Schilling
I'm on a Macbook Pro with a GeForce 8600M GT. Textures are loaded with  
GL_LINEAR. Here's the texture loading code:


# Create an OpenGL texture from a pygame Surface (image) and register it
import pygame
from OpenGL.GL import *

w, h = image.get_size()
glEnable(GL_TEXTURE_2D)
# True flips the rows so the bottom row comes first, as OpenGL expects
rgbadata = pygame.image.tostring(image, "RGBA", True)
texture = glGenTextures(1)
glBindTexture(GL_TEXTURE_2D, texture)
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA,
             GL_UNSIGNED_BYTE, rgbadata)

# Linear filtering without mipmaps
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)

Casey Duncan has pretty much confirmed my suspicions. I guess I'll end
up creating a sprite class that uses vertex arrays or vertex buffer
objects and packs all the animation frames I need into one texture
at load time. Then I'll use a texture coordinate array to pick out each
frame's region of that texture, so that each sprite can animate independently.
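The plan of packing all animation frames into one texture and selecting a frame through the texture coordinates can be sketched as below. This assumes a hypothetical atlas laid out as a grid of equal cells, numbered left to right and top to bottom, and uploaded flipped as in the loading code (so t=1 is the top edge). Only the texture coordinates change per frame; the vertex positions stay put.

```python
def atlas_uv(frame, columns, rows):
    """Return (s0, t0, s1, t1) for `frame` in a grid atlas of equal cells."""
    col = frame % columns
    row = frame // columns
    cell_w = 1.0 / columns
    cell_h = 1.0 / rows
    s0 = col * cell_w
    s1 = s0 + cell_w
    t1 = 1.0 - row * cell_h   # top edge of this cell
    t0 = t1 - cell_h          # bottom edge of this cell
    return s0, t0, s1, t1


def frame_texcoords(frame, columns, rows):
    """Texture coordinates for the quad corners (0,0), (w,0), (w,h), (0,h),
    matching the corner order of the original draw routine."""
    s0, t0, s1, t1 = atlas_uv(frame, columns, rows)
    return [(s0, t1), (s1, t1), (s1, t0), (s0, t0)]


print(atlas_uv(5, 4, 2))         # (0.25, 0.0, 0.5, 0.5)
print(frame_texcoords(0, 4, 2))  # [(0.0, 1.0), (0.25, 1.0), (0.25, 0.5), (0.0, 0.5)]
```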


Thanks for your help. If anyone else has any design patterns I might  
find useful, let me know. I'm very new to OpenGL and most tutorials  
teach you how to get things done, but never the proper way.


-Zack









Re: [pygame] Python - Pygame - PyOpenGL performance

2009-02-26 Thread Ian Mallett
There are certain easy ways to optimize certain techniques.

For example, I wanted an OpenGL program with many many particles.
They had only to be one color, and should be pretty small.  The
solution was to use shaders to draw points.  I got over one million
(1024**2) particles at 50fps using this technique.  Still, the
solutution ended up being pretty complicated.

I'm guessing that for better results, you'll likewise want something
more complex.  There's not really a way around it.  Display lists are
the easiest method I've seen for drastically improving performance.
If you want faster, you'll need more code.

Ian


Re: [pygame] surfarray on 64-bit machines

2009-02-26 Thread Marius Gedminas
This was a long time ago (shame on me for not finding the time to
investigate this further):

 On Wed, Oct 22, 2008 at 7:23 PM, Marius Gedminas mar...@gedmin.as wrote:
  A user reported that PySpaceWar fails on 64-bit Linux machines if I try
  to scale the alpha channel.  Here's the code (simplified):
 
 import pygame
 import Numeric
 image = pygame.image.load('title.png')   # has an alpha channel
 mask = pygame.surfarray.array_alpha(image).astype(Numeric.Int)
 array = pygame.surfarray.pixels_alpha(image)
 alpha = 42.5 # a float between 1 and 255
 array[:] = (mask * alpha / 255).astype(Numeric.UnsignedInt8)
 
  The error happens on the last line, and it says
 
 ValueError: matrices are not aligned for copy
 
  Any ideas?  The code works fine on 32-bit systems.

On Wed, Oct 22, 2008 at 09:28:46PM -0500, Charlie Nolan wrote:
 I may be having this same error.  I've got a bug report with that same
 error message at one point (and on a 64-bit machine), even though it
 works fine on my (32-bit) machine.  Could you try printing out
 array[:].shape?  In my case, I do a sensible slice and somehow end
 up with a 0x600 array.

On a 32-bit machine:

  array[:].shape == array.shape == (333, 83)

On a 64-bit machine:

  array[:].shape == (0, 83)

On Wed, Oct 22, 2008 at 07:46:53PM -0700, Lenard Lindstrom wrote:
 I am curious: what happens if array[:] is replaced with array[...]?

The code starts working!  Thank you!

 It is a two dimension array, so I am surprised the single index slice  
 [:] even works.

(on 32-bit only, for some reason).

 The alternate form [...] is indifferent to array
 dimension.

It's a thinko on my part.  When I want an in-place assignment, I tend to
write container[:] = new_value without considering dimensionality at
all.
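In NumPy (as opposed to the old Numeric package this thread started with), both forms fill an ordinary 2-D array in place; the difference shows up for 0-d arrays, where a single `[:]` slice is an error but `[...]` still works. A small demonstration:

```python
import numpy as np

grid = np.zeros((3, 2), dtype=np.uint8)
grid[...] = 7          # in-place fill, works for any number of dimensions
print(grid.tolist())   # [[7, 7], [7, 7], [7, 7]]

scalar = np.array(5)   # a 0-d array
scalar[...] = 9        # Ellipsis also works with zero dimensions
print(int(scalar))     # 9

try:
    scalar[:] = 1      # one slice index, but the array has no dimensions
    raised = False
except IndexError:
    raised = True
print(raised)          # True
```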

Cheers!
Marius Gedminas
-- 
A programmer started to cuss
Because getting to sleep was a fuss
As he lay there in bed
Looping 'round in his head
was: while(!asleep()) sheep++;




Re: [pygame] surfarray on 64-bit machines

2009-02-26 Thread René Dudfield
hey,

is it possible to use numpy instead of Numeric?  Numeric really is
dying now...  even we are going to stop trying to keep it working.


cheers,







Re: [pygame] surfarray on 64-bit machines

2009-02-26 Thread Marius Gedminas
On Fri, Feb 27, 2009 at 11:16:28AM +1100, René Dudfield wrote:
 hey,
 
 is it possible to use numpy instead of Numeric?  Numeric really is
 dying now...  even we are going to stop trying to keep it working.

I suppose I should.  Since I'm really clueless about
Numeric/numarray/numpy, please tell me if this code has any obvious
shortcomings:

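  # initialization, done once
  import pygame
  import numpy
  image = pygame.image.load('title.png')   # has an alpha channel
  mask = pygame.surfarray.array_alpha(image)   # uint8 copy of the alpha

  # this is done once every frame
  array = pygame.surfarray.pixels_alpha(image)   # view into the surface
  alpha = 42.5 # a float varying between 1 and 255
  # cast to unsigned 8-bit ('b' would be signed and overflow above 127)
  array[...] = (mask * alpha / 255).astype(numpy.uint8)
  del array   # the surface stays locked while the view exists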
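The per-frame arithmetic can be checked without pygame. This sketch uses a small hand-made uint8 array as a stand-in for the `array_alpha` mask and a second array as a stand-in for the `pixels_alpha` view (both assumptions for illustration), writing the result in place as unsigned 8-bit values to match an alpha channel.

```python
import numpy as np

# Stand-in for pygame.surfarray.array_alpha(image): a 2x2 uint8 mask.
mask = np.array([[0, 255], [128, 64]], dtype=np.uint8)
# Stand-in for the pixels_alpha() view that is written every frame.
target = np.empty_like(mask)

alpha = 42.5  # overall opacity, 0..255
# mask * alpha promotes to float64, so nothing overflows in uint8;
# astype(np.uint8) then truncates the fractional part.
target[...] = (mask * alpha / 255).astype(np.uint8)
print(target.tolist())  # [[0, 42], [21, 10]]
```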

Cheers!
Marius Gedminas
-- 
Never trust a computer you can't repair yourself.




Re: [pygame] Python - Pygame - PyOpenGL performance

2009-02-26 Thread Zack Schilling
I spent a good portion of this evening updating my glSprite class to  
use VBOs to render. I was able to push 1800 individually animated,  
arbitrarily sized sprites at 60FPS before my CPU tapped out. That's  
more than 3 times faster than the display lists.


I've done some performance analysis on my code and found that the
largest bottleneck by far is iterating through my game objects and
populating the NumPy array that gets streamed off to the GPU. (I've
already made sure that I never duplicate, copy or type-convert the
array at any time.) The actual streaming and draw calls are
negligible by comparison. I tried hard-coding the arrays to see just
how much CPU the streaming and drawing took, but I was never able to
get it past 20%, no matter how many quads I told it to draw.
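One common way to shrink that per-object loop (a general technique, not what this thread settled on) is to keep per-sprite state in NumPy arrays owned by a manager, so each frame's update is a few vectorized operations written into a reused buffer. `SpriteField`, its attributes, and the fixed 32-pixel size are hypothetical.

```python
import numpy as np


class SpriteField:
    """Hypothetical manager storing per-sprite state as arrays
    (one row per sprite) instead of attributes on separate objects."""

    def __init__(self, count, size=32.0):
        self.pos = np.zeros((count, 2), dtype=np.float32)
        self.vel = np.zeros((count, 2), dtype=np.float32)
        # Vertex buffer reused every frame: 4 corners x (x, y) per sprite.
        self.vertices = np.zeros((count, 4, 2), dtype=np.float32)
        corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)
        self._offsets = corners * np.float32(size)

    def update(self, dt):
        # Move every sprite in one vectorized step, no Python-level loop.
        self.pos += self.vel * np.float32(dt)
        # Write corner positions into the existing buffer (no new array).
        np.add(self.pos[:, None, :], self._offsets, out=self.vertices)
        # Flat (count * 4, 2) view, ready to stream to a vertex array or VBO.
        return self.vertices.reshape(-1, 2)


field = SpriteField(1000)
field.vel[:] = (60.0, 0.0)   # every sprite moves right at 60 px/s
verts = field.update(0.5)
print(verts.shape)           # (4000, 2)
print(verts[1].tolist())     # [62.0, 0.0]
```

Game objects would then hold a row index into these arrays rather than their own coordinates; how well that fits an existing object model is a design trade-off.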


So I guess I'll continue looking for ways to keep pushing that array  
population faster. It looks like I've gotten the OpenGL side of things  
running as quickly as technically possible. Thanks everyone for the  
nudges in the right direction.


-Zack

