Re: [pygame] OBJ loader using VBOs

2010-12-04 Thread Christopher Night
On Sat, Dec 4, 2010 at 5:49 PM, Ian Mallett wrote:

> Buffer binding is one of the slowest GL calls you can do, short of
> transferring huge chunks of data around (glTexImage2D, glReadPixels, etc.).
> State changing is one of the worst things you can do for efficiency,
> especially on top of a scripting language where the overhead is much
> higher.  You have 40 sprites, each with 3 VBO bindings, and 5 texture
> bindings.  If I'm understanding right, that's 120 VBO bindings and 200
> texture bindings!  Transferring this data across the bus once (when you put
> it in a display list) will speed things up greatly, but you'll notice that
> the framerate in 6 is still much lower than in 2 or 4.
>
> If you're just drawing sprites, chances are you don't need very many
> textures.  At the very least, you can use a texture atlas, or batch calls by
> the texture required.
>
> I don't know exactly the situation you're in here, but unless the 40
> sprites all have different geometry, you need only bind the data once, and
> then call glDrawArrays 40 times.
>
> In general, try to minimize binding calls, such as glUseProgram,
> glBindFramebuffer, glBindTexture, and glBindBuffer.

Yeah, that makes a lot of sense. I'm not sure how I could implement this
advice in the standalone OBJ loader without imposing a lot of restrictions
on how it could be used. For instance, I can see how you might make it more
efficient when the sprites are sorted by the texture required. But I can't
see how to do that and also make it so it doesn't break horribly if the
sprites are unsorted. I'll think about it, but for now I might stick with
the display lists.
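
One way the sorting concern might be handled is to group draw requests by
texture at render time, so the caller never has to pre-sort. The following is
only a sketch: batch_by_texture and the (texture_id, sprite) pairs are
hypothetical names, not part of the loader, and it assumes opaque sprites with
depth testing, so that draw order does not affect the result.

```python
from collections import defaultdict

def batch_by_texture(draw_requests):
    """Group (texture_id, sprite) pairs so each texture is bound once.

    Input order does not matter; within one texture, the original
    relative order of the sprites is preserved.
    """
    batches = defaultdict(list)
    for texture_id, sprite in draw_requests:
        batches[texture_id].append(sprite)
    return dict(batches)

# Hypothetical frame: sprites queued in arbitrary order.
queued = [("wood", "crate1"), ("metal", "robot"), ("wood", "crate2")]
batches = batch_by_texture(queued)
for texture_id, sprites in batches.items():
    # A real renderer would call glBindTexture here, once per texture,
    # then issue the draw calls for every sprite in the batch.
    print(texture_id, sprites)
```

With this shape, unsorted input costs one extra pass over the queue per frame
rather than breaking anything.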

My test is not precise enough to worry about the difference between 113fps
and 121fps. I think based on the test that method #6 is effectively as fast
as #4. I may try to squeeze a few more frames per second out, in which case
I'll make a more controlled test, but mostly I was interested in getting the
speedup in loading of well over an order of magnitude (950 ms down to 30 ms
or 12 ms) that you see between #2 and #4/6.

-Christopher


Re: [pygame] OBJ loader using VBOs

2010-12-04 Thread Ian Mallett
On Sat, Dec 4, 2010 at 3:37 PM, Christopher Night wrote:

> For this model, glEnableClientState gets called once for vertices, normals,
> and texcoords. There are 5 materials, each with one glBindTexture and two
> glDrawArrays (one for triangles, one for quads). So the total calls per
> render is:
>
> 2 x glEnable/glDisable
> 1 x glFrontFace
> 3 x vbo.bind
> 3 x glEnableClientState
> 10 x glDrawArrays
> 5 x glBindTexture
> 4 x glColor
>
> And I'm rendering 40 sprites, so I'm doing this 40 times per frame. I'm
> assuming that in a real application, each model would have its own separate
> VBOs. Is that what I'm doing wrong? Or is there something else?
>
Buffer binding is one of the slowest GL calls you can do, short of
transferring huge chunks of data around (glTexImage2D, glReadPixels, etc.).
State changing is one of the worst things you can do for efficiency,
especially on top of a scripting language where the overhead is much
higher.  You have 40 sprites, each with 3 VBO bindings, and 5 texture
bindings.  If I'm understanding right, that's 120 VBO bindings and 200
texture bindings!  Transferring this data across the bus once (when you put
it in a display list) will speed things up greatly, but you'll notice that
the framerate in 6 is still much lower than in 2 or 4.

If you're just drawing sprites, chances are you don't need very many
textures.  At the very least, you can use a texture atlas, or batch calls by
the texture required.
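
As a rough illustration of the atlas idea, texture coordinates for each
sub-image can be remapped into the atlas when the model is loaded. This sketch
assumes a square atlas cut into equal cells and texcoords that stay within
[0, 1] (tiled textures would not survive this); atlas_uv and its parameters
are hypothetical names.

```python
def atlas_uv(u, v, cell_x, cell_y, cells_per_side):
    """Map a texcoord in [0, 1] within one sub-texture to atlas space.

    Assumes a square atlas split into cells_per_side x cells_per_side
    equal cells, with (cell_x, cell_y) counted from the atlas origin.
    """
    scale = 1.0 / cells_per_side
    return ((cell_x + u) * scale, (cell_y + v) * scale)

# Centre of the sub-texture in cell (1, 0) of a 2x2 atlas.
print(atlas_uv(0.5, 0.5, 1, 0, 2))
```

After remapping, all materials that share the atlas need only one texture
bind per model.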

I don't know exactly the situation you're in here, but unless the 40 sprites
all have different geometry, you need only bind the data once, and then call
glDrawArrays 40 times.

In general, try to minimize binding calls, such as glUseProgram,
glBindFramebuffer, glBindTexture, and glBindBuffer.
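
The arithmetic above can be written out as a small sketch. count_binds is a
hypothetical helper, and the shared-geometry case assumes all sprites use the
same buffers and are drawn material by material, so each buffer and texture is
bound once per frame.

```python
def count_binds(num_sprites, vbos_per_model, textures_per_model,
                shared_geometry):
    """Return (vbo_binds, texture_binds) per frame.

    With shared_geometry=True, every buffer and texture is bound once
    and reused for all sprites; otherwise each sprite rebinds them all.
    """
    if shared_geometry:
        return vbos_per_model, textures_per_model
    return (num_sprites * vbos_per_model,
            num_sprites * textures_per_model)

print(count_binds(40, 3, 5, shared_geometry=False))  # (120, 200)
print(count_binds(40, 3, 5, shared_geometry=True))   # (3, 5)
```

The number of glDrawArrays calls is unchanged in either case; only the
binding overhead shrinks.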

> The reason it takes so long to load on 2 is generating the display list.
> This method was taken from the objloader on the wiki, and it involves 1646
> glVertex3f calls, one for each vertex in the model, and similarly with
> glNormal and glTexCoords.
>
> Thanks again!
>
> -Christopher
>
Ian


Re: [pygame] OBJ loader using VBOs

2010-12-04 Thread Christopher Night
On Sat, Dec 4, 2010 at 4:05 PM, Ian Mallett wrote:

>
> On Sat, Dec 4, 2010 at 1:21 PM, Christopher Night wrote:
>
>> Hi, I'm working on a standalone OBJ loader based on the well-known one on
>> the pygame wiki:
>> http://www.pygame.org/wiki/OBJFileLoader
>>
>> My goal is to speed up load times by making the model objects picklable,
>> so the OBJ file doesn't have to be read every time you start up. Here's my
>> current version:
>> http://christophernight.net/stuff/fasterobj-0.tgz
>>
>> It still needs some cleaning up, but it's got almost all the functionality
>> I wanted. In addition to making things picklable, it has a small
>> optimization by combining triangles and quads when possible to reduce the
>> number of GL calls.
>>
>> There are three classes: OBJ (using fixed function), OBJ_array (using
>> vertex arrays), and OBJ_vbo (using vertex buffer objects). Additionally, any
>> of these can be used with or without a display list. Here are the results of
>> my test on some model I had lying around:
>>
>>    type   list?  parse  save  load  render
>> 1. fixed  False    146    13    14    0.03fps
>> 2. fixed  True     124    10   950  117.80fps
>> 3. array  False    179     8     9    1.26fps
>> 4. array  True     174     7    30  121.08fps
>> 5. vbo    False    143     7     8   16.06fps
>> 6. vbo    True     142     8    12  112.98fps
>>
>> #2 is the method in the original OBJ loader. The times listed under parse,
>> save, and load are times in milliseconds to read from the OBJ file and do
>> some preprocessing, pickle to a file, and unpickle from a file. The load
>> step also includes generating the display list, if necessary.
>>
>

> I completely would have expected the results in 1-4.
>
> However, I'm quite surprised at the vbo method 5.  I would expect its speed
> to fall between that of 2 and 4.  I also would have expected 4 and 6 to be
> much closer.
>
> How many VBOs are you using?  If you switch buffer bindings a lot for each
> draw (like your object has 10 different parts, each with a vertex, normal,
> and texcoord VBO) then you *might* get results like that . . .
>

Awesome, thanks so much for taking a look! I'm using 3 VBOs, one each for
vertex, normal, and texcoord. This is the entire rendering code for OBJ_vbo:

glEnable(GL_TEXTURE_2D)
glFrontFace(GL_CCW)
self.vbo_v.bind()
glVertexPointerf(self.vbo_v)
self.vbo_n.bind()
glNormalPointerf(self.vbo_n)
self.vbo_t.bind()
glTexCoordPointerf(self.vbo_t)
glEnableClientState(GL_VERTEX_ARRAY)
texon, normon = None, None
for material, mindices in self.indices:
    self.mtl.bind(material)
    for nvs, dotex, donorm, ioffset, isize in mindices:
        if donorm != normon:
            normon = donorm
            (glEnableClientState if donorm else glDisableClientState)(GL_NORMAL_ARRAY)
        if dotex != texon:
            texon = dotex
            (glEnableClientState if dotex else glDisableClientState)(GL_TEXTURE_COORD_ARRAY)
        shape = [GL_TRIANGLES, GL_QUADS, GL_POLYGON][nvs-3]
        glDrawArrays(shape, ioffset, isize)
glDisable(GL_TEXTURE_2D)

For this model, glEnableClientState gets called once for vertices, normals,
and texcoords. There are 5 materials, each with one glBindTexture and two
glDrawArrays (one for triangles, one for quads). So the total calls per
render is:

2 x glEnable/glDisable
1 x glFrontFace
3 x vbo.bind
3 x glEnableClientState
10 x glDrawArrays
5 x glBindTexture
4 x glColor

And I'm rendering 40 sprites, so I'm doing this 40 times per frame. I'm
assuming that in a real application, each model would have its own separate
VBOs. Is that what I'm doing wrong? Or is there something else?

The reason it takes so long to load on 2 is generating the display list.
This method was taken from the objloader on the wiki, and it involves 1646
glVertex3f calls, one for each vertex in the model, and similarly with
glNormal and glTexCoords.

Thanks again!

-Christopher


Re: [pygame] OBJ loader using VBOs

2010-12-04 Thread Ian Mallett
Hi,
On Sat, Dec 4, 2010 at 1:21 PM, Christopher Night wrote:

> Hi, I'm working on a standalone OBJ loader based on the well-known one on
> the pygame wiki:
> http://www.pygame.org/wiki/OBJFileLoader
>
> My goal is to speed up load times by making the model objects picklable, so
> the OBJ file doesn't have to be read every time you start up. Here's my
> current version:
> http://christophernight.net/stuff/fasterobj-0.tgz
>
> It still needs some cleaning up, but it's got almost all the functionality
> I wanted. In addition to making things picklable, it has a small
> optimization by combining triangles and quads when possible to reduce the
> number of GL calls.
>
> There are three classes: OBJ (using fixed function), OBJ_array (using
> vertex arrays), and OBJ_vbo (using vertex buffer objects). Additionally, any
> of these can be used with or without a display list. Here are the results of
> my test on some model I had lying around:
>
>    type   list?  parse  save  load  render
> 1. fixed  False    146    13    14    0.03fps
> 2. fixed  True     124    10   950  117.80fps
> 3. array  False    179     8     9    1.26fps
> 4. array  True     174     7    30  121.08fps
> 5. vbo    False    143     7     8   16.06fps
> 6. vbo    True     142     8    12  112.98fps
>
> #2 is the method in the original OBJ loader. The times listed under parse,
> save, and load are times in milliseconds to read from the OBJ file and do
> some preprocessing, pickle to a file, and unpickle from a file. The load
> step also includes generating the display list, if necessary. Obviously
> methods #1 and #3 render far too slowly; they're just there for comparison.
>
> So anyway, it looks pretty good. I think that #4 or #6 would do fine for my
> purposes. However, I know that people don't like to put vertex arrays and
> VBOs inside display lists, so I want to know if there's some problem with
> this method. I understand that putting a VBO in a display list defeats the
> whole purpose of having a VBO, since you can't update it, but I imagine
> you're probably not going to be doing that with OBJ models anyway. Also,
> when I asked about this a few months ago, someone said that method #5 should
> outperform methods #1-4, and that doesn't seem to be the case. So I might be
> misusing the VBOs.
>
I completely would have expected the results in 1-4.

However, I'm quite surprised at the vbo method 5.  I would expect its speed
to fall between that of 2 and 4.  I also would have expected 4 and 6 to be
much closer.

How many VBOs are you using?  If you switch buffer bindings a lot for each
draw (like your object has 10 different parts, each with a vertex, normal,
and texcoord VBO) then you *might* get results like that . . .

> Any other comments welcome too! If you have any OBJ files you want me to
> test, just let me know.
>
This is great, actually.  I imagine pickling could make things much faster.
Wonder why it took longer to load on 2?

> -Christopher
>
Ian


[pygame] OBJ loader using VBOs

2010-12-04 Thread Christopher Night
Hi, I'm working on a standalone OBJ loader based on the well-known one on
the pygame wiki:
http://www.pygame.org/wiki/OBJFileLoader

My goal is to speed up load times by making the model objects picklable, so
the OBJ file doesn't have to be read every time you start up. Here's my
current version:
http://christophernight.net/stuff/fasterobj-0.tgz

It still needs some cleaning up, but it's got almost all the functionality I
wanted. In addition to making things picklable, it has a small optimization
by combining triangles and quads when possible to reduce the number of GL
calls.
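
For illustration, the merging step might look something like the sketch
below. It is not the code in the tarball: build_draw_ranges is a hypothetical
name, materials are ignored, and only triangles and quads are handled, since
larger polygons each need their own draw call.

```python
def build_draw_ranges(faces):
    """Flatten faces into one vertex list plus (nverts, offset, count) ranges.

    faces is a list of vertex tuples, one tuple per face. Faces are
    grouped by vertex count so that all triangles share one draw call
    and all quads share another.
    """
    by_size = {}
    for face in faces:
        if len(face) not in (3, 4):
            raise ValueError("only triangles and quads can be merged")
        by_size.setdefault(len(face), []).extend(face)
    vertices, ranges = [], []
    for nverts in sorted(by_size):
        group = by_size[nverts]
        # offset and count are in vertices, as glDrawArrays expects.
        ranges.append((nverts, len(vertices), len(group)))
        vertices.extend(group)
    return vertices, ranges

faces = [("a", "b", "c"), ("d", "e", "f", "g"), ("h", "i", "j")]
vertices, ranges = build_draw_ranges(faces)
print(ranges)  # [(3, 0, 6), (4, 6, 4)]
```

Each range then maps onto a single glDrawArrays(shape, offset, count) call.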

There are three classes: OBJ (using fixed function), OBJ_array (using vertex
arrays), and OBJ_vbo (using vertex buffer objects). Additionally, any of
these can be used with or without a display list. Here are the results of my
test on some model I had lying around:

   type   list?  parse  save  load  render
1. fixed  False    146    13    14    0.03fps
2. fixed  True     124    10   950  117.80fps
3. array  False    179     8     9    1.26fps
4. array  True     174     7    30  121.08fps
5. vbo    False    143     7     8   16.06fps
6. vbo    True     142     8    12  112.98fps

#2 is the method in the original OBJ loader. The times listed under parse,
save, and load are times in milliseconds to read from the OBJ file and do
some preprocessing, pickle to a file, and unpickle from a file. The load
step also includes generating the display list, if necessary. Obviously
methods #1 and #3 render far too slowly; they're just there for comparison.
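
A minimal sketch of how the save and load columns could be measured, assuming
the parsed model is held in plain Python containers. The model dict and the
timed helper are hypothetical stand-ins, not the classes in the tarball; GL
handles such as display list or buffer IDs belong to a GL context and would
have to be recreated after unpickling.

```python
import pickle
import time

def timed(func, *args):
    """Run func(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, (time.perf_counter() - start) * 1000.0

# Hypothetical stand-in for a parsed model: plain containers pickle cleanly.
model = {
    "vertices": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    "ranges": [(3, 0, 3)],
}

blob, save_ms = timed(pickle.dumps, model, pickle.HIGHEST_PROTOCOL)
restored, load_ms = timed(pickle.loads, blob)
print("save %.3f ms, load %.3f ms" % (save_ms, load_ms))
```

For a real comparison, the load figure would also include whatever GL setup
happens after unpickling, as it does in the table.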

So anyway, it looks pretty good. I think that #4 or #6 would do fine for my
purposes. However, I know that people don't like to put vertex arrays and
VBOs inside display lists, so I want to know if there's some problem with
this method. I understand that putting a VBO in a display list defeats the
whole purpose of having a VBO, since you can't update it, but I imagine
you're probably not going to be doing that with OBJ models anyway. Also,
when I asked about this a few months ago, someone said that method #5 should
outperform methods #1-4, and that doesn't seem to be the case. So I might be
misusing the VBOs.

Any other comments welcome too! If you have any OBJ files you want me to
test, just let me know.

-Christopher