Much of this performance difference may depend on the specific UBO
implementation in your driver. Are you using Intel, NVIDIA or AMD (ATI)
graphics?
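
One driver-dependent number worth checking is GL_MAX_UNIFORM_BLOCK_SIZE. The
OpenGL spec only guarantees 16384 bytes for it, and under std140 a mat4
occupies 64 bytes, so the number of matrices per block (and therefore the
number of instanced draws) can differ a lot between drivers. Below is a
minimal sketch of that arithmetic only. The function names and the
blockSizeBytes parameter are hypothetical and not part of OSG or OpenGL, and
the real limit has to be queried from your driver at runtime.

```cpp
#include <cassert>
#include <cstddef>

// Under the std140 layout a mat4 is stored as four vec4 columns of 16 bytes each.
constexpr std::size_t kStd140Mat4Bytes = 4 * 16;

// Number of mat4 entries that fit into one uniform block.
// blockSizeBytes stands in for the value the driver reports for
// GL_MAX_UNIFORM_BLOCK_SIZE (hypothetical parameter, query it at runtime).
constexpr std::size_t matricesPerBlock(std::size_t blockSizeBytes)
{
    return blockSizeBytes / kStd140Mat4Bytes;
}

// Number of instanced draws (one uniform block range each) needed to cover
// instanceCount instances when each block holds perBlock matrices.
inline std::size_t batchesNeeded(std::size_t instanceCount, std::size_t perBlock)
{
    assert(perBlock > 0);
    return (instanceCount + perBlock - 1) / perBlock; // integer division, rounded up
}
```

With the guaranteed minimum of 16384 bytes, matricesPerBlock gives 256 matrices
per block; a driver reporting 65536 bytes gives 1024. For your 65,536 instances
that is 256 versus 64 draws, so it would help to know what your driver reports.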


> Hi,
> I am currently evaluating the performance of uniform buffer objects vs.
> regular uniforms. My demo application uses hardware instancing to render
> thousands of quads that represent grass. The geometry is static in my demo.
> I tried out different approaches to store the model matrix for each
> instance: a uniform array, a texture and a uniform buffer object. I
> thought UBOs should be faster than regular uniforms, as they are bigger (I
> can store 4 times more matrices) and the data will only be uploaded once.
> But in fact they are way slower: 60 FPS vs. 16 FPS with 65,536 instances.
> Here is the code that creates my UBO and binds it:
> Code:
> // create uniform buffer object for all matrices
> osg::FloatArray* matrixArray = new osg::FloatArray(maxUBOMatrices*16);
> for (unsigned int i = start, j = 0; i < end; ++i, ++j)
> {
>     for (unsigned int k = 0; k < 16; ++k)
>     {
>         (*matrixArray)[j*16+k] = m_matrices[i].ptr()[k];
>     }
> }
> osg::ref_ptr<osg::UniformBufferObject> ubo = new osg::UniformBufferObject;
> ubo->setUsage(GL_STATIC_DRAW_ARB);
> ubo->setDataVariance(osg::Object::STATIC);
> matrixArray->setBufferObject(ubo);
> // create uniform buffer binding and add it to the stateset
> osg::ref_ptr<osg::UniformBufferBinding> ubb =
>     new osg::UniformBufferBinding(0, ubo, 0, maxUBOMatrices*16*sizeof(GLfloat));
> geode->getOrCreateStateSet()->setAttributeAndModes(ubb, osg::StateAttribute::ON);
> // set uniform block location
> program->addBindUniformBlock("instanceData", 0);
> And here is how I access it in the vertex shader:
> Code:
> #version 150 compatibility
> #extension GL_ARB_uniform_buffer_object : enable
> #define MAX_INSTANCES {will be set by c++ code}
> layout(std140) uniform instanceData
> {
>         mat4 instanceModelMatrix[MAX_INSTANCES];
> };
> smooth out vec2 texCoord;
> smooth out vec3 normal;
> smooth out vec3 lightDir;
> void main()
> {
>         mat4 _instanceModelMatrix = instanceModelMatrix[gl_InstanceID];
>         gl_Position = gl_ModelViewProjectionMatrix * _instanceModelMatrix * gl_Vertex;
>         texCoord = gl_MultiTexCoord0.xy;
>         mat3 normalMatrix = mat3(
>             _instanceModelMatrix[0][0], _instanceModelMatrix[0][1], _instanceModelMatrix[0][2],
>             _instanceModelMatrix[1][0], _instanceModelMatrix[1][1], _instanceModelMatrix[1][2],
>             _instanceModelMatrix[2][0], _instanceModelMatrix[2][1], _instanceModelMatrix[2][2]);
>         normal = gl_NormalMatrix * normalMatrix * gl_Normal;
>         lightDir = gl_LightSource[0].position.xyz;
> }
> Am I doing something wrong that makes the performance of UBOs so bad? Or
> are they just not meant to be used the way I use them?
> Another question is about the software design of osg::BufferObject. When I
> set the buffer object of an array, it will get added to the internal
> BufferDataList of this BufferObject. But the BufferDataList uses regular
> pointers instead of ref_ptrs. Is there a reason the BufferObject was
> designed this way? My problem right now is that I need to store the
> pointer to my matrix array somewhere else in my program for memory
> management reasons, even though I don't need this pointer anymore.
> In osg::Geometry, on the other hand, ref_ptrs are used for the storage of
> vertex arrays or other vertex attribute arrays. So I can just add an array
> to a geometry and the array will automatically be destroyed if the geometry
> is destroyed (and there is no other reference to this array). Wouldn't it be
> better if osg::BufferObject worked the same way? Or am I missing something
> important here?
> Thank you!
> Cheers,
> Marcel
