Keith Whitwell wrote:

Ian Romanick wrote:

One thing about Jakub's patch is that, on x86, it eliminates the need for the specialized _ts_* versions of the dispatch functions. It basically converts the DISPATCH macro (as used in src/mesa/main/dispatch.c) from:

#define DISPATCH(FUNC, ARGS, MESSAGE) \
    (_glapi_Dispatch->FUNC) ARGS

to:

#define DISPATCH(FUNC, ARGS, MESSAGE)                  \
    const struct _glapi_table * d = _glapi_Dispatch;   \
    if ( __builtin_expect( d == NULL, 0 ) )            \
        d = get_dispatch();                            \
    (d->FUNC) ARGS

There is some extra cost in the non-threaded case, but it seems very minimal. In the x86 assembly case, it's only a test and a conditional branch that is usually not taken. Does this seem like a reasonable change to make across the board?
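To make the pattern above concrete, here is a minimal, self-contained sketch of the "test, then fall back" dispatch. It is not Mesa code: the one-entry table, `dispatch_sketch`, `get_dispatch_sketch()`, `slow_lookups` and `dispatched_add()` are all hypothetical stand-ins, and the slow path simply returns a static table where real code would consult per-thread storage.

```c
#include <stddef.h>

/* Hypothetical, cut-down dispatch table: one entry instead of the GL API. */
struct glapi_table_sketch {
    int (*Add)(int a, int b);
};

static int add_impl(int a, int b) { return a + b; }

static const struct glapi_table_sketch default_table = { add_impl };

/* Fast-path pointer.  NULL means "take the slow lookup" (threaded case). */
static const struct glapi_table_sketch *dispatch_sketch = NULL;

/* Counts how often the slow path ran, purely for illustration. */
static unsigned slow_lookups = 0;

/* Stand-in for get_dispatch(); assumed here to return a static table. */
static const struct glapi_table_sketch *get_dispatch_sketch(void)
{
    slow_lookups++;
    return &default_table;
}

/* Branch hint is a GCC extension; fall back to a plain test elsewhere. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* One dispatched entry point: a test and a rarely taken branch, then the
 * indirect call through whichever table was selected. */
int dispatched_add(int a, int b)
{
    const struct glapi_table_sketch *d = dispatch_sketch;
    if (UNLIKELY(d == NULL))
        d = get_dispatch_sketch();
    return d->Add(a, b);
}
```

With `dispatch_sketch` left NULL every call goes through the slow lookup; once it points at a table, the only added cost is the test and the untaken branch.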

Hmm. The _ts_* macros were introduced to eliminate exactly that sort of test - though we probably coded it up in a less optimal way than that. Are you saying that the dispatch tables would really become compiled 'C'? At the moment they are typically generated as assembly and use a jmp rather than calling a new function as in either of the examples above.


My feeling is that the non-threaded case should run as fast as possible, being the normal usage. Maybe some timings would make things clearer.

Attached is the test program I used. It calls each of a few API functions in turn, 1,000,000 times by default (a different count can be given on the command line). I tried it on a 2.4GHz Pentium 4 and a 400MHz K6-3. Both systems are Red Hat 7.3 + patches (and in need of upgrades, I know). All code was compiled with gcc 2.96-113.


On the K6-3, the results were within the measurable margin of error for the two x86 assembly dispatch methods.

On the P4, the old-style dispatch was between 5 and 20 clock cycles faster per call, so the new dispatch costs between 5% and 38% more on each call. The worst case was glTexCoord3fv, which went from ~52 cycles to ~72 cycles. The two exceptions were glMultiTexCoord2fv and glMultiTexCoord2f, whose timings were virtually identical.
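For anyone reproducing these figures from the test program's output, the arithmetic is roughly as follows. The helper names are hypothetical; the only numbers used are the ~52 and ~72 cycle glTexCoord3fv measurements quoted above.

```c
/* Average cycles per call, given the total printed by the test program
 * and the loop count it was run with. */
double cycles_per_call(unsigned long long total_cycles, unsigned calls)
{
    return (double) total_cycles / (double) calls;
}

/* Relative per-call overhead, in percent, of new_cycles over old_cycles. */
double overhead_percent(double old_cycles, double new_cycles)
{
    return (new_cycles - old_cycles) / old_cycles * 100.0;
}
```

For example, `overhead_percent(52.0, 72.0)` evaluates to 20/52 * 100, which is where the 38% worst case comes from.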

I'm a bit confused as to why the overhead isn't constant from function to function. The difference per-call should be identical. I suspect there is some other difference in my build. :( I'll keep looking into it...

#include <stdio.h>
#include <stdlib.h>
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#include <GL/glut.h>

/* Linux-specific: provides cycles_t and get_cycles() */
#include <asm/timex.h>

static float Width = 400.0;
static float Height = 400.0;
static unsigned count = 1000000;


static void Idle( void )
{
   glutPostRedisplay();
}

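/* Time `count` back-to-back calls of f with argument list p and print the
 * elapsed cycle count.  Relies on i, t0 and t1 being declared by the caller.
 */
#define DO_FUNC(f,p) \
   do { \
      t0 = get_cycles(); \
      for ( i = 0 ; i < count ; i++ ) { \
         f p ; \
      } \
      t1 = get_cycles(); \
      printf("%u calls to %20s required %llu cycles.\n", count, # f, \
             (unsigned long long) (t1 - t0)); \
   } while( 0 )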

static void Display( void )
{
   unsigned i;   /* same type as count, avoids a signed/unsigned comparison */
   const float v[3] = { 1.0, 0.0, 0.0 };
   cycles_t t0;
   cycles_t t1;

   glBegin(GL_TRIANGLE_STRIP);

   DO_FUNC( glColor3fv, (v) );
   DO_FUNC( glNormal3fv, (v) );
   DO_FUNC( glTexCoord2fv, (v) );
   DO_FUNC( glTexCoord3fv, (v) );
   DO_FUNC( glMultiTexCoord2fv, (GL_TEXTURE0, v) );
   DO_FUNC( glMultiTexCoord2f, (GL_TEXTURE0, 0.0, 0.0) );
   DO_FUNC( glFogCoordfv, (v) );
   DO_FUNC( glFogCoordf, (0.5) );

   glEnd();

   exit(0);
}


static void Reshape( int width, int height )
{
   Width = width;
   Height = height;
   glViewport( 0, 0, width, height );
   glMatrixMode( GL_PROJECTION );
   glLoadIdentity();
   glOrtho(0.0, width, 0.0, height, -1.0, 1.0);
   glMatrixMode( GL_MODELVIEW );
   glLoadIdentity();
}


static void Key( unsigned char key, int x, int y )
{
   (void) x;
   (void) y;
   switch (key) {
      case 27:
         exit(0);
         break;
   }
   glutPostRedisplay();
}


int main( int argc, char *argv[] )
{
   glutInit( &argc, argv );
   glutInitWindowSize( (int) Width, (int) Height );
   glutInitWindowPosition( 0, 0 );

   glutInitDisplayMode( GLUT_RGB );

   glutCreateWindow( argv[0] );

   if ( argc > 1 ) {
      count = strtoul( argv[1], NULL, 0 );
   }

   glutReshapeFunc( Reshape );
   glutKeyboardFunc( Key );
   glutDisplayFunc( Display );
   glutIdleFunc( Idle );

   glutMainLoop();
   return 0;
}
