Re: Thread Local Storage libGL

2004-06-21 Thread Ian Romanick
Ian Romanick wrote:
Ian Romanick wrote:
Ian Romanick wrote:
Alan Hourihane wrote:
Is there someone looking to integrate the TLS patches for libGL ??
We should certainly take a look soon and comment upon the patches used.
Here is a patch that covers part of what's in the Redhat patch.  This 
converts the static_functions table to a list of offsets instead of a 
list of pointers.  According to 'objdump -R' on the Mesa libGL, it 
cuts out about 1800 R_386_RELATIVE relocs.  However, the size of the 
library *increases* by about 24k.  That doesn't make sense to me.
Here's an updated version of that patch.  There are some significant 
differences.

1. *All* architectures use the string offset table.  To do this, 
gl_procs.py was modified to generate a big character array called 
gl_string_table in glprocs.h.  The static_functions array now contains 
offsets into that array instead of pointers to strings.  If 
gl_procs.py is invoked with '-m short' it will generate a (hard to 
read) character array.  If it is invoked with no option or '-m long' 
it will generate a big (~16k) string.  The string version of the .h 
file generates a warning from GCC.

2. The same glprocs.h is used even for the optimized x86 case.  This 
is done by defining NEED_FUNC_POINTERS only on non-x86.  Actually, it 
should only be defined on architectures that don't have generated 
assembly dispatch stubs.

3. All of the _ts_ dispatch code is *gone*.  The x86 assembly dispatch 
code and the C dispatch code reflect this.  The SPARC assembly 
dispatch has not yet been updated, but it should follow the x86 
model.  This means that this code will catch fire, fall over, and sink 
into the swamp on SPARC.  This will obviously need to be fixed before 
that portion of the patch is committed.

Unless there are objections, I would like to commit the new glprocs.h 
and the non-x86 specific code in glapi.c to support it.
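
To illustrate the offset-table idea from point 1: in a shared library, a table of string pointers needs one load-time relocation per entry, while a table of integer offsets into a single character array needs none.  The following is only a minimal sketch under that assumption.  The names demo_string_table, demo_entry, demo_functions and find_offset, and the dispatch offsets 10/20/30, are made up for illustration and are not what glprocs.h actually contains.

```c
#include <stddef.h>
#include <string.h>

/* All names packed into one array; each name ends with an explicit NUL. */
static const char demo_string_table[] =
    "glAccum\0"
    "glBegin\0"
    "glEnd\0";

/* Each entry stores an integer offset into demo_string_table instead of a
 * pointer, so the table itself needs no load-time relocations. */
typedef struct {
    int name_offset;        /* -1 marks the end of the list */
    unsigned int offset;    /* dispatch table slot (hypothetical values) */
} demo_entry;

static const demo_entry demo_functions[] = {
    {  0, 10 },   /* "glAccum": 7 characters plus NUL, so the next starts at 8 */
    {  8, 20 },   /* "glBegin": 7 characters plus NUL, so the next starts at 16 */
    { 16, 30 },   /* "glEnd" */
    { -1,  0 }    /* end of list marker */
};

/* Return the dispatch offset for a name, or -1 if the name is unknown. */
static int find_offset(const char *name)
{
    size_t i;

    for (i = 0; demo_functions[i].name_offset >= 0; i++) {
        const char *entry_name =
            demo_string_table + demo_functions[i].name_offset;
        if (strcmp(entry_name, name) == 0)
            return (int) demo_functions[i].offset;
    }
    return -1;
}
```

With this layout, find_offset("glBegin") walks the table and compares against the bytes starting at offset 8 of the character array.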
This patch is about the final form, I hope.  It builds on the previous 
version by replacing all _glapi_Dispatch->Foo calls with the GL_CALL 
macro.  In addition to several programs from progs/demos, this patch has 
been tested with progs/xdemos/glthreads with 20 threads.  The previous 
patch would die in glthreads because, in the threaded case, 
_glapi_Dispatch is NULL.  That shouldn't have been a big surprise to me 
since that's most of the point of that patch!  Duh!

Conceptually, this is similar to the GL_CALL macro in Jakub's patch, but 
it does not directly call the dispatch function in the threaded case. 
Since we can't call the dispatch functions from within a *_dri.so, it 
inlines the dispatch function.  As a nice side effect, dispatch.c uses 
GL_CALL to define DISPATCH and RETURN_DISPATCH.  The GL_CALL macro 
currently lives in glthread.h because it might use some threading 
related functions.  The pthreads-specific version directly calls 
pthread_getspecific, for example.
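
A rough sketch of what an inlined, pthreads-specific call macro along these lines could look like.  This is not the macro from the patch: demo_table, demo_Dispatch, demo_key, DEMO_CALL and demo_run_threaded are hypothetical names, and the real dispatch table has far more entries.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical one-entry dispatch table. */
struct demo_table {
    int (*Add)(int, int);
};

/* NULL means "threaded mode": the table must be fetched per thread. */
static struct demo_table *demo_Dispatch = NULL;
static pthread_key_t demo_key;

/* The lookup is expanded inline at the call site instead of calling out
 * to a separate dispatch function. */
#define DEMO_CALL(name) \
    (((demo_Dispatch != NULL) \
        ? demo_Dispatch \
        : (struct demo_table *) pthread_getspecific(demo_key))->name)

static int demo_add(int a, int b)
{
    return a + b;
}

/* Store a table in thread-specific data, then call through the macro. */
static int demo_run_threaded(void)
{
    static struct demo_table table = { demo_add };

    if (pthread_key_create(&demo_key, NULL) != 0)
        return -1;
    if (pthread_setspecific(demo_key, &table) != 0)
        return -1;

    demo_Dispatch = NULL;   /* force the per-thread lookup path */
    return DEMO_CALL(Add)(2, 3);
}
```

Every DEMO_CALL use repeats the NULL test and, in threaded mode, the pthread_getspecific call, which is the per-call cost discussed next.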

For functions that have a lot of GL_CALL invocations, it might be 
possible to make a new macro, GL_CALL_GET_DISPATCH or something, to 
cache the dispatch table pointer.  This should make the compiled code a 
lot smaller and reduce the performance hit in the threaded case.  Note 
that the performance hit in the threaded case was just as bad (maybe 
worse) when the _ts_ dispatch functions were used.
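
The caching idea could be sketched roughly as follows.  Nothing here is from the patch: DEMO_CALL_GET_DISPATCH, DEMO_CALL_CACHED, demo_get_dispatch and the counter are hypothetical, and demo_get_dispatch merely stands in for whatever per-thread lookup is really used.

```c
#include <stddef.h>

/* Hypothetical two-entry dispatch table. */
struct demo_table {
    void (*Vertex)(int);
    void (*Color)(int);
};

static int demo_lookups = 0;    /* counts how often the table is fetched */
static int demo_sum = 0;        /* side effect so the calls are observable */

static void demo_vertex(int v) { demo_sum += v; }
static void demo_color(int c)  { demo_sum += 10 * c; }

static struct demo_table demo_thread_table = { demo_vertex, demo_color };

/* Stand-in for the per-thread lookup (e.g. pthread_getspecific). */
static struct demo_table *demo_get_dispatch(void)
{
    demo_lookups++;
    return &demo_thread_table;
}

/* Fetch the table pointer once per function ... */
#define DEMO_CALL_GET_DISPATCH(d)  struct demo_table *d = demo_get_dispatch()
/* ... then make every call through the cached pointer. */
#define DEMO_CALL_CACHED(d, name)  ((d)->name)

/* Three calls, but only one table lookup. */
static int demo_draw(void)
{
    DEMO_CALL_GET_DISPATCH(disp);

    DEMO_CALL_CACHED(disp, Color)(1);
    DEMO_CALL_CACHED(disp, Vertex)(2);
    DEMO_CALL_CACHED(disp, Vertex)(3);
    return demo_lookups;
}
```

The trade-off is that the function has to declare the cached pointer up front, so the macro pair is only worthwhile where a function makes several calls.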

Like I threatened yesterday (heh), tomorrow (Wednesday) I plan to commit 
the following parts of this patch:

1. The new glprocs.h (generated with 'python2 gl_procs.py -m short') and 
the non-x86 specific code to support it.

2. The GL_CALL changes and the non-threaded version of the macro. That's 
the one that looks like #define GL_CALL(name) (_glapi_Dispatch->name).

My hope is that we can discuss the remaining changes in the patch at 
Monday's #dri-devel chat.  I should be able to get some performance 
numbers out later today.
This patch is the same as the -03 patch except it is against today's 
CVS.  Also, you *MUST* regenerate glapi_x86.S on your own.  Including 
that file made the patch way too big to get through the list. :)  You can 
do this by doing (after applying the patch):

cd src/mesa/glapi
python2 gl_x86_asm.py > ../x86/glapi_x86.S
Index: src/mesa/glapi/gl_x86_asm.py
===
RCS file: /cvs/mesa/Mesa/src/mesa/glapi/gl_x86_asm.py,v
retrieving revision 1.1
diff -u -d -r1.1 gl_x86_asm.py
--- src/mesa/glapi/gl_x86_asm.py18 May 2004 18:33:40 -  1.1
+++ src/mesa/glapi/gl_x86_asm.py21 Jun 2004 21:17:25 -
@@ -77,15 +77,65 @@
print '#define GLOBL_FN(x) GLOBL x'
print '#endif'
print ''
-   print '#define GL_STUB(fn,off,stack)\t\t\t\t\\'
+   print '#if defined(PTHREADS)'
+   print '#  define GL_STUB(fn,off,fn_alt)\t\t\t\\'
print 'ALIGNTEXT16;\t\t\t\t\t\t\\'
-   print 'GLOBL_FN(GL_PREFIX(fn, fn ## @ ## stack));\t\t\\'
-   print 'GL_PREFIX(fn, fn ## @ ## 

Re: Thread Local Storage libGL

2004-05-25 Thread Ian Romanick
Keith Whitwell wrote:
Ian Romanick wrote:
One thing about Jakub's patch is that, on x86, it eliminates the need 
for the specialized _ts_* versions of the dispatch functions.  It 
basically converts the DISPATCH macro (as used in 
src/mesa/main/dispatch.c) from:

#define DISPATCH(FUNC, ARGS, MESSAGE) \
    (_glapi_Dispatch->FUNC) ARGS
to:
#define DISPATCH(FUNC, ARGS, MESSAGE) \
    const struct _glapi_table * d = _glapi_Dispatch;\
    if ( __builtin_expect( d == NULL, 0 ) )\
        d = get_dispatch();\
    (d->FUNC) ARGS
There is some extra cost in the non-threaded case, but it seems very 
minimal.  In the x86 assembly case, it's only a test and a conditional 
branch that is usually not taken.  Does this seem like a reasonable 
change to make across the board?
Hmm.  The _ts_* macros were introduced to eliminate exactly that sort of 
test - though we probably coded it up in a less optimal way than that.  
Are you saying that the dispatch tables would really become compiled 
'C'?  At the moment they are typically generated as assembly and use a 
jmp rather than calling a new function as in either of the examples above.

My feeling is that the non-threaded case should run as fast as possible, 
being the normal usage.  Maybe some timings would make things clearer.
Attached is the test program I used.  It takes turns calling a few API 
functions 1,000,000 (or more if specified on the command line) times.  I 
tried it on a 2.4GHz Pentium 4 and a 400MHz K6-3.  Both systems are 
Redhat 7.3 + patches (and in need of upgrades, I know).  All code was 
compiled with gcc 2.96-113.

On the K6-3, the results were within the measurable margin of error for 
the two x86 assembly dispatch methods.

On the P4, the old-style dispatch was between 5 and 20 clock cycles 
faster.  This amounts to an increase of between 5% and 38% on each call. 
 The worst was glTexCoord3fv, which increased from ~52 cycles to ~72 
cycles.  The two exceptions were glMultiTexCoord2fv and 
glMultiTexCoord2f.  The timings for these were virtually identical.

I'm a bit confused as to why the overhead isn't constant from function 
to function.  The difference per-call should be identical.  I suspect 
there is some other difference in my build. :(  I'll keep looking into it...
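
For reference, the 38% figure follows from the glTexCoord3fv numbers: (72 - 52) / 52 is roughly 0.385.  A tiny helper for that arithmetic (overhead_percent is just an illustrative name):

```c
/* Relative slowdown, in percent, when a call goes from `before` cycles
 * to `after` cycles. */
static double overhead_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Note that even a constant per-call cost in cycles would give a different percentage for each function, since the base cost of each call differs.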

#include <stdio.h>
#include <stdlib.h>
#define GL_GLEXT_PROTOTYPES
#include <GL/gl.h>
#include <GL/glext.h>
#include <GL/glut.h>

#include <asm/timex.h>

static float Width = 400.0;
static float Height = 400.0;
static unsigned count = 100;


static void Idle( void )
{
   glutPostRedisplay();
}

#define DO_FUNC(f,p) \
   do { \
      t0 = get_cycles(); \
      for ( i = 0 ; i < count ; i++ ) { \
         f p ; \
      } \
      t1 = get_cycles(); \
      printf("%u calls to % 20s required %llu cycles.\n", count, # f, t1 - t0); \
   } while( 0 )

static void Display( void )
{
   int i;
   const float v[3] = { 1.0, 0.0, 0.0 };
   cycles_t t0;
   cycles_t t1;

   glBegin(GL_TRIANGLE_STRIP);

   DO_FUNC( glColor3fv, (v) );
   DO_FUNC( glNormal3fv, (v) );
   DO_FUNC( glTexCoord2fv, (v) );
   DO_FUNC( glTexCoord3fv, (v) );
   DO_FUNC( glMultiTexCoord2fv, (GL_TEXTURE0, v) );
   DO_FUNC( glMultiTexCoord2f, (GL_TEXTURE0, 0.0, 0.0) );
   DO_FUNC( glFogCoordfv, (v) );
   DO_FUNC( glFogCoordf, (0.5) );

   glEnd();

   exit(0);
}


static void Reshape( int width, int height )
{
   Width = width;
   Height = height;
   glViewport( 0, 0, width, height );
   glMatrixMode( GL_PROJECTION );
   glLoadIdentity();
   glOrtho(0.0, width, 0.0, height, -1.0, 1.0);
   glMatrixMode( GL_MODELVIEW );
   glLoadIdentity();
}


static void Key( unsigned char key, int x, int y )
{
   (void) x;
   (void) y;
   switch (key) {
      case 27:
         exit(0);
         break;
   }
   glutPostRedisplay();
}


int main( int argc, char *argv[] )
{
   glutInit( &argc, argv );
   glutInitWindowSize( (int) Width, (int) Height );
   glutInitWindowPosition( 0, 0 );

   glutInitDisplayMode( GLUT_RGB );

   glutCreateWindow( argv[0] );

   if ( argc > 1 ) {
  count = strtoul( argv[1], NULL, 0 );
   }

   glutReshapeFunc( Reshape );
   glutKeyboardFunc( Key );
   glutDisplayFunc( Display );
   glutIdleFunc( Idle );

   glutMainLoop();
   return 0;
}


Re: Thread Local Storage libGL

2004-05-24 Thread Keith Whitwell
Ian Romanick wrote:
Keith Whitwell wrote:
Ian Romanick wrote:
One thing about Jakub's patch is that, on x86, it eliminates the need 
for the specialized _ts_* versions of the dispatch functions.  It 
basically converts the DISPATCH macro (as used in 
src/mesa/main/dispatch.c) from:

#define DISPATCH(FUNC, ARGS, MESSAGE) \
    (_glapi_Dispatch->FUNC) ARGS
to:
#define DISPATCH(FUNC, ARGS, MESSAGE) \
    const struct _glapi_table * d = _glapi_Dispatch;\
    if ( __builtin_expect( d == NULL, 0 ) )\
        d = get_dispatch();\
    (d->FUNC) ARGS
There is some extra cost in the non-threaded case, but it seems very 
minimal.  In the x86 assembly case, it's only a test and a 
conditional branch that is usually not taken.  Does this seem like a 
reasonable change to make across the board?

Hmm.  The _ts_* macros were introduced to eliminate exactly that sort 
of test - though we probably coded it up in a less optimal way than 
that.  Are you saying that the dispatch tables would really become 
compiled 'C'?  At the moment they are typically generated as assembly 
and use a jmp rather than calling a new function as in either of the 
examples above.

Assembly dispatch stubs are only generated for x86 and SPARC.  It looks 
like the #if test in dispatch.c is wrong, so that stubs don't even get 
used on SPARC.  In any case, Jakub's patch did modify the x86 assembly, 
not the C version.  I wasn't really clear about that before.  My 
proposal is to modify the C version, the x86 assembly version, and the 
SPARC assembly version.  I've worked up a patch to gl_x86_asm.py that I 
can post on Monday.

Some time in the next couple weeks I'm going to create PowerPC dispatch 
stubs.  The PPC ABI is a little odd, though, so it may not be trivial.

My feeling is that the non-threaded case should run as fast as 
possible, being the normal usage.  Maybe some timings would make 
things clearer.

Since the branch is going to be correctly predicted every time it's 
executed (in the non-threaded case), the performance hit should be on 
the order of a couple clock-cycles.  I should be able to get some 
timings on Monday or Tuesday.  I'll just do a loop of calling some GL 
function a million times or something.  Any idea which function would be 
likely to take the least time to execute?  I want to find the case where 
the dispatch functions have the most impact.
Just stick a return at the top of some random function and use that...
Keith

---
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. 
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id=3149alloc_id=8166op=click
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: Thread Local Storage libGL

2004-05-24 Thread Keith Whitwell
Mike Mestnik wrote:
--- Ian Romanick [EMAIL PROTECTED] wrote:
Assembly dispatch stubs are only generated for x86 and SPARC.  It looks 
like the #if test in dispatch.c is wrong, so that stubs don't even get 
used on SPARC.  In any case, Jakub's patch did modify the x86 assembly, 
not the C version.  I wasn't really clear about that before.  My 
proposal is to modify the C version, the x86 assembly version, and the 
SPARC assembly version.  I've worked up a patch to gl_x86_asm.py that I 
can post on Monday.

Just to get everyone on the same page.  The SPARC assembly version is
only for Solaris.  It does not, and can not, even build on Linux!  Last
time I checked DRI would not build on SPARC/Linux, crashing on the Solaris
ASM code.
Hmm, there are definite references to __linux__ in the code - looks like the 
intention is there.  Why not look at the generator scripts and post a patch to 
fix it?

Keith



Re: Thread Local Storage libGL

2004-05-24 Thread Ian Romanick
Mike Mestnik wrote:
--- Ian Romanick [EMAIL PROTECTED] wrote:
Assembly dispatch stubs are only generated for x86 and SPARC.  It looks 
like the #if test in dispatch.c is wrong, so that stubs don't even get 
used on SPARC.  In any case, Jakub's patch did modify the x86 assembly, 
not the C version.  I wasn't really clear about that before.  My 
proposal is to modify the C version, the x86 assembly version, and the 
SPARC assembly version.  I've worked up a patch to gl_x86_asm.py that I 
can post on Monday.
Just to get everyone on the same page.  The SPARC assembly version is
only for Solaris.  It does not, and can not, even build on Linux!  Last
time I checked DRI would not build on SPARC/Linux, crashing on the Solaris
ASM code.
Does the code in src/mesa/glapi/glapi.c (generate_entrypoint 
specifically) cause the crash or just the code in 
src/mesa/sparc/glapi_sparc.S?  My guess is that Linux uses 32-bit 
user-mode, but the asm code in glapi_sparc.S defaults to 64-bit on v9. 
Perhaps we could come up with a better define, such as 
USE_SPARC_32BIT_USER to determine which stubs to build.




Re: Thread Local Storage libGL

2004-05-24 Thread Ian Romanick
Ian Romanick wrote:
Mike Mestnik wrote:
--- Ian Romanick [EMAIL PROTECTED] wrote:
Assembly dispatch stubs are only generated for x86 and SPARC.  It 
looks like the #if test in dispatch.c is wrong, so that stubs don't 
even get used on SPARC.  In any case, Jakub's patch did modify the 
x86 assembly, not the C version.  I wasn't really clear about that 
before.  My proposal is to modify the C version, the x86 assembly 
version, and the SPARC assembly version.  I've worked up a patch to 
gl_x86_asm.py that I can post on Monday.
Just to get everyone on the same page.  The SPARC assembly version is
only for Solaris.  It does not, and can not, even build on Linux!  Last
time I checked DRI would not build on SPARC/Linux, crashing on the Solaris
ASM code.
Does the code in src/mesa/glapi/glapi.c (generate_entrypoint 
specifically) cause the crash or just the code in 
src/mesa/sparc/glapi_sparc.S?  My guess is that Linux uses 32-bit 
user-mode, but the asm code in glapi_sparc.S defaults to 64-bit on v9. 
Perhaps we could come up with a better define, such as 
USE_SPARC_32BIT_USER to determine which stubs to build.
I just committed a version of the entrypoint generator script based on 
the new XML Python code.  I played around with the output of that on a 
SPARC based SunOS 5.9 (from uname) box.  It looks like when 64-bit 
binaries are generated (i.e., -m64 is used), __arch64__ is defined.  If 
32-bit binaries are generated it is not defined.  This was true with GCC 
2.95.2, 3.0, and 3.3.  It turned out that __sparc_v9__ was *never* 
defined, even if -mcpu=v9 was supplied.

None of the SPARC systems that I have access to have Sun's compiler 
installed, so I don't know if that also defined __arch64__ when building 
64-bit binaries.

If SPARC Linux also gets the __arch64__ treatment, I'd suggest that we 
use that to determine which version of the stub to use instead of a 
combination of __sparc_v9__ and __linux__.
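
As a sketch of what such a test might look like in C: the use of __sparc__ as the "this is SPARC at all" check is my assumption here, and DEMO_STUB_FLAVOR and demo_stub_flavor are made-up names.  Only __arch64__ comes from the experiments above.

```c
#include <string.h>

/* Pick a stub flavor from the compiler's predefined macros. */
#if defined(__sparc__) && defined(__arch64__)
#  define DEMO_STUB_FLAVOR "sparc64"      /* 64-bit binaries (e.g. -m64) */
#elif defined(__sparc__)
#  define DEMO_STUB_FLAVOR "sparc32"      /* 32-bit user-mode */
#else
#  define DEMO_STUB_FLAVOR "generic C"    /* no SPARC assembly stubs */
#endif

/* Report which branch the preprocessor selected. */
static const char *demo_stub_flavor(void)
{
    return DEMO_STUB_FLAVOR;
}
```

Whether Sun's compiler and SPARC Linux define the same macros is exactly the open question, so this would need checking on those systems.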




Re: Thread Local Storage libGL

2004-05-24 Thread Ian Romanick
Ian Romanick wrote:
Alan Hourihane wrote:
Is there someone looking to integrate the TLS patches for libGL ??
We should certainly take a look soon and comment upon the patches used.
Here is a patch that covers part of what's in the Redhat patch.  This 
converts the static_functions table to a list of offsets instead of a 
list of pointers.  According to 'objdump -R' on the Mesa libGL, it cuts 
out about 1800 R_386_RELATIVE relocs.  However, the size of the library 
*increases* by about 24k.  That doesn't make sense to me.
Here's an updated version of that patch.  There are some significant 
differences.

1. *All* architectures use the string offset table.  To do this, 
gl_procs.py was modified to generate a big character array called 
gl_string_table in glprocs.h.  The static_functions array now contains 
offsets into that array instead of pointers to strings.  If gl_procs.py 
is invoked with '-m short' it will generate a (hard to read) character 
array.  If it is invoked with no option or '-m long' it will generate a 
big (~16k) string.  The string version of the .h file generates a 
warning from GCC.

2. The same glprocs.h is used even for the optimized x86 case.  This is 
done by defining NEED_FUNC_POINTERS only on non-x86.  Actually, it 
should only be defined on architectures that don't have generated 
assembly dispatch stubs.

3. All of the _ts_ dispatch code is *gone*.  The x86 assembly dispatch 
code and the C dispatch code reflect this.  The SPARC assembly dispatch 
has not yet been updated, but it should follow the x86 model.  This 
means that this code will catch fire, fall over, and sink into the swamp 
on SPARC.  This will obviously need to be fixed before that portion of 
the patch is committed.

Unless there are objections, I would like to commit the new glprocs.h 
and the non-x86 specific code in glapi.c to support it.




Re: Thread Local Storage libGL

2004-05-24 Thread Ian Romanick
Ian Romanick wrote:
Here's an updated version of that patch.  There are some significant 
differences.
I hate it when I do that
Index: src/mesa/glapi/gl_procs.py
===
RCS file: /cvs/mesa/Mesa/src/mesa/glapi/gl_procs.py,v
retrieving revision 1.1
diff -u -d -r1.1 gl_procs.py
--- a/src/mesa/glapi/gl_procs.py18 May 2004 18:33:40 -  1.1
+++ b/src/mesa/glapi/gl_procs.py24 May 2004 23:04:12 -
@@ -36,48 +36,137 @@
 class PrintGlProcs(gl_XML.FilterGLAPISpecBase):
     name = "gl_procs.py (from Mesa)"
 
-    def __init__(self):
+    def __init__(self, long_strings):
+        self.long_strings = long_strings
         gl_XML.FilterGLAPISpecBase.__init__(self)
         self.license = license.bsd_license_template % ( \
 """Copyright (C) 1999-2001  Brian Paul   All Rights Reserved.
 (C) Copyright IBM Corporation 2004""", "BRIAN PAUL, IBM")
 
+
     def printRealHeader(self):
-        print ''
         print '/* This file is only included by glapi.c and is used for'
         print ' * the GetProcAddress() function'
         print ' */'
         print ''
-        print 'static const struct name_address_offset static_functions[] = {'
+        print 'typedef struct {'
+        print 'int Name_offset;'
+        print '#ifdef NEED_FUNCTION_POINTER'
+        print 'void * Address;'
+        print '#endif'
+        print 'unsigned int Offset;'
+        print '} glprocs_table_t;'
+        print ''
+        print '#ifdef NEED_FUNCTION_POINTER'
+        print '#  define NAME_FUNC_OFFSET(n,f,o) { n , gl ## f , o }'
+        print '#else'
+        print '#  define NAME_FUNC_OFFSET(n,f,o) { n , o }'
+        print '#endif'
+        print ''
         return
 
     def printRealFooter(self):
-        print '   { NULL, NULL, 0 }  /* end of list marker */'
-        print '};'
+        print ''
+        print '#undef NAME_FUNC_OFFSET'
         return
 
-    def printFunction(self, f):
-        print '   { "gl%s", (GLvoid *) gl%s, _gloffset_%s },' \
-            % (f.name, f.name, f.real_name)
+    def printFunctionString(self, f):
+        if self.long_strings:
+            print '"gl%s\\0"' % (f.name)
+        else:
+            print "'g','l',",
+            for c in f.name:
+                print "'%s'," % (c),
+
+            print "'\\0',"
+
+    def printFunctionOffset(self, f, offset_of_name):
+        print 'NAME_FUNC_OFFSET( % 5u, %s, _gloffset_%s ),' % (offset_of_name, f.name, f.real_name)
+
+
+    def printFunctions(self):
+        print ''
+        if self.long_strings:
+            print 'static const char gl_string_table[] ='
+        else:
+            print 'static const char gl_string_table[] = {'
+
+        keys = self.functions.keys()
+        keys.sort()
+        for k in keys:
+            if k < 0: continue
+            self.printFunctionString(self.functions[k])
+
+        keys.reverse()
+        for k in keys:
+            if k >= -1: continue
+            self.printFunctionString(self.functions[k])
+
+        if self.long_strings:
+            print ';'
+        else:
+            print '};'
+
+        print ''
+        print 'static const glprocs_table_t static_functions[] = {'
+
+        keys = self.functions.keys()
+        keys.sort()
+        base_offset = 0
+        for k in keys:
+            if k < 0: continue
+            self.printFunctionOffset(self.functions[k], base_offset)
+
+            # The length of the function's name, plus 2 for "gl",
+            # plus 1 for the NUL.
+
+            base_offset += len(self.functions[k].name) + 3
+
+        keys.reverse()
+        for k in keys:
+            if k >= -1: continue
+            self.printFunctionOffset(self.functions[k], base_offset)
+
+            # The length of the function's name, plus 2 for "gl",
+            # plus 1 for the NUL.
+
+            base_offset += len(self.functions[k].name) + 3
+
+        print 'NAME_FUNC_OFFSET( -1, NULL, -1 )'
+        print '};'
+        return
 
 
 def show_usage():
-    print "Usage: %s [-f input_file_name]" % sys.argv[0]
+    print "Usage: %s [-f input_file_name] [-m mode]" % sys.argv[0]
+    print "mode can be one of:"
+    print "long  - Create code for compilers that can handle very "
+   print long 

Re: Thread Local Storage libGL

2004-05-24 Thread Mike Mestnik
Currently every instruction that references a register (which is most of
them) needs a global .register setting??  As for making the build system use
the C code instead of the asm, I also could not find where this is supposed
to happen.  I got as far as stopping the asm from being built, but then I
couldn't find what C code I needed to replace it with.

--- Keith Whitwell [EMAIL PROTECTED] wrote:
 Mike Mestnik wrote:
  --- Ian Romanick [EMAIL PROTECTED] wrote:
   Assembly dispatch stubs are only generated for x86 and SPARC.  It looks
   like the #if test in dispatch.c is wrong, so that stubs don't even get
   used on SPARC.  In any case, Jakub's patch did modify the x86 assembly,
   not the C version.  I wasn't really clear about that before.  My
   proposal is to modify the C version, the x86 assembly version, and the
   SPARC assembly version.  I've worked up a patch to gl_x86_asm.py that I
   can post on Monday.

  Just to get everyone on the same page.  The SPARC assembly version is
  only for Solaris.  It does not, and can not, even build on Linux!  Last
  time I checked DRI would not build on SPARC/Linux, crashing on the Solaris
  ASM code.

 Hmm, there are definite references to __linux__ in the code - looks like
 the intention is there.  Why not look at the generator scripts and post a
 patch to fix it?

 Keith
 





__
Do you Yahoo!?
Friends.  Fun.  Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/ 




Re: Thread Local Storage libGL

2004-05-24 Thread Mike Mestnik
I know I posted the exact error message, but essentially it's missing a global
.register definition, and it's just the asm code (.S).

I build it in 64-bit (not using sparc32 to fake a 32-bit system).  I needed
this because DRI is known not to work with mixed user/kernel bit depths and
I have a 64-bit kernel.  glibc is built for both 64-bit and 32-bit binaries.

Normally the difference is...
sparc32 make all; # vs. just running make all, and this is needed for many
programs that don't take 64-bit systems into account.

--- Ian Romanick [EMAIL PROTECTED] wrote:
 Mike Mestnik wrote:
  --- Ian Romanick [EMAIL PROTECTED] wrote:
   Assembly dispatch stubs are only generated for x86 and SPARC.  It looks
   like the #if test in dispatch.c is wrong, so that stubs don't even get
   used on SPARC.  In any case, Jakub's patch did modify the x86 assembly,
   not the C version.  I wasn't really clear about that before.  My
   proposal is to modify the C version, the x86 assembly version, and the
   SPARC assembly version.  I've worked up a patch to gl_x86_asm.py that I
   can post on Monday.

  Just to get everyone on the same page.  The SPARC assembly version is
  only for Solaris.  It does not, and can not, even build on Linux!  Last
  time I checked DRI would not build on SPARC/Linux, crashing on the Solaris
  ASM code.

 Does the code in src/mesa/glapi/glapi.c (generate_entrypoint
 specifically) cause the crash or just the code in
 src/mesa/sparc/glapi_sparc.S?  My guess is that Linux uses 32-bit
 user-mode, but the asm code in glapi_sparc.S defaults to 64-bit on v9.
 Perhaps we could come up with a better define, such as
 USE_SPARC_32BIT_USER to determine which stubs to build.
 
 







Re: Thread Local Storage libGL

2004-05-24 Thread Ian Romanick
Mike Mestnik wrote:
Currently every instruction that references a register (which is most of
them) needs a global .register setting??  As for making the build system use
the C code instead of the asm, I also could not find where this is supposed
to happen.  I got as far as stopping the asm from being built, but then I
couldn't find what C code I needed to replace it with.
The C dispatch stubs are in the header file src/mesa/glapi/glapitemps.h. 
 They are built in src/mesa/main/dispatch.c.




Re: Thread Local Storage libGL

2004-05-23 Thread Ian Romanick
One thing about Jakub's patch is that, on x86, it eliminates the need 
for the specialized _ts_* versions of the dispatch functions.  It 
basically converts the DISPATCH macro (as used in 
src/mesa/main/dispatch.c) from:

#define DISPATCH(FUNC, ARGS, MESSAGE) \
    (_glapi_Dispatch->FUNC) ARGS
to:
#define DISPATCH(FUNC, ARGS, MESSAGE)                   \
    const struct _glapi_table * d = _glapi_Dispatch;    \
    if ( __builtin_expect( d == NULL, 0 ) )             \
        d = get_dispatch();                             \
    (d->FUNC) ARGS
There is some extra cost in the non-threaded case, but it seems very 
minimal.  In the x86 assembly case, it's only a test and a conditional 
branch that is usually not taken.  Does this seem like a reasonable 
change to make across the board?
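
A self-contained sketch of the second form, to make the control flow concrete.  The table, the function names and the lookup counter are all hypothetical stand-ins; only the test-then-fall-back shape and the use of __builtin_expect come from the macro above.

```c
#include <stddef.h>

/* Hypothetical one-entry dispatch table. */
struct demo_table {
    int (*Square)(int);
};

static int demo_square(int x)
{
    return x * x;
}

static struct demo_table demo_the_table = { demo_square };

/* Non-NULL in the single-threaded case, NULL once threads are in use. */
static struct demo_table *demo_Dispatch = &demo_the_table;
static int demo_slow_lookups = 0;

/* Stand-in for the slow, per-thread lookup. */
static struct demo_table *demo_get_dispatch(void)
{
    demo_slow_lookups++;
    return &demo_the_table;
}

/* Tell GCC the NULL case is unlikely; other compilers just see the test. */
#if defined(__GNUC__)
#  define DEMO_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define DEMO_UNLIKELY(x) (x)
#endif

static int demo_call_square(int x)
{
    const struct demo_table *d = demo_Dispatch;

    if (DEMO_UNLIKELY(d == NULL))
        d = demo_get_dispatch();
    return (d->Square)(x);
}
```

In the non-threaded case the only added work is the comparison and a branch that is not taken; the fallback lookup runs only when demo_Dispatch is NULL.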




Re: Thread Local Storage libGL

2004-05-23 Thread Keith Whitwell
Ian Romanick wrote:
One thing about Jakub's patch is that, on x86, it eliminates the need 
for the specialized _ts_* versions of the dispatch functions.  It 
basically converts the DISPATCH macro (as used in 
src/mesa/main/dispatch.c) from:

#define DISPATCH(FUNC, ARGS, MESSAGE) \
    (_glapi_Dispatch->FUNC) ARGS
to:
#define DISPATCH(FUNC, ARGS, MESSAGE) \
    const struct _glapi_table * d = _glapi_Dispatch;\
    if ( __builtin_expect( d == NULL, 0 ) )\
        d = get_dispatch();\
    (d->FUNC) ARGS
There is some extra cost in the non-threaded case, but it seems very 
minimal.  In the x86 assembly case, it's only a test and a conditional 
branch that is usually not taken.  Does this seem like a reasonable 
change to make across the board?
Hmm.  The _ts_* macros were introduced to eliminate exactly that sort of test 
- though we probably coded it up in a less optimal way than that.  Are you 
saying that the dispatch tables would really become compiled 'C'?  At the 
moment they are typically generated as assembly and use a jmp rather than 
calling a new function as in either of the examples above.

My feeling is that the non-threaded case should run as fast as possible, being 
the normal usage.  Maybe some timings would make things clearer.

Keith



Re: Thread Local Storage libGL

2004-05-23 Thread Ian Romanick
Keith Whitwell wrote:
Ian Romanick wrote:
One thing about Jakub's patch is that, on x86, it eliminates the need 
for the specialized _ts_* versions of the dispatch functions.  It 
basically converts the DISPATCH macro (as used in 
src/mesa/main/dispatch.c) from:

#define DISPATCH(FUNC, ARGS, MESSAGE) \
    (_glapi_Dispatch->FUNC) ARGS
to:
#define DISPATCH(FUNC, ARGS, MESSAGE) \
    const struct _glapi_table * d = _glapi_Dispatch;\
    if ( __builtin_expect( d == NULL, 0 ) )\
        d = get_dispatch();\
    (d->FUNC) ARGS
There is some extra cost in the non-threaded case, but it seems very 
minimal.  In the x86 assembly case, it's only a test and a conditional 
branch that is usually not taken.  Does this seem like a reasonable 
change to make across the board?
Hmm.  The _ts_* macros were introduced to eliminate exactly that sort of 
test - though we probably coded it up in a less optimal way than that.  
Are you saying that the dispatch tables would really become compiled 
'C'?  At the moment they are typically generated as assembly and use a 
jmp rather than calling a new function as in either of the examples above.
Assembly dispatch stubs are only generated for x86 and SPARC.  It looks 
like the #if test in dispatch.c is wrong, so that stubs don't even get 
used on SPARC.  In any case, Jakub's patch did modify the x86 assembly, 
not the C version.  I wasn't really clear about that before.  My 
proposal is to modify the C version, the x86 assembly version, and the 
SPARC assembly version.  I've worked up a patch to gl_x86_asm.py that I 
can post on Monday.

Some time in the next couple weeks I'm going to create PowerPC dispatch 
stubs.  The PPC ABI is a little odd, though, so it may not be trivial.

My feeling is that the non-threaded case should run as fast as possible, 
being the normal usage.  Maybe some timings would make things clearer.
Since the branch is going to be correctly predicted every time it's 
executed (in the non-threaded case), the performance hit should be on 
the order of a couple clock-cycles.  I should be able to get some 
timings on Monday or Tuesday.  I'll just do a loop of calling some GL 
function a million times or something.  Any idea which function would be 
likely to take the least time to execute?  I want to find the case where 
the dispatch functions have the most impact.




Re: Thread Local Storage libGL

2004-05-23 Thread Mike Mestnik

--- Ian Romanick [EMAIL PROTECTED] wrote:
 Assembly dispatch stubs are only generated for x86 and SPARC.  It looks 
 like the #if test in dispatch.c is wrong, so that stubs don't even get 
 used on SPARC.  In any case, Jakub's patch did modify the x86 assembly, 
 not the C version.  I wasn't really clear about that before.  My 
 proposal is to modify the C version, the x86 assembly version, and the 
 SPARC assembly version.  I've worked up a patch to gl_x86_asm.py that I 
 can post on Monday.
 
Just to get everyone on the same page.  The SPARC assembly version is
only for Solaris.  It does not, and cannot, even build on Linux!  Last
time I checked DRI would not build on SPARC/Linux, crashing on the Solaris
ASM code.









Thread Local Storage libGL

2004-05-21 Thread Alan Hourihane
I emailed Keith regarding this a while back and he had some concerns
over the patches used, but I just wanted to bring to light both RedHat
and now Mandrake are shipping with the TLS versions of libGL and cause
the binary DRI packages to break.

Is there someone looking to integrate the TLS patches for libGL ??

We should certainly take a look soon and comment upon the patches used.

Alan.




Re: Thread Local Storage libGL

2004-05-21 Thread Keith Whitwell
Alan Hourihane wrote:
I emailed Keith regarding this a while back and he had some concerns
over the patches used, but I just wanted to bring to light both RedHat
and now Mandrake are shipping with the TLS versions of libGL and cause
the binary DRI packages to break.
Is there someone looking to integrate the TLS patches for libGL ??
We should certainly take a look soon and comment upon the patches used.
I'd really like to get good support for modern TLS into our libGL.  My big 
problem was that the patches were made against generated files and had rapidly 
become out-of-date.

Keith



Re: Thread Local Storage libGL

2004-05-21 Thread Alan Hourihane
Attached are the patches from the Red Hat source tree, which are based on
XFree86 4.3.0, for review.

I think this code hits a lot of stuff that Ian has been working on, so
it'd be good if Ian has some comments.

Alan.


XFree86-4.3.0-redhat-libGL-opt-v2.patch.bz2
Description: BZip2 compressed data


XFree86-4.3.0-redhat-libGL-TLS-buildfix.patch.bz2
Description: BZip2 compressed data


Re: Thread Local Storage libGL

2004-05-21 Thread Brian Paul
Keith Whitwell wrote:
Alan Hourihane wrote:
I emailed Keith regarding this a while back and he had some concerns
over the patches used, but I just wanted to bring to light both RedHat
and now Mandrake are shipping with the TLS versions of libGL and cause
the binary DRI packages to break.
Is there someone looking to integrate the TLS patches for libGL ??
We should certainly take a look soon and comment upon the patches used.

I'd really like to get good support for modern TLS into our libGL.  My 
big problem was that the patches were made against generated files and 
had rapidly become out-of-date.
I'm interested in looking at this.  Where might I find the modified 
TLS sources/patches?

-Brian



Re: Thread Local Storage libGL

2004-05-21 Thread Ian Romanick
Keith Whitwell wrote:
Alan Hourihane wrote:
I emailed Keith regarding this a while back and he had some concerns
over the patches used, but I just wanted to bring to light both RedHat
and now Mandrake are shipping with the TLS versions of libGL and cause
the binary DRI packages to break.
Is there someone looking to integrate the TLS patches for libGL ??
We should certainly take a look soon and comment upon the patches used.
I'd really like to get good support for modern TLS into our libGL.  My 
big problem was that the patches were made against generated files and 
had rapidly become out-of-date.
Getting some of the TLS-related functionality into our libGL has been on 
my (unmanageably long) to-do list for some time now.  It's part of the 
reason I started creating new API generator scripts.

The other technical problem with Jakub's original patch is the GL_CALL 
macro.  We *cannot* directly call gl* functions from a driver.  We must 
go directly through the dispatch table.  Basically, this is because apps 
do stupid things like 'glMultiTexCoord2fARB = 
glXGetProcAddress("glMultiTexCoord2fARB")'.  In the driver, if we call 
glMultiTexCoord2fARB directly, we jump to the app's pointer and instantly 
crash.

We can cut down the size of the patch by converting all the 
'glDispatch->' calls to use GL_CALL.  For now, we need to just define 
GL_CALL to be 'glDispatch->func'.  My guess is that the final version 
will need to be an inline assembly stub.




Re: Thread Local Storage libGL

2004-05-21 Thread Alan Hourihane
On Fri, May 21, 2004 at 03:08:41PM +0100, Keith Whitwell wrote:
 Alan Hourihane wrote:
 I emailed Keith regarding this a while back and he had some concerns
 over the patches used, but I just wanted to bring to light both RedHat
 and now Mandrake are shipping with the TLS versions of libGL and cause
 the binary DRI packages to break.
 
 Is there someone looking to integrate the TLS patches for libGL ??
 
 We should certainly take a look soon and comment upon the patches used.
 
 I'd really like to get good support for modern TLS into our libGL.  My big 
 problem was that the patches were made against generated files and had 
 rapidly become out-of-date.

O.k. I'll try and muster up a patch for people to review and post it here.

Alan.




Re: Thread Local Storage libGL

2004-05-21 Thread Brian Paul
Ian Romanick wrote:
Keith Whitwell wrote:
Alan Hourihane wrote:
I emailed Keith regarding this a while back and he had some concerns
over the patches used, but I just wanted to bring to light both RedHat
and now Mandrake are shipping with the TLS versions of libGL and cause
the binary DRI packages to break.
Is there someone looking to integrate the TLS patches for libGL ??
We should certainly take a look soon and comment upon the patches used.

I'd really like to get good support for modern TLS into our libGL.  My 
big problem was that the patches were made against generated files and 
had rapidly become out-of-date.

Getting some of the TLS-related functionality into our libGL has been on 
my (unmanageably long) to-do list for some time now.  It's part of the 
reason I started creating new API generator scripts.

The other technical problem with Jakub's original patch is the GL_CALL 
macro.  We *cannot* directly call gl* functions from a driver.  We must 
go directly through the dispatch table.  Basically, this is because apps 
do stupid things like 'glMultiTexCoord2fARB = 
glXGetProcAddress("glMultiTexCoord2fARB")'.  In the driver, if we call 
glMultiTexCoord2fARB directly, we jump to the app's pointer and instantly 
crash.

We can cut down the size of the patch by converting all the 
'glDispatch->' calls to use GL_CALL.  For now, we need to just define 
GL_CALL to be 'glDispatch->func'.  My guess is that the final version 
will need to be an inline assembly stub.

The patches appear to be against a 4.0.x version of Mesa.  Bringing 
the changes forward to the current code is going to take some work. 
I'm willing to work on it, but I'm not going to have much/any time for 
it over the next week or so.

-Brian



Re: Thread Local Storage libGL

2004-05-21 Thread Ian Romanick
Alan Hourihane wrote:
Is there someone looking to integrate the TLS patches for libGL ??
We should certainly take a look soon and comment upon the patches used.
Here is a patch that covers part of what's in the Redhat patch.  This 
converts the static_functions table to a list of offsets instead of a 
list of pointers.  According to 'objdump -R' on the Mesa libGL, it cuts 
out about 1800 R_386_RELATIVE relocs.  However, the size of the library 
*increases* by about 24k.  That doesn't make sense to me.

Is there some other tool that should be used for this type of thing? 
Does anyone know what Jakub used for the analysis included with his 
original patch?  Is there any good documentation available on this 
stuff?  I haven't worked on this level since the last 40k intro I did on 
the Amiga.

http://marc.theaimsgroup.com/?l=dri-devel&m=105187260618425&w=2
Index: src/mesa/sources
===
RCS file: /cvs/mesa/Mesa/src/mesa/sources,v
retrieving revision 1.11
diff -u -d -r1.11 sources
--- a/src/mesa/sources  27 Apr 2004 13:39:20 -  1.11
+++ b/src/mesa/sources  21 May 2004 22:51:16 -
@@ -167,8 +167,9 @@
x86/sse_xform2.S\
x86/sse_xform3.S\
x86/sse_xform4.S\
-   x86/sse_normal.S \
-   tnl/t_vtx_x86_gcc.S
+   x86/sse_normal.S\
+   tnl/t_vtx_x86_gcc.S \
+   glapi/glprocs_x86.S
 
 SPARC_SOURCES =\
sparc/clip.S\
Index: src/mesa/glapi/gl_procs.py
===
RCS file: /cvs/mesa/Mesa/src/mesa/glapi/gl_procs.py,v
retrieving revision 1.1
diff -u -d -r1.1 gl_procs.py
--- a/src/mesa/glapi/gl_procs.py18 May 2004 18:33:40 -  1.1
+++ b/src/mesa/glapi/gl_procs.py21 May 2004 22:51:17 -
@@ -61,23 +61,70 @@
% (f.name, f.name, f.real_name)
 
 
+class PrintGlProcsX86(gl_XML.FilterGLAPISpecBase):
+   name = "gl_procs.py (from Mesa)"
+
+   def __init__(self):
+       gl_XML.FilterGLAPISpecBase.__init__(self)
+       self.license = license.bsd_license_template % ("(C) Copyright IBM Corporation 2004", "IBM")
+
+   def printRealHeader(self):
+       print ''
+       print '#include "glapioffsets.h"'
+       print ''
+       print '/* This file is only linked with glapi.c and is used for'
+       print ' * the GetProcAddress() function'
+       print ' */'
+       print ''
+       print '\t\t.section .rodata.str1.1'
+       print '\t\t.globl gl_string_table_start'
+       print 'gl_string_table_start:'
+       print '\t\t.section .rodata.gl_static_functions'
+       print '\t\t.globl static_functions_x86'
+       print 'static_functions_x86:'
+       return
+
+   def printRealFooter(self):
+       print '\t\t.section .rodata.gl_static_functions'
+       print '\t\t.long\t-1, 0'
+       return
+
+   def printFunction(self, f):
+       print ''
+       print '\t\t.section .rodata.str1.1'
+       print 'gl%s_string:\t.string "gl%s"' % (f.name, f.name)
+       print '\t\t.section .rodata.gl_static_functions'
+       print '\t\t.long\tgl%s_string - gl_string_table_start, _gloffset_%s' % (f.name, f.real_name)
+
+
 def show_usage():
-   print "Usage: %s [-f input_file_name]" % sys.argv[0]
+   print "Usage: %s [-f input_file_name] [-m mode]" % sys.argv[0]
+   print 'mode can be one of:'
+   print 'h   - emit a C header file (default)'
+   print 'x86 - emit an x86 assembly file'
    sys.exit(1)
 
 if __name__ == '__main__':
    file_name = "gl_API.xml"
 
    try:
-       (args, trail) = getopt.getopt(sys.argv[1:], "f:")
+       (args, trail) = getopt.getopt(sys.argv[1:], "f:m:")
    except Exception,e:
        show_usage()
 
+   mode = "h"
    for (arg,val) in args:
        if arg == "-f":
            file_name = val
+       elif arg == "-m":
+           mode = val
 
-   dh = PrintGlProcs()
+   if mode == "h":
+       dh = PrintGlProcs()
+   elif mode == "x86":
+       dh = PrintGlProcsX86()
+   else:
+       show_usage()
 
    parser = make_parser()
    parser.setFeature(feature_namespaces, 0)
Index: src/mesa/glapi/gl_x86_asm.py
===
RCS file: /cvs/mesa/Mesa/src/mesa/glapi/gl_x86_asm.py,v
retrieving revision 1.1
diff -u -d -r1.1 gl_x86_asm.py
--- a/src/mesa/glapi/gl_x86_asm.py  18 May 2004 18:33:40 -  1.1
+++ b/src/mesa/glapi/gl_x86_asm.py  21 May 2004 22:51:17 -
@@ -87,6 +87,9 @@
print 'SEG_TEXT'
print 'EXTERN GLNAME(_glapi_Dispatch)'
print ''
+   print