-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nicholas Miell wrote:
> On Mon, 27 Feb 2006 15:25:03 -0800, Ian Romanick wrote:
> 
>>After listening to a couple fairly vocal people squawk about the x86-64
>>dispatch stubs, I spent some time investigating the raised issues.  The
>>primary issue is that the TLS versions of the stubs contain an
>>unnecessary function call to get the dispatch pointer.
> 
> I wasn't "squawking", I was complaining that your stated objections to the
> patch were based on erroneous facts. I would've been perfectly happy with
> "the advertised performance benefit isn't worth the effort involved"
> (although, there wasn't all that much effort).

You also weren't the only one, just the most recent. :)  I've received
e-mails from several people on this and closely related subjects.
Besides, I had to say *something* at least semi-inflammatory to make sure
I got people's attention. >:)

>>The results are not impressive.  The libGL.so with the modified dispatch
>>routines is 13KiB larger.  
> 
> That's odd. The dispatch routines are 16-byte aligned and the inlining
> doesn't grow the size of the routine above 16-bytes. Did actual .text size
> change, or just the library on-disk size?

Disk size.  I did 'strip -g libGL.so.1.2 ; ls -l libGL.so.1.2'.  I can
check the .text size tomorrow.  I don't have access to the x86-64 system
right now.

>>                          The measured API overhead was, at best, 1 clock
>>cycle faster.  In most cases the measured overhead was much, much less
>>than the resolution of the measurement apparatus (e.g., glFogCoordfEXT
>>scored 71.284420 for the original vs. 71.280840 for the modified).
>>
>>Given these results, I'm inclined to leave the x86-64 assembly dispatch
>>stubs as they are.  Evidence showing either a benchmark where the
>>modified dispatch stubs are faster or showing some flaw in my testing
>>methodology would, naturally, give me reason to revisit this issue.  In
>>the meantime, I am considering it closed.
> 
> Does the benchmark test the effects of the return address stack
> overflowing? I don't know how deep call chains are typically in
> high-performance GL applications, but that extra entry on the function
> call stack might cause mispredictions on return. (Of course, if the call
> depth below the dispatch routine already exceeds the size of the RAS, this
> is irrelevant.)

There are basically two classes of functions in the GL API.  The class
where the dispatch overhead has the most impact consists of functions
like glVertex3f.  In these cases, most drivers will plug an optimized
function, one that just stores the incoming data and returns, directly
into the dispatch table.  They tend to be very short "leaf" functions.
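
To make the "plug a function directly into the dispatch table" idea
concrete, here is a minimal sketch in C.  It is not Mesa's code (the
real stubs are hand-written assembly), and every name in it
(dispatch_table, current_dispatch, stub_Vertex3f, fast_Vertex3f,
make_current) is hypothetical.  It assumes a C11 compiler that
supports _Thread_local.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VERTS 64

/* Hypothetical per-context table of function pointers. */
struct dispatch_table {
    void (*Vertex3f)(float x, float y, float z);
};

/* Per-thread pointer to the current table (the "TLS case"). */
static _Thread_local struct dispatch_table *current_dispatch;

/* Stand-in for a driver's vertex buffer. */
static float vertex_store[3 * MAX_VERTS];
static size_t vertex_count;

/* Short leaf function: store the incoming data and return. */
static void fast_Vertex3f(float x, float y, float z)
{
    if (vertex_count < MAX_VERTS) {
        float *v = &vertex_store[3 * vertex_count++];
        v[0] = x;
        v[1] = y;
        v[2] = z;
    }
}

/* Table a hypothetical driver fills in with its optimized function. */
static struct dispatch_table driver_table = { fast_Vertex3f };

/* Bind a table to the calling thread. */
void make_current(struct dispatch_table *table)
{
    assert(table != NULL);
    current_dispatch = table;
}

/* Public entry point: load the table pointer and call through it. */
void stub_Vertex3f(float x, float y, float z)
{
    current_dispatch->Vertex3f(x, y, z);
}
```

After make_current(&driver_table), a call to stub_Vertex3f(1, 2, 3)
lands in fast_Vertex3f.  The only work the stub adds is fetching the
thread-local pointer and making one indirect call, which is why that
cost matters for leaf functions this short.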

Some state change functions may fall into this category but to a lesser
degree.  Most of the state change functions aren't intended to be on the
performance path (i.e., you don't call them in an inner loop), so I
don't consider them relevant in this case.

The other class consists of functions that trigger a significant amount
of work in the driver or on the GPU.  glDrawArrays and glTexImage2D fall
into this category.  In all these cases the overhead incurred by the
dispatch is just noise.  A dispatch overhead of even 300 cycles (it's
actually more like 70 in the TLS case on x86-64) makes little difference
to a function that takes 10,000 cycles to execute.
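
As a rough worked version of that arithmetic, the helper below
computes what share of a call's total cost is dispatch.  The function
name is made up for illustration, and it assumes the 10,000 cycles
are the function's own work, not counting dispatch.

```c
#include <assert.h>

/* Dispatch cost as a percentage of (dispatch + work) cycles. */
double dispatch_share_percent(double dispatch_cycles, double work_cycles)
{
    assert(dispatch_cycles >= 0.0 && work_cycles >= 0.0);
    assert(dispatch_cycles + work_cycles > 0.0);
    return 100.0 * dispatch_cycles / (dispatch_cycles + work_cycles);
}
```

With the figures above, dispatch_share_percent(70, 10000) is about
0.7% and dispatch_share_percent(300, 10000) is about 2.9%.  For a
leaf function that does only a few dozen cycles of work, the same 70
cycles would be most of the call.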

So, the short answer is that the call depth shouldn't incur any
additional penalty on the functions where the dispatch overhead has an
impact.

>>If someone is really excited about improving the state of things on
>>x86-64, they might choose to investigate adding code to dynamically
>>generate dispatch functions for newly registered (by a DRI driver at
>>run-time) extension functions.  This is currently done for x86, SPARC,
>>and Alpha, but not for x86-64, PowerPC, or IA-64.
> 
> How do dynamically generated dispatch functions improve performance? Are
> the routines different depending on whether or not the app is threaded?

This isn't a performance issue.  It's a functionality issue.  Right now
Mesa has no implementation (dispatch or otherwise) for some existing GL
extensions (e.g., GL_ARB_vertex_blend) or any yet-to-be-invented
extensions.  Dynamic dispatch support is needed so that a loaded DRI
driver (e.g., fglrx_dri.so) can request a new function (e.g.,
glVertexBlendARB) be made available to applications.

Even without considering binary drivers, it allows us to expose new
extensions in open drivers without requiring that the user update their
libGL.  At some point in the future it will also allow us to *remove*
static dispatch stubs for functions that have never been implemented in
any DRI driver (e.g., glTexImage4DSGIX).
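
The registration side of this can be sketched in a few lines of C.
This models only the bookkeeping: a driver registers a name at run
time and an application later looks it up, much as it would through
glXGetProcAddress.  It does not generate machine code the way a real
dynamic dispatch stub would, and all names (add_entrypoint,
get_proc_address, sample_VertexBlendARB) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*proc_t)(void);

#define MAX_ENTRIES 8

struct entry {
    const char *name;   /* caller must keep this string alive */
    proc_t func;
};

static struct entry table[MAX_ENTRIES];
static size_t num_entries;

/* Register (or replace) a function; returns its slot, or -1 if full. */
int add_entrypoint(const char *name, proc_t func)
{
    assert(name != NULL);
    for (size_t i = 0; i < num_entries; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].func = func;
            return (int)i;
        }
    }
    if (num_entries == MAX_ENTRIES)
        return -1;
    table[num_entries].name = name;
    table[num_entries].func = func;
    return (int)num_entries++;
}

/* Look a function up by name; NULL if nothing registered it. */
proc_t get_proc_address(const char *name)
{
    for (size_t i = 0; i < num_entries; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].func;
    }
    return NULL;
}

/* Stand-in for a function supplied by a loaded driver. */
static int blend_calls;
static void sample_VertexBlendARB(void)
{
    blend_calls++;
}
```

Before the driver calls add_entrypoint("glVertexBlendARB",
sample_VertexBlendARB), a lookup of that name returns NULL; afterwards
it returns the driver's function.  No change to the library that owns
the table is needed, which is the point of the feature.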

> For that matter, why does Mesa have its own reimplementation of dlsym()
> (or the equivalent for your platform of choice)?

glXGetProcAddress is part of the GLX API.  AFAIK, it was created
/before/ most platforms had dlsym.  It's intended to be a cross-platform
way to probe for extension function addresses.  It's there because it's
part of the API that we implement.

> (Also, there doesn't seem to be anything Alpha-related in Mesa.)

That's weird.  I could have sworn that generate_entrypoint
(src/mesa/glapi/glapi.c) had support for Alpha.  I guess I was mistaken.
It probably doesn't matter at this point.  For Mesa, ARM or MIPS is
probably a more relevant platform than Alpha. :(
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEA7a6X1gOwKyEAw8RAtvEAJ9mSlyRvmUX/9J1B+Kwblzvx3DehQCaAuNk
rRRHWNMvwPgDE4u3gknvZms=
=Qxh0
-----END PGP SIGNATURE-----


_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
