Keith Whitwell wrote:
Ian Romanick wrote:

One thing about Jakub's patch is that, on x86, it eliminates the need for the specialized _ts_* versions of the dispatch functions. It basically converts the DISPATCH macro (as used in src/mesa/main/dispatch.c) from:

#define DISPATCH(FUNC, ARGS, MESSAGE) \
    (_glapi_Dispatch->FUNC) ARGS

to:

#define DISPATCH(FUNC, ARGS, MESSAGE)                   \
    const struct _glapi_table * d = _glapi_Dispatch;    \
    if ( __builtin_expect( d == NULL, 0 ) )             \
        d = get_dispatch();                             \
    (d->FUNC) ARGS

There is some extra cost in the non-threaded case, but it seems minimal. In the x86 assembly case, it's only a test and a conditional branch that is usually not taken. Does this seem like a reasonable change to make across the board?
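For anyone who wants to try the pattern outside Mesa, here is a minimal, self-contained C sketch of the second macro's logic. Everything prefixed demo_ or DEMO_ is hypothetical and made up for illustration: the one-entry table stands in for struct _glapi_table, demo_get_dispatch() stands in for get_dispatch(), and the convention that the global pointer is NULL once threading is in use is an assumption inferred from the NULL test in the macro, not taken from the Mesa source.

```c
#include <stddef.h>

/* Hypothetical one-entry dispatch table; the real struct _glapi_table
 * holds one function pointer per GL entry point. */
struct demo_table {
    int (*Add)(int a, int b);
};

static int add_impl(int a, int b) { return a + b; }

static struct demo_table demo_real_table = { add_impl };

/* Assumed convention: non-NULL in the single-threaded case,
 * NULL once per-thread dispatch is required. */
static struct demo_table *demo_Dispatch = &demo_real_table;

static int demo_slow_path_hits = 0;

/* Stand-in for get_dispatch(); a real version would fetch the table
 * from thread-specific storage. Here it only counts its calls. */
static struct demo_table *demo_get_dispatch(void)
{
    demo_slow_path_hits++;
    return &demo_real_table;
}

/* __builtin_expect is a GCC/Clang extension; fall back to a plain test. */
#if defined(__GNUC__)
#define DEMO_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define DEMO_UNLIKELY(x) (x)
#endif

/* One dispatch stub: load the global pointer, take the rarely used
 * slow path only if it is NULL, then call through the table. */
int demo_Add(int a, int b)
{
    const struct demo_table *d = demo_Dispatch;
    if (DEMO_UNLIKELY(d == NULL))
        d = demo_get_dispatch();
    return d->Add(a, b);
}

/* Switch between the assumed single-threaded and threaded states. */
void demo_set_threaded(int threaded)
{
    demo_Dispatch = threaded ? NULL : &demo_real_table;
}

int demo_slow_path_count(void) { return demo_slow_path_hits; }
```

Calling demo_Add() before demo_set_threaded(1) never touches demo_get_dispatch(); afterwards every call goes through it, which is the case the _ts_* functions currently handle separately.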

Hmm. The _ts_* macros were introduced to eliminate exactly that sort of test - though we probably coded it up in a less optimal way than that. Are you saying that the dispatch tables would really become compiled 'C'? At the moment they are typically generated as assembly and use a jmp rather than calling a new function as in either of the examples above.

Assembly dispatch stubs are only generated for x86 and SPARC. It looks like the #if test in dispatch.c is wrong, so the stubs don't even get used on SPARC. In any case, Jakub's patch did modify the x86 assembly, not the C version. I wasn't really clear about that before. My proposal is to modify the C version, the x86 assembly version, and the SPARC assembly version. I've worked up a patch to gl_x86_asm.py that I can post on Monday.


Sometime in the next couple of weeks I'm going to create PowerPC dispatch stubs. The PPC ABI is a little odd, though, so it may not be trivial.

My feeling is that the non-threaded case should run as fast as possible, being the normal usage. Maybe some timings would make things clearer.

Since the branch is going to be correctly predicted every time it's executed (in the non-threaded case), the performance hit should be on the order of a couple of clock cycles. I should be able to get some timings on Monday or Tuesday. I'll just time a loop that calls some GL function a million times or so. Any idea which function would be likely to take the least time to execute? I want to find the case where the dispatch functions have the most impact.
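A rough sketch of the kind of timing loop I have in mind, written as standalone C so it can be tried without a GL context. All bench_ names are hypothetical; a no-op function behind a one-entry table stands in for a cheap GL entry point, and the two stubs model the unchecked and NULL-checked C dispatch paths rather than the assembly stubs. It uses clock() from the standard library, which measures CPU time and is coarse, so treat the numbers as indicative only.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical one-entry table standing in for struct _glapi_table. */
struct bench_table {
    void (*Nop)(void);
};

/* volatile so the compiler cannot optimize the calls away. */
static volatile unsigned long bench_calls = 0;

static void nop_impl(void) { bench_calls++; }

static struct bench_table bench_real = { nop_impl };
static struct bench_table *bench_Dispatch = &bench_real;

/* Stand-in for get_dispatch(); never reached in the non-threaded case. */
static struct bench_table *bench_get_dispatch(void) { return &bench_real; }

/* Models the current non-threaded stub: no NULL test. */
void bench_call_direct(void)
{
    bench_Dispatch->Nop();
}

/* Models the proposed stub: NULL test, then dispatch. */
void bench_call_checked(void)
{
    const struct bench_table *d = bench_Dispatch;
    if (d == NULL)
        d = bench_get_dispatch();
    d->Nop();
}

unsigned long bench_call_count(void) { return bench_calls; }

/* Calls fn n times and returns the elapsed CPU time in seconds.
 * The extra indirect call through fn is paid by both variants. */
double bench_time_calls(void (*fn)(void), unsigned long n)
{
    unsigned long i;
    clock_t start = clock();

    for (i = 0; i < n; i++)
        fn();

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing bench_time_calls(bench_call_direct, 1000000UL) against bench_time_calls(bench_call_checked, 1000000UL) over several runs should show whether the extra test is even measurable; the real numbers will have to come from the actual stubs.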




