Roland Scheidegger wrote:
> Keith Whitwell wrote:
>> Brian Paul wrote:
>>>  src/mesa/main/context.c            |    8 ++++----
>>>  src/mesa/shader/slang/slang_emit.c |   23 +++++++++++++++++++----
>>>  src/mesa/tnl/t_vb_arbprogram.c     |    5 ++++-
>>>  3 files changed, 27 insertions(+), 9 deletions(-)
>>>
>>> New commits:
>>> diff-tree 64e8088667d000a70beff93e8c300ac0bd261a60 (from
>>> 3dfcd48469b63c601010ea43e0d5e9ea1dc5dfab)
>>> Author: Brian <[EMAIL PROTECTED]>
>>> Date:   Mon Apr 16 10:36:28 2007 -0600
>>>
>>>     Use generic program limits instead of NV-specific ones to init
>>>     program constants.
>>>
>>>     Previously, this limited us to 12 temp regs for vertex programs.
>>>     Many vertex shaders could exceed that. This forces us to stop using
>>>     t_vb_arbprogram.c for now because of its particular register
>>>     indexing scheme. Need to increase bits allocated for register
>>>     indexing, etc.
>> That code is utterly dead - feel free to remove it.
>
> The demise of the sse path though is a pity. It was an order of
> magnitude faster than t_vb_arbprogram, and still is compared to the new
> code of course. Granted, nothing prevents anyone from implementing a sse
> backend...
> Out of curiosity, I did some quick profiling (single timedemo run of
> doom3) to see where the time is actually spent (compiled with -O1, there
> were lots of visual glitches due to the trouble with ftransform/position
> invariant programs not being invariant when using -ffast-math, but it
> shouldn't make a difference for that).
>
> CPU: AMD64 processors, speed 2002.84 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
> unit mask of 0x00 (No unit mask) count 100000
> samples  %        image name   app name     symbol name
> 2847855  29.7411  r200_dri.so  r200_dri.so  fetch_vector4
> 1303158  13.6093  r200_dri.so  r200_dri.so  _mesa_execute_program
> 1138472  11.8894  r200_dri.so  r200_dri.so  store_vector4
> 903745    9.4381  r200_dri.so  r200_dri.so  run_vp
> 577698    6.0331  doom.x86     doom.x86     (no symbols)
>
> So, it maybe shouldn't come as news, but it's not the actual math which
> is really slow - that's only 14% above. The real killer is the fetch /
> store of values, which is 51% in this example (if you count run_vp too,
> which spends its time most likely just for another round of copying
> input/output values around).
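
The 14% and 51% figures quoted above can be cross-checked with a few lines of C. This is only a sketch for the arithmetic, not code from Mesa: the struct, the table and both function names are made up for illustration, and only the symbol names and percentages are taken from the profile.

```c
#include <stddef.h>
#include <string.h>

/* One row of the quoted profile: symbol name and its reported share. */
struct profile_row {
    const char *symbol;
    double percent;
};

/* Percentages copied from the profile output quoted above. */
static const struct profile_row profile[] = {
    { "fetch_vector4",         29.7411 },
    { "_mesa_execute_program", 13.6093 },
    { "store_vector4",         11.8894 },
    { "run_vp",                 9.4381 },
};

/* Return the reported percentage for one symbol, or 0.0 if not listed. */
double percent_of(const char *symbol)
{
    size_t i;

    for (i = 0; i < sizeof(profile) / sizeof(profile[0]); i++) {
        if (strcmp(profile[i].symbol, symbol) == 0)
            return profile[i].percent;
    }
    return 0.0;
}

/*
 * Share of time attributed to moving operands around: fetch plus store,
 * optionally counting run_vp as copying too (as the message suggests).
 */
double copy_share(int include_run_vp)
{
    double total = percent_of("fetch_vector4") + percent_of("store_vector4");

    if (include_run_vp)
        total += percent_of("run_vp");
    return total;
}
```

With these numbers, copy_share(0) sums fetch and store alone to about 41.6%, copy_share(1) adds run_vp to reach about 51.1%, and percent_of("_mesa_execute_program") is the roughly 14% attributed to the math itself.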

Yeah, it's not as fast but we'll see what we can do about that. A
replacement SSE module should also handle fragment programs.

Just to elaborate on my check-in message, the t_vb_arbprogram[sse] code
was lacking a few things that are needed now too:

- conditionals/branching
- looping
- subroutines
- texture fetches

-Brian

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev