Roland Scheidegger wrote:
> Keith Whitwell wrote:
>> Brian Paul wrote:
>>>  src/mesa/main/context.c            |    8 ++++----
>>>  src/mesa/shader/slang/slang_emit.c |   23 +++++++++++++++++++----
>>>  src/mesa/tnl/t_vb_arbprogram.c     |    5 ++++-
>>>  3 files changed, 27 insertions(+), 9 deletions(-)
>>>
>>> New commits:
>>> diff-tree 64e8088667d000a70beff93e8c300ac0bd261a60 (from 3dfcd48469b63c601010ea43e0d5e9ea1dc5dfab)
>>> Author: Brian <[EMAIL PROTECTED]>
>>> Date:   Mon Apr 16 10:36:28 2007 -0600
>>>
>>>     Use generic program limits instead of NV-specific ones to init program constants.
>>>
>>>     Previously, this limited us to 12 temp regs for vertex programs.  Many vertex
>>>     shaders could exceed that.  This forces us to stop using t_vb_arbprogram.c
>>>     for now because of its particular register indexing scheme.  Need to increase
>>>     bits allocated for register indexing, etc.
>> That code is utterly dead - feel free to remove it.
> 
> The demise of the SSE path, though, is a pity. It was an order of
> magnitude faster than t_vb_arbprogram, and of course it still is compared
> to the new code. Granted, nothing prevents anyone from implementing an SSE
> backend...
> Out of curiosity, I did some quick profiling (a single timedemo run of
> doom3) to see where the time is actually spent. It was compiled with -O1;
> there were lots of visual glitches because ftransform/position-invariant
> programs are not actually invariant when using -ffast-math, but that
> shouldn't make a difference for this measurement.
> 
> CPU: AMD64 processors, speed 2002.84 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
> unit mask of 0x00 (No unit mask) count 100000
> samples  %        image name    app name         symbol name
> 2847855  29.7411  r200_dri.so   r200_dri.so      fetch_vector4
> 1303158  13.6093  r200_dri.so   r200_dri.so      _mesa_execute_program
> 1138472  11.8894  r200_dri.so   r200_dri.so      store_vector4
> 903745    9.4381  r200_dri.so   r200_dri.so      run_vp
> 577698    6.0331  doom.x86      doom.x86         (no symbols)
>
> So, this maybe shouldn't come as news, but it's not the actual math that
> is really slow: that's only about 14% above. The real killer is the
> fetching and storing of values, which is about 42% in this example
> (fetch_vector4 plus store_vector4), or 51% if you count run_vp too,
> which most likely spends its time on yet another round of copying
> input/output values around.

Yeah, it's not as fast, but we'll see what we can do about that.  A
replacement SSE module should also handle fragment programs.

Just to elaborate on my check-in message, the t_vb_arbprogram[sse] code 
was lacking a few things that are needed now too:

- conditionals/branching
- looping
- subroutines
- texture fetches

-Brian

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
