Mårten Björkman wrote:
> 
> I'm trying to estimate the amount of work required in order to
> optimize the transformation functions for Pentium III. Since the
> functions are many and some of them won't benefit much from using the
> new SIMD instructions, it's probably preferable to spend most energy
> on the most frequently used functions and forget about the rest.
> 
> Does anyone know which functions one ought to concentrate on?
> 

Here are the top entries for my application:
gl_x86_transform_points3_3d_raw
gl_x86_cliptest_points4
gl_x86_transform_points3_3d_no_rot_raw
gl_x86_transform_points3_perspective_raw

I have tested these in a Windows95/Pentium Pro/Vtune environment.

The cliptest_points4 function contains an fdiv instruction which
(according to Vtune) seems to be responsible for
at least half of the CPU time of the whole function.
Part of that heavy penalty may come from the fact that
the fdiv is also charged with the data cache misses,
as it is the first instruction in the loop that accesses
vertex data.
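
To make the shape of such a loop concrete, here is a minimal C sketch
of a clip test over homogeneous points. It is not Mesa's code: the
function name, the CLIP_* bit values and the flat array layout are all
assumptions made for illustration. It shows the single divide per
vertex and the place where each vertex is first read, which is where a
sampling profiler would tend to bill a cache miss.

```c
#include <stddef.h>

/* Hypothetical clip-flag bits; not Mesa's actual definitions. */
enum {
    CLIP_RIGHT = 1, CLIP_LEFT = 2, CLIP_TOP = 4,
    CLIP_BOTTOM = 8, CLIP_FAR = 16, CLIP_NEAR = 32
};

/* Clip-test n points stored as x,y,z,w floats in `in`, write one mask
 * byte per point, and project the unclipped points into `out`
 * (x/w, y/w, z/w, 1/w). Returns the OR of all masks.
 * `in` and `out` must not overlap. */
unsigned char cliptest_points4_sketch(const float *in, float *out,
                                      unsigned char *mask, size_t n)
{
    unsigned char or_mask = 0;

    for (size_t i = 0; i < n; i++) {
        /* First touch of this vertex: if the data is not in cache,
         * the stall lands on whichever instruction consumes it. */
        const float x = in[4 * i + 0];
        const float y = in[4 * i + 1];
        const float z = in[4 * i + 2];
        const float w = in[4 * i + 3];
        unsigned char m = 0;

        if (x >  w) m |= CLIP_RIGHT;
        if (x < -w) m |= CLIP_LEFT;
        if (y >  w) m |= CLIP_TOP;
        if (y < -w) m |= CLIP_BOTTOM;
        if (z >  w) m |= CLIP_FAR;
        if (z < -w) m |= CLIP_NEAR;

        mask[i] = m;
        or_mask |= m;

        /* Project only visible points; w > 0 guards the degenerate
         * all-zero vertex, which passes every test above. */
        if (m == 0 && w > 0.0f) {
            const float oow = 1.0f / w;   /* the one divide per vertex */
            out[4 * i + 0] = x * oow;
            out[4 * i + 1] = y * oow;
            out[4 * i + 2] = z * oow;
            out[4 * i + 3] = oow;
        }
    }
    return or_mask;
}
```

Computing one reciprocal and multiplying three times keeps the loop at
a single divide per vertex, but it does nothing about the memory
stall itself.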

Quite interestingly, Vtune also attributes almost half of the
CPU time of the transform_points3_3d_raw operation to the
first multiply instruction. I think this means that, at least
in my setup, it is not the arithmetic that is expensive
but fetching the data from main memory to the FPU.

I am using CVA (compiled vertex arrays) quite a lot, so I suspect
that the transformation engines are fed directly
from my application data. If I used vertex3f
commands instead, the cache miss penalty might
show up in my application rather than in the OpenGL library.


In any case, issues like "pipelining for cache"
should be considered in addition to minimizing
the CPU cycles spent on the actual arithmetic.
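
One way to read "pipelining for cache" is to request vertex data a few
iterations before the arithmetic needs it. The C sketch below does
that for a 3-component transform. It is only an illustration under
several assumptions: the function name is made up, the matrix is taken
to be affine (bottom row 0 0 0 1) in OpenGL's column-major layout,
PREFETCH_AHEAD is an untuned guess, and __builtin_prefetch is a
GCC/Clang extension rather than anything Mesa is known to use here.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in vertices; needs tuning per CPU. */
#define PREFETCH_AHEAD 8

/* Transform n points (x,y,z floats in `in`) by the column-major 4x4
 * matrix m, assumed affine, writing x,y,z,w floats to `out`.
 * `in` and `out` must not overlap. */
void transform_points3_prefetch_sketch(const float m[16], const float *in,
                                       float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Ask for a later vertex now so the load has (hopefully)
         * completed by the time the multiplies reach it. The bounds
         * check keeps the address expression inside the array. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&in[3 * (i + PREFETCH_AHEAD)], 0, 0);

        const float x = in[3 * i + 0];
        const float y = in[3 * i + 1];
        const float z = in[3 * i + 2];

        out[4 * i + 0] = m[0] * x + m[4] * y + m[8]  * z + m[12];
        out[4 * i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
        out[4 * i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
        out[4 * i + 3] = 1.0f;
    }
}
```

Whether this helps at all depends on the data layout and the CPU, so
it would have to be measured rather than assumed.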


                        Eero


_______________________________________________
Mesa-dev maillist  -  [EMAIL PROTECTED]
http://lists.mesa3d.org/mailman/listinfo/mesa-dev
