Re: [Mesa-dev] Transformation functions
On Sun, 12 Sep 1999, Eero Pajarre wrote:

> Here are the top entries for my application:
>
> gl_x86_transform_points3_3d_raw
> gl_x86_cliptest_points4
> gl_x86_transform_points3_3d_no_rot_raw
> gl_x86_transform_points3_perspective_raw

Thank you, Eero and Keith, for your information. It seems that the most critical functions are quite limited in number after all.

My problem right now is that I haven't really got the PIII operations running at all. The compilation works all right, but I get exceptions as soon as I try the XMM ops. Maybe the kernel I'm using hasn't set up the proper control bits to allow XMM. The only PIII in our lab is actually an autonomous robot, and people would probably not like me installing unofficial patches on that machine.

> The cliptest_points contains a fdiv instruction which seems (according to Vtune) to be responsible for at least half of the CPU consumption of the whole function.

This function should probably be quite easy to parallelize, and I suppose the projections should be kept within the function. It's hard to see where else they would go.

> Quite interestingly Vtune also places almost half of the CPU hit of the transform_points3_3d_raw operation to the first multiply instruction. I think this means that at least in my setup it is not the arithmetic which is expensive but fetching the data from the main memory to the FPU.

It's probably because the data is aligned and cache misses (when they occur) always occur at the beginning of the loop. I did some tests before and concluded that cache misses (on our 450MHz PIII) cost as follows:

L1 read miss: 7 cycles
L1 write miss: 37 cycles
L2 read miss: 26 cycles
L2 write miss: 80 cycles

Yes, those numbers are hard to believe indeed, but you can be quite sure that they are in that neighbourhood. If the coordinate data does not all fit into the caches, one can expect a miss once for every two coordinates (2x16 bytes). So, the data reads (and writes) will surely dominate.
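As a rough worked example of the arithmetic above, the following C sketch spreads one miss over the two coordinates that share a cache line. The 32-byte line size is an assumption inferred from "2x16 bytes", and the function name is made up for illustration.

```c
#include <assert.h>

/* Assumed sizes: a 32-byte cache line (inferred from "2x16 bytes")
 * holding two 16-byte coordinates. */
enum { LINE_BYTES = 32, COORD_BYTES = 16 };

/* Average miss cycles charged to each coordinate when every cache line
 * is missed exactly once while streaming through the coordinate array.
 * Hypothetical helper, for illustration only. */
static double miss_cycles_per_coord(double miss_cost_cycles)
{
    const double coords_per_line = (double)LINE_BYTES / COORD_BYTES;

    assert(miss_cost_cycles >= 0.0);
    return miss_cost_cycles / coords_per_line;
}
```

Under these assumptions the quoted 26-cycle read miss comes to 13 cycles per coordinate and the 80-cycle write miss to 40 cycles per coordinate, before any arithmetic is done.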
Lots of speed is probably to be gained by careful use of the new prefetching operations.

> In any case issues like "pipelining for cache" should be considered in addition to the minimizing of the CPU cycles for the actual arithmetic.

I definitely agree! One should also try to store as little temporary data as possible in dedicated memory. It's better to reuse the same memory locations for different kinds of data. This can, however, be a little messy and almost impossible to read.

Eero Pajarre? Sounds much like an old friend from my years at Mentor Graphics. :-)

Sorry for not responding earlier. Since my back and ribs have prevented me from sitting too much in front of the screen, I've been a little off the grid for a while. Crayfish parties are very popular in Sweden and this year's party was no exception. This year I managed to break a number of ribs, falling off a pier.

___
Mesa-dev maillist - [EMAIL PROTECTED]
http://lists.mesa3d.org/mailman/listinfo/mesa-dev
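The prefetching idea from this message could look roughly like the C sketch below. It is not Mesa's code: the function name, the flat array layout and the PREFETCH_AHEAD distance are all assumptions, and __builtin_prefetch is the GCC/Clang builtin rather than hand-written PIII assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many vertices ahead to prefetch.
 * The right value depends on the machine and has to be measured. */
#define PREFETCH_AHEAD 8

/* Transform n xyz points by a 4x4 column-major matrix m, writing xyzw.
 * Illustrative stand-in only; this is not Mesa's transform_points3 code. */
static void transform_points3_prefetch(float *out, const float *in,
                                       const float m[16], size_t n)
{
    assert(out != NULL && in != NULL && m != NULL);

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* GCC/Clang builtin: (address, 0 = read / 1 = write, locality 0..3). */
            __builtin_prefetch(in + 3 * (i + PREFETCH_AHEAD), 0, 0);
            __builtin_prefetch(out + 4 * (i + PREFETCH_AHEAD), 1, 0);
        }

        const float x = in[3 * i + 0];
        const float y = in[3 * i + 1];
        const float z = in[3 * i + 2];

        /* Input w is implicitly 1, so the last matrix column is added as-is. */
        out[4 * i + 0] = m[0] * x + m[4] * y + m[8]  * z + m[12];
        out[4 * i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
        out[4 * i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
        out[4 * i + 3] = m[3] * x + m[7] * y + m[11] * z + m[15];
    }
}
```

The point of the sketch is only that the loads for a later vertex are requested while the current one is being multiplied, so the first multiply of each iteration is less likely to stall on a miss. Whether it helps in practice would have to be measured.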
Re: [Mesa-dev] Transformation functions
Mårten Björkman wrote:

> I'm trying to estimate the amount of work required in order to optimize the transformation functions for Pentium III. Since the functions are many and some of them won't benefit much from using the new SIMD instructions, it's probably preferable to spend most energy on the most frequently used functions and forget about the rest. Does anyone know which functions one ought to concentrate on?

Here are the top entries for my application:

gl_x86_transform_points3_3d_raw
gl_x86_cliptest_points4
gl_x86_transform_points3_3d_no_rot_raw
gl_x86_transform_points3_perspective_raw

I have tested these in a Windows95/Pentium Pro/Vtune environment.

The cliptest_points contains a fdiv instruction which seems (according to Vtune) to be responsible for at least half of the CPU consumption of the whole function. The heavy CPU penalty may be caused partly by the fact that the instruction also takes the data cache miss penalties, as it is the first instruction in the loop that accesses vertex data.

Quite interestingly, Vtune also attributes almost half of the CPU hit of the transform_points3_3d_raw operation to the first multiply instruction. I think this means that, at least in my setup, it is not the arithmetic which is expensive but fetching the data from main memory to the FPU.

I am using quite a lot of CVA stuff, so I suspect that the transformation engines are fed directly from my application data. If I used vertex3f commands, the cache miss penalty might land in my application instead of the OpenGL library.

In any case, issues like "pipelining for cache" should be considered in addition to minimizing the CPU cycles for the actual arithmetic.

Eero
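For readers unfamiliar with what a clip test of this kind does, here is a minimal C sketch, assuming the usual OpenGL clip volume -w <= x, y, z <= w. The flag names, the signature and the handling of clipped vertices are hypothetical and not taken from Mesa. It shows why there is one division per vertex, and why that division is also the first operation to depend on freshly loaded vertex data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clip flags; Mesa's own names and values may differ. */
enum {
    CLIP_RIGHT  = 1 << 0, CLIP_LEFT = 1 << 1,
    CLIP_TOP    = 1 << 2, CLIP_BOTTOM = 1 << 3,
    CLIP_FAR    = 1 << 4, CLIP_NEAR = 1 << 5
};

/* Test n clip-space xyzw points against -w <= x,y,z <= w, store one flag
 * byte per point in mask, and write projected coordinates to proj.
 * Returns the OR of all flag bytes. Illustrative sketch only. */
static unsigned char cliptest_points4_sketch(float *proj, unsigned char *mask,
                                             const float *clip, size_t n)
{
    unsigned char or_mask = 0;

    assert(proj != NULL && mask != NULL && clip != NULL);

    for (size_t i = 0; i < n; i++) {
        const float cx = clip[4 * i + 0], cy = clip[4 * i + 1];
        const float cz = clip[4 * i + 2], cw = clip[4 * i + 3];
        unsigned char bits = 0;

        if (cx >  cw) bits |= CLIP_RIGHT;
        if (cx < -cw) bits |= CLIP_LEFT;
        if (cy >  cw) bits |= CLIP_TOP;
        if (cy < -cw) bits |= CLIP_BOTTOM;
        if (cz >  cw) bits |= CLIP_FAR;
        if (cz < -cw) bits |= CLIP_NEAR;

        mask[i] = bits;
        or_mask |= bits;

        if (bits == 0 && cw > 0.0f) {
            /* The only division per vertex; the rest is multiplies. */
            const float oow = 1.0f / cw;
            proj[4 * i + 0] = cx * oow;
            proj[4 * i + 1] = cy * oow;
            proj[4 * i + 2] = cz * oow;
            proj[4 * i + 3] = oow;
        } else {
            /* Choice made for this sketch: clipped points are not projected. */
            proj[4 * i + 0] = 0.0f;
            proj[4 * i + 1] = 0.0f;
            proj[4 * i + 2] = 0.0f;
            proj[4 * i + 3] = 1.0f;
        }
    }
    return or_mask;
}
```

Because each vertex is independent, a loop of this shape is a natural candidate for processing several vertices at once.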
[Mesa-dev] Transformation functions
I'm trying to estimate the amount of work required in order to optimize the transformation functions for Pentium III. Since the functions are many and some of them won't benefit much from using the new SIMD instructions, it's probably preferable to spend most energy on the most frequently used functions and forget about the rest. Does anyone know which functions one ought to concentrate on?

/ Mårten Björkman
Re: [Mesa-dev] Transformation functions
Mårten Björkman wrote:

> I'm trying to estimate the amount of work required in order to optimize the transformation functions for Pentium III. Since the functions are many and some of them won't benefit much from using the new SIMD instructions, it's probably preferable to spend most energy on the most frequently used functions and forget about the rest. Does anyone know which functions one ought to concentrate on?

The functions with cullmask are less used than those without. Functions for 1 and 2 vertices are less used than the 3 and 4 cases. Probably the top few are:

cliptest-points-4
transform_points3_general_raw
transform_points3_3d_no_rot_raw

I'd also add transform_points3_3d_raw, and have a look at the ones used by the fx/mga fast paths in vertices.c. These are the ones used by q3.

Keith