On 03.12.2009 19:46, Matt Turner wrote:
> Most of the functions in imports.c are very small, so the function
> call overhead is large relative to their size. Can't we do something
> like in the attached patch and move them to imports.h and mark them
> static inline? Things like memcpy and memset are often optimized out
> by the compiler, and by hiding them in these wrapper functions, we're
> probably losing any benefits we'd otherwise see.

++ from me, at least for the very simple wrappers. _mesa_memcpy
especially can, I think, be used very nicely for array assignments and
the like, and when copying (very) small amounts of data the call
overhead might be significant.

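
To illustrate the kind of header-level wrapper being proposed, here is a
minimal sketch. The names example_memcpy and example_copy_4fv are
hypothetical and not taken from the patch or from imports.h; the point
is only that with the body visible at the call site, the compiler is
free to inline a small fixed-size copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical header-style wrapper (not the actual Mesa function).
 * Because it is static inline, the compiler sees straight through it
 * to memcpy and can apply its usual builtin optimizations. */
static inline void *
example_memcpy(void *dest, const void *src, size_t n)
{
   return memcpy(dest, src, n);
}

/* Small fixed-size array assignment, the case mentioned above.
 * The size is a compile-time constant, so the copy can typically be
 * turned into a few plain loads and stores. */
static inline void
example_copy_4fv(float dst[4], const float src[4])
{
   example_memcpy(dst, src, 4 * sizeof(float));
}
```

Whether the copy really collapses to loads and stores depends on the
compiler and optimization level, so it is worth checking the generated
code.
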
> Similarly, if we're
> going to use a magic sqrtf algorithm (apparently for speed) then
> shouldn't we also let the compiler properly inline the function?

Not sure here. The function is still quite complex, so I don't think
call overhead will make any difference. I did look at the generated
code, though, for the case where it wasn't using the fast path (with
-O3 but DEBUG - why is this different?). That version adds a lot of
overhead:
- call overhead for _mesa_sqrtf
- overhead converting to double
- overhead converting back

In the generated code the actual square root was a single assembly
instruction (sqrtsd %xmm0, %xmm0) - granted, that's SSE2 only, and it
takes quite a few cycles. Still, I guess the overhead is significant,
not to mention that if we just used a float instead of a double, not
only would we avoid the type conversions, but the compiler would
actually issue sqrtss %xmm0, %xmm0 instead, which is (depending on the
cpu) twice as fast. Not sure why we use double there; are there
platforms where sqrtf(float x) isn't supported? So really, call
overhead is a tiny fraction of the optimization potential for this
function.

When not using DEBUG (and USE_IEEE is defined) the function still takes
quite a few cycles, so call overhead doesn't look that bad either. I
don't actually know which version is faster (or more accurate - I
think sqrtss is actually fully accurate, though). Of course using
sqrtf(x) will only be fast if the cpu has some kind of fast float unit
(and the compiler knows how to use it).

If you want to do some more optimization, there's for instance
_mesa_inv_sqrtf. It is supposedly fast, but SSE offers rsqrtss, which
is really fast. However, I remember we got some bugs some time ago when
gcc actually used that, because the precision wasn't sufficient. gcc
will do this if you enable -funsafe-math-optimizations, -mrecip or
similar.

I've just seen, though, that at least gcc 4.4 does an additional
Newton-Raphson step when you write 1.0/sqrtf(float x) (so it will issue
rsqrtss plus a couple of muls and adds), which might still be less
accurate, or even more accurate, than the manual version, and is almost
certainly faster. So there's probably far more optimization potential
here than in the call overhead. Most of those functions are probably
never used in any performance-critical path anyway.

> I also don't quite understand wrapper functions like
>
> double
> _mesa_pow(double x, double y)
> {
>    return pow(x, y);
> }
>
> Maybe at one time these had #ifdefs in them like _mesa_memcpy, but I
> can't see any reason not to remove it now.
>
> Someone enlighten me.

I guess there might indeed have been #ifdefs in the past. In any case,
keeping the wrappers would make it easier to implement such
optimizations in the future if anyone wants to, though that is probably
not something you want to do (that stuff is probably better left up to
the compiler). So, at least if they are inlined, they shouldn't really
hurt either.

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev