On 03.12.2009 19:46, Matt Turner wrote:
> Most of the functions in imports.c are very small, to the function
> call overhead is large relative to their size. Can't we do something
> like in the attached patch and move them to imports.h and mark them
> static inline? Things like memcpy and memset are often optimized out
> by the compiler, and by hiding them in these wrapper functions, we're
> probably losing any benefits we'd otherwise see.
++ from me, at least for the very simple wrappers. _mesa_memcpy
especially I think can be very nicely used for array assignments and the
like, and in case of (very) small amounts of data to copy call overhead
might be significant.

> Similarly, if we're
> going to use a magic sqrtf algorithm (apparently for speed) then
> shouldn't we also let the compiler properly inline the function?
Not sure here, the function is still quite complex, I don't think call
overhead will make any difference. I've looked at the code though when
it wasn't using the fast path (with -O3 but DEBUG - why is this different?)
This version though adds a lot of overhead:
- call overhead for _mesa_sqrtf
- overhead converting to double
- overhead converting back
In the generated code the actual sqrtf code was a single assembly
instruction (sqrtsd %xmm0, %xmm0) - granted that's SSE2 only, and it
requires quite a few cycles. Still, I guess the overhead is significant,
not to mention that if we'd just use a float instead of double not only
we wouldn't have to convert the type but the compiler would actually
issue sqrtss %xmm0 %xmm0 instead, which is (depending on the cpu) twice
as fast. Not sure why we use double there, are there platforms where
sqrtf (float x) isn't supported?
So really, call overhead is a tiny fraction of the optimization
potential for this function. When not using DEBUG (and USE_IEEE is
defined) the function is still quite a few cycles, so call overhead
doesn't look that bad neither. I don't actually know which version is
faster (or more accurate - I think though sqrtss is actually fully
accurate). Of course using sqrtf(x) will only be fast if the cpu
supports some kind of fast float unit (and the compiler knows how to use
it).
If you'd want to do some more optimization, there's for instance
_mesa_inv_sqrtf - it is supposedly fast, but sse2 offers rsqrtss, which
is really fast. However, I remember we got some bugs some time ago when
gcc actually used that, because precision wasn't enough - it will do
this if you enable -funsafe-math-optimizations, -mrecip or similar. I've
just seen though actually that at least gcc 4.4 does an additional
newton-raphson step when you do 1.0/sqrtf(float x) (so it will issue
rsqrtss plus a couple muls and adds), which might still be less or even
more accurate, and almost certainly be faster than the manual version.
So there's probably far more optimization potential than the call
overhead. Most of those functions are probably never used in any
performance critical path anyway.


> 
> I also don't quite understand wrapper functions like
> double
> _mesa_pow(double x, double y)
> {
>    return pow(x, y);
> }
> 
> Maybe at one time these had #ifdefs in them like _mesa_memcpy, but I
> can't see any reason not to remove it now.
> 
> Someone enlighten me.
I guess there might have been indeed #ifdefs in the past. In any case,
using wrapper would make it easier to implement such optimizations in
the future if anyone wants to, not that this is something which you
probably want to do (that stuff is probably better left up to the
compiler). So, at least if they are inlined, they shouldn't really hurt
neither.


------------------------------------------------------------------------------
Join us December 9, 2009 for the Red Hat Virtual Experience,
a free event focused on virtualization and cloud computing. 
Attend in-depth sessions from your desk. Your couch. Anywhere.
http://p.sf.net/sfu/redhat-sfdev2dev
_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev

Reply via email to