Matt Turner <matts...@gmail.com> writes:
> On Sat, Nov 28, 2009 at 2:13 PM, Yang Zhao <y...@yangman.ca> wrote:
> > The speed-up is definitely there, but __builtin_popcount()
> > will still be drastically faster when architecture-specific
> > optimizations are enabled:
>
> I don't think this is the case (except for with SSE4's popcnt
> instruction, which your CFLAGS seem to be enabling.)
>
> Even when compiling with the Intel CC, which undoubtedly
> can optimize code for Core 2 better than gcc, fast_bitcount is
> significantly faster.

IMHO, the best long-term solution is to use gcc's builtin, and ping
the gcc folks with the implementation and benchmark results of
fast_bitcount.  If it's really better, they'll eventually adopt it on
the platforms where that makes sense.

This would benefit a larger community, and would hopefully mean that
nobody needs to come across this again in <N> years when common CPUs
might be completely different -- Mesa would just benefit transparently
from someone in the gcc community noticing the need for a new tuning.
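A minimal sketch of that approach, assuming 32-bit arguments: call
gcc's __builtin_popcount wherever the compiler provides it (so gcc can
emit popcnt when e.g. -mpopcnt or -msse4.2 is in the CFLAGS), and keep
a portable fallback elsewhere.  The fallback is the standard SWAR
("parallel") bit count and is only an illustration; it is not
necessarily identical to the fast_bitcount discussed in this thread.
The names swar_bitcount and bitcount are hypothetical.

```c
#include <stdint.h>

/* Portable SWAR bit count (hypothetical stand-in for fast_bitcount). */
static inline unsigned swar_bitcount(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit partial sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (v * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Prefer the compiler builtin so gcc chooses the best instruction
 * sequence for the target; fall back to SWAR on other compilers. */
static inline unsigned bitcount(uint32_t v)
{
#if defined(__GNUC__)
    return (unsigned)__builtin_popcount(v);
#else
    return swar_bitcount(v);
#endif
}
```

With this split, any future improvement in gcc's popcount lowering
reaches callers of bitcount() without touching the source.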

-tom

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
