Matt Turner <matts...@gmail.com> writes:
> On Sat, Nov 28, 2009 at 2:13 PM, Yang Zhao <y...@yangman.ca> wrote:
> > The speed-up is definitely there, but __builtin_popcount()
> > will still be drastically faster when architecture-specific
> > optimizations are enabled:
>
> I don't think this is the case (except for with SSE4's popcnt
> instruction, which your CFLAGS seem to be enabling.)
>
> Even when compiling with the Intel CC, which undoubtedly
> can optimize code for Core 2 better than gcc, fast_bitcount is
> significantly faster.
IMHO, the best long-term solution is to use gcc's builtin and ping the gcc folks with the implementation and benchmark results of fast_bitcount. If it's really faster, they'll eventually adopt it on the platforms where that makes sense. This would benefit a larger community and would hopefully mean that nobody needs to come across this again in <N> years, when common CPUs might be completely different. Mesa would just benefit transparently from someone in the gcc community noticing the need for new tuning.

-tom

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev