On Sat, Nov 28, 2009 at 2:13 PM, Yang Zhao <y...@yangman.ca> wrote:
> The speed-up is definitely there, but __builtin_popcount() will still
> be drastically faster when architecture-specific optimizations are
> enabled:

I don't think this is the case, except with SSE4's popcnt instruction,
which your CFLAGS seem to be enabling.

Even when compiling with the Intel compiler (icc), which can undoubtedly
optimize code for Core 2 better than gcc can, fast_bitcount is
significantly faster than __builtin_popcount().

$ icc -O3 -ipo -march=core2 bc.c -o bc
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /tmp/ipo_icce1aegt.o
bc.c(61): (col. 5) remark: LOOP WAS VECTORIZED.
$ ./bc
1 billion of __builtin_popcount(), fast_bitcount(), and naive() (in that order)
__builtin_popcount(): 5.361 seconds
fast_bitcount(): 1.274 seconds
kr_bitcount(): 20.302 seconds
naive(): 34.547 seconds
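
For anyone without bc.c at hand, here is a rough sketch of the kinds of
routines being timed above. The source of fast_bitcount() is not included
in this message, so swar_bitcount() below is only an assumed stand-in (the
usual branch-free parallel bit count). The names swar_bitcount,
kr_bitcount_sketch, naive_sketch and bitcounts_agree are hypothetical and
are not taken from bc.c. It assumes a compiler that provides
__builtin_popcount() (gcc, clang, icc).

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: stand-in for fast_bitcount(); the real one is not shown here.
 * Branch-free parallel (SWAR) count: sum bits in 2-, 4-, then 8-bit fields. */
static unsigned swar_bitcount(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit partial sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (v * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Kernighan-style count: clears the lowest set bit once per iteration. */
static unsigned kr_bitcount_sketch(uint32_t v)
{
    unsigned count = 0;
    while (v) {
        v &= v - 1;
        count++;
    }
    return count;
}

/* Naive count: tests one bit per iteration, up to 32 iterations. */
static unsigned naive_sketch(uint32_t v)
{
    unsigned count = 0;
    for (; v; v >>= 1)
        count += v & 1u;
    return count;
}

/* Returns 1 if all three sketches agree with the compiler builtin for v. */
static int bitcounts_agree(uint32_t v)
{
    unsigned expected = (unsigned)__builtin_popcount(v);
    assert(swar_bitcount(v) == expected);
    return kr_bitcount_sketch(v) == expected && naive_sketch(v) == expected;
}
```

The parallel version does a fixed handful of shifts, masks and one
multiply with no branches or loops, which is presumably why a routine of
this shape can beat a generic __builtin_popcount() fallback when the
hardware popcnt instruction is not enabled.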

Matt

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
