On Tue, Nov 26, 2013 at 04:54:22PM -0800, Jarno Rajahalme wrote:
>
> >> On Nov 18, 2013, at 1:19 PM, Ben Pfaff wrote:
> >>
> >> This also restores use, in practice, of the optimized implementation of
> >> population count. (As the comment on popcount32() says, this version is
> >> 2x faster than __builtin_popcount().)
> >>
>
> I just tested __builtin_popcountll() with -march=native on an i7. It is
> about 4x faster than our current version and about 8x faster than the
> builtin on a generic build.
-march=native produces nonportable code, so we can't use it for generic
builds. See the GCC manual:
    native
        This selects the CPU to tune for at compilation time by
        determining the processor type of the compiling machine.
        Using `-mtune=native' will produce code optimized for the
        local machine under the constraints of the selected
        instruction set.  Using `-march=native' will enable all
        instruction subsets supported by the local machine (hence the
        result might not run on different machines).
(It probably uses the POPCNT instruction; did you check?)
_______________________________________________
dev mailing list
http://openvswitch.org/mailman/listinfo/dev