Patches 1-4 remove the use of ifunc from the implementation. Patch 6 adjusts the x86 implementation a bit more to take advantage of ptest (in sse4.1) and unaligned accesses (in avx1).
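
For context, here is a minimal portable sketch of the check that these patches accelerate. It is not the code from the series: the name buffer_is_zero_ref is hypothetical, and the word-at-a-time loop stands in for the ptest/vector variants mentioned above. It could serve as a reference result to compare against when testing the accelerated paths.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helper; illustrative only, not from the patches. */
static bool buffer_is_zero_ref(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t acc = 0;

    /* OR together 8 bytes at a time; memcpy avoids alignment assumptions. */
    while (len >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p, sizeof(w));
        acc |= w;
        p += sizeof(w);
        len -= sizeof(w);
    }

    /* Fold in the remaining tail bytes one at a time. */
    while (len--) {
        acc |= *p++;
    }

    /* Any nonzero byte leaves at least one bit set in acc. */
    return acc == 0;
}
```

This version scans the whole buffer with no early exit, which keeps it simple but makes it unsuitable as a performance baseline for mostly-nonzero data.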

Patches 3 and 7 are the result of my conversation with Vijaya Kumar
with respect to ThunderX.

Patch 8 is the result of seeing some really really horrible code
produced for ppc64le (gcc 4.9 and mainline).

This has had limited testing.  What I don't know is the best way to
benchmark this -- the only way I know to trigger this is via the
console, by hand, which doesn't make for reasonable timing.

Changes v1-v2:
  * Add patch 1, moving everything to a new file.
  * Fix a typo or two, which had the wrong sense of zero test.
    These had mostly been fixed in the intermediate patches,
    but it wouldn't have helped bisection.


r~


Richard Henderson (8):
  cutils: Move buffer_is_zero and subroutines to a new file
  cutils: Remove SPLAT macro
  cutils: Export only buffer_is_zero
  cutils: Rearrange buffer_is_zero acceleration
  cutils: Add generic prefetch
  cutils: Rewrite x86 buffer zero checking
  cutils: Rewrite aarch64 buffer zero checking
  cutils: Rewrite ppc buffer zero checking

 configure             |  21 +--
 include/qemu/cutils.h |   2 -
 migration/ram.c       |   2 +-
 migration/rdma.c      |   5 +-
 util/Makefile.objs    |   1 +
 util/bufferiszero.c   | 432 ++++++++++++++++++++++++++++++++++++++++++++++++++
 util/cutils.c         | 244 ----------------------------
 7 files changed, 441 insertions(+), 266 deletions(-)
 create mode 100644 util/bufferiszero.c

-- 
2.7.4