Patches 1-4 remove the use of ifunc from the implementation.

Patch 6 adjusts the x86 implementation a bit more to take
advantage of ptest (in sse4.1) and unaligned accesses (in avx1).

Patches 3 and 7 are the result of my conversation with Vijaya
Kumar with respect to ThunderX.

Patch 8 is the result of seeing some really horrible code that
gcc (4.9 and mainline) produces for ppc64le.

This has had limited testing.  What I don't know is the best way
to benchmark this -- the only way I know to trigger this is via
the console, by hand, which doesn't make for reasonable timing.
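
One possible way to get timings without going through the console
would be a small standalone harness. The following is only a sketch
under assumptions, not part of this series: naive_buffer_is_zero is a
hypothetical portable stand-in (it is not QEMU's buffer_is_zero), and
time_zero_check and zero_fn are likewise names made up for
illustration. It uses the standard clock() call, so it measures CPU
time rather than wall-clock time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Signature shared by every implementation under test (assumed). */
typedef bool (*zero_fn)(const void *buf, size_t len);

/* Hypothetical portable reference; NOT the QEMU implementation. */
static bool naive_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i = 0;

    assert(buf != NULL || len == 0);

    /* Test 8 bytes at a time; memcpy avoids alignment assumptions. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p + i, sizeof(w));
        if (w != 0) {
            return false;
        }
    }
    /* Test the remaining tail bytes one by one. */
    for (; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Call fn(buf, len) iters times and return the elapsed CPU time in
 * seconds.  The number of calls that reported "all zero" is stored
 * in *zero_hits so the result of each call is actually used.
 */
static double time_zero_check(zero_fn fn, const void *buf, size_t len,
                              unsigned iters, unsigned *zero_hits)
{
    /* Volatile pointer: forces a real indirect call per iteration. */
    zero_fn volatile call = fn;
    unsigned hits = 0;
    clock_t t0 = clock();

    for (unsigned i = 0; i < iters; i++) {
        hits += call(buf, len) ? 1u : 0u;
    }

    clock_t t1 = clock();
    *zero_hits = hits;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Any function with the same signature could be passed as fn, with
an all-zero buffer as the worst case (the whole buffer must be
scanned) and a buffer with an early non-zero byte as the best case.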

Changes v1-v2:
  * Add patch 1, moving everything to a new file.
  * Fix a typo or two that had the wrong sense of the zero test.
    These had mostly been fixed in the intermediate patches,
    but that wouldn't have helped bisection.


r~


Richard Henderson (8):
  cutils: Move buffer_is_zero and subroutines to a new file
  cutils: Remove SPLAT macro
  cutils: Export only buffer_is_zero
  cutils: Rearrange buffer_is_zero acceleration
  cutils: Add generic prefetch
  cutils: Rewrite x86 buffer zero checking
  cutils: Rewrite aarch64 buffer zero checking
  cutils: Rewrite ppc buffer zero checking

 configure             |  21 +--
 include/qemu/cutils.h |   2 -
 migration/ram.c       |   2 +-
 migration/rdma.c      |   5 +-
 util/Makefile.objs    |   1 +
 util/bufferiszero.c   | 432 ++++++++++++++++++++++++++++++++++++++++++++++++++
 util/cutils.c         | 244 ----------------------------
 7 files changed, 441 insertions(+), 266 deletions(-)
 create mode 100644 util/bufferiszero.c

-- 
2.7.4

