This patch series optimises csum_partial_copy_generic() by using the cache instructions dcbt and dcbz, as copy_tofrom_user() already does.
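
For readers unfamiliar with the technique, here is a minimal C sketch of a combined copy-and-checksum loop that works one cache line at a time and prefetches the next source line. It is not the code from this series, which is PowerPC assembly. The names sketch_csum_copy(), copy_add_word(), fold16() and the 32-byte CACHELINE value are hypothetical, and GCC/Clang's __builtin_prefetch() is used only as a portable stand-in for dcbt. dcbz (zeroing a destination line so it need not be read from memory first) has no portable C equivalent and is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 32	/* assumed line size; the real value is CPU-specific */

/* Fold a wide accumulator down to a 16-bit ones' complement sum. */
static uint16_t fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Copy one 16-bit word and return its big-endian value for the sum. */
static uint32_t copy_add_word(const uint8_t *src, uint8_t *dst)
{
	dst[0] = src[0];
	dst[1] = src[1];
	return ((uint32_t)src[0] << 8) | src[1];
}

/*
 * Hypothetical helper: copy len bytes from src to dst and return the
 * 16-bit ones' complement sum of the data (not inverted), reading the
 * data as big-endian 16-bit words.
 */
static uint16_t sketch_csum_copy(const uint8_t *src, uint8_t *dst, size_t len)
{
	uint64_t sum = 0;
	size_t i = 0;
	size_t j;

	/* Main loop: one full cache line per iteration. */
	while (len - i >= CACHELINE) {
		/* Stand-in for dcbt: hint that the next source line is needed. */
		__builtin_prefetch(src + i + CACHELINE, 0, 0);
		for (j = 0; j < CACHELINE; j += 2)
			sum += copy_add_word(src + i + j, dst + i + j);
		i += CACHELINE;
	}

	/* Tail: remaining whole words, then a possible odd trailing byte. */
	for (; i + 1 < len; i += 2)
		sum += copy_add_word(src + i, dst + i);
	if (i < len) {
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}

	return fold16(sum);
}
```

With the eight bytes 00 01 f2 03 f4 f5 f6 f7 (the worked example from RFC 1071), the sketch returns 0xddf2 and leaves an identical copy in dst.
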
On a TCP benchmark using socklib on the loopback interface, with checksum offload and scatter/gather deactivated, we get about a 20% performance increase.

Christophe Leroy (2):
  powerpc32: checksum_wrappers_64 becomes checksum_wrappers
  powerpc32: rewrite of csum_partial_copy_generic based on copy_tofrom_user

 arch/powerpc/include/asm/checksum.h     |   9 -
 arch/powerpc/lib/Makefile               |   3 +-
 arch/powerpc/lib/checksum_32.S          | 320 +++++++++++++++++++++----------
 arch/powerpc/lib/checksum_wrappers.c    | 102 ++++++++++
 arch/powerpc/lib/checksum_wrappers_64.c | 102 ----------
 5 files changed, 312 insertions(+), 224 deletions(-)
 create mode 100644 arch/powerpc/lib/checksum_wrappers.c
 delete mode 100644 arch/powerpc/lib/checksum_wrappers_64.c

-- 
2.1.0

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev