On Sun, 23 Nov 2014 at 22:08:24 +0000, Simon McVittie wrote:
> I've prepared an NMU for lzo2 (versioned as 2.08-1.1) based on
> feedback on debian-devel and further testing

I have cancelled this delayed upload, because Julian Taylor pointed out
on debian-devel that there's a much simpler way to do this:
we know that the compiler used on Debian optimizes memcpy() calls
with constant n into optimal inlined instructions for the appropriate
CPU, without actually emitting a function call. So we can just define
LZO_MEMOPS_COPY2(dd, ss) to be memcpy(dd, ss, 1) and so on.

As a patch for upstream this would probably have to be guarded by
-DLZO_CFG_TRUST_THE_STDLIB or something, because they seem to be targeting
liblzo2 to be functional and fast even on the most naive compiler/libc
combinations possible... but on Debian, where we control libc and the
compiler, we know they're good.

It might be desirable to replace LZO_MEMOPS_SETn, LZO_MEMOPS_MOVEn
with memset and memmove calls too, since they also seem to violate
strict aliasing in ways that I suspect might lead an optimizing compiler
to emit the wrong code.

Sorry for the noise, I'll put together another patch shortly.

Using memcpy() like the attached does seem to work and have
equivalent performance, but I'm slightly concerned that
lzo2 might not guarantee not to pass overlapping arguments to
LZO_MEMOPS_COPYn; if they can overlap, memmove() would be
necessary. Some of the fallback implementations for the LZO_MEMOPS_MOVEn
family are not overlapping-argument-safe in any case, so memcpy()
is *probably* OK, but that might just mean nobody has noticed
the bug before.

    S
From: Simon McVittie <s...@debian.org>
Date: Sun, 23 Nov 2014 22:50:33 +0000
Subject: Use memcpy() instead of reinventing it

gcc inlines memcpy() with results as fast as handwritten code (at
least in my brief testing with lzop), and knows the alignment
constraints for our architectures.

Change suggested by Julian Taylor.

Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=757037
---
 minilzo/minilzo.c | 14 ++++++++++++++
 src/lzo_func.h    | 14 ++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/minilzo/minilzo.c b/minilzo/minilzo.c
index ab2be5f..6913c2f 100644
--- a/minilzo/minilzo.c
+++ b/minilzo/minilzo.c
@@ -3523,6 +3523,20 @@ LZO_COMPILE_TIME_ASSERT_HEADER(sizeof(*(lzo_memops_TU8p)0)==8)
     if ((void)0, n__n > 0) do { *d__n++ = *s__n++; } while (--n__n > 0); \
     LZO_BLOCK_END
 
+/* Debian-specific change: we know that our compiler inlines memcpy() with
+ * constant n to be as fast as handwritten code, and knows which architectures
+ * need things correctly aligned. */
+#undef LZO_MEMOPS_COPY1
+#undef LZO_MEMOPS_COPY2
+#undef LZO_MEMOPS_COPY4
+#undef LZO_MEMOPS_COPY8
+#undef LZO_MEMOPS_COPYN
+#define LZO_MEMOPS_COPY1(dd,ss) memcpy(dd, ss, 1)
+#define LZO_MEMOPS_COPY2(dd,ss) memcpy(dd, ss, 2)
+#define LZO_MEMOPS_COPY4(dd,ss) memcpy(dd, ss, 4)
+#define LZO_MEMOPS_COPY8(dd,ss) memcpy(dd, ss, 8)
+#define LZO_MEMOPS_COPYN(dd,ss,nn) memcpy(dd, ss, nn)
+
 __lzo_static_forceinline lzo_uint16_t lzo_memops_get_le16(const lzo_voidp ss)
 {
     lzo_uint16_t v;
diff --git a/src/lzo_func.h b/src/lzo_func.h
index dfaa676..1cc1b53 100644
--- a/src/lzo_func.h
+++ b/src/lzo_func.h
@@ -333,6 +333,20 @@ LZO_COMPILE_TIME_ASSERT_HEADER(sizeof(*(lzo_memops_TU8p)0)==8)
     if ((void)0, n__n > 0) do { *d__n++ = *s__n++; } while (--n__n > 0); \
     LZO_BLOCK_END
 
+/* Debian-specific change: we know that our compiler inlines memcpy() with
+ * constant n to be as fast as handwritten code, and knows which architectures
+ * need things correctly aligned. */
+#undef LZO_MEMOPS_COPY1
+#undef LZO_MEMOPS_COPY2
+#undef LZO_MEMOPS_COPY4
+#undef LZO_MEMOPS_COPY8
+#undef LZO_MEMOPS_COPYN
+#define LZO_MEMOPS_COPY1(dd,ss) memcpy(dd, ss, 1)
+#define LZO_MEMOPS_COPY2(dd,ss) memcpy(dd, ss, 2)
+#define LZO_MEMOPS_COPY4(dd,ss) memcpy(dd, ss, 4)
+#define LZO_MEMOPS_COPY8(dd,ss) memcpy(dd, ss, 8)
+#define LZO_MEMOPS_COPYN(dd,ss,nn) memcpy(dd, ss, nn)
+
 __lzo_static_forceinline lzo_uint16_t lzo_memops_get_le16(const lzo_voidp ss)
 {
     lzo_uint16_t v;

Reply via email to