On Mon, 20 Jun 2022 at 07:27, Willy Tarreau <[email protected]> wrote:
>
> Hi David,
>
> On Sat, Jun 18, 2022 at 12:52:23PM +0100, David CARLIER wrote:
> > From 9d7b6448a2407451c3115b701c51f97ab2bf6a59 Mon Sep 17 00:00:00 2001
> > From: David Carlier <[email protected]>
> > Date: Sat, 18 Jun 2022 12:41:11 +0100
> > Subject: [PATCH] MINOR: compiler __builtin_memcpy_inline usage introduction.
> >
> > Optimised version of the existing __builtin_memcpy builtin, differences
> > reside by the fact it works only with constant time sizes and does
> >  generate extra calls.
> > At the moment, is clang exclusive even tough GCC does not seem to
> >  want to implement it, the comments try not to reflect this current
> > state.
> > Usage can be expanded, used purposely only in few places for starter.
> > ---
> >  include/haproxy/compiler.h | 12 ++++++++++++
> >  src/haproxy.c              |  4 ++--
> >  src/log.c                  |  2 +-
> >  src/proxy.c                |  2 +-
> >  4 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/haproxy/compiler.h b/include/haproxy/compiler.h
> > index 66b5f5835..dca9f6aef 100644
> > --- a/include/haproxy/compiler.h
> > +++ b/include/haproxy/compiler.h
> > @@ -192,6 +192,18 @@
> >  #endif
> >  #endif
> >
> > +/*
> > + * This builtin works only with compile time lengths unlike 
> > __builtin_memcpy.
> > + * Also guarantees no external call is generated.
> > + * We could `replicate` the aforementioned constraint with a
> > + * _Static_assert/__builtin_constant_p theoretically, but that might be
> > + * too much trouble to make it happens (C11 min)
> > + * so here we trust the proper usage with other compilers (and the CI 
> > infrastructure).
> > + */
> > +#if !defined(__clang__) || __clang_major__ < 11
> > +#define __builtin_memcpy_inline(x, y, s) memcpy(x, y, s)
> > +#endif
> > +
>
> Hmmm please no, this is not the right way to do it for two reasons:
>   - it will encourage use of __builtin_memcpy_inline() making one believe
>     it's the expected one when it isn't ;
>
>   - developing on non-clang systems will not report the problem with the
>     non-constant size that __builtin_memcpy_inline() expects, resulting
>     in breakage from time to time.
>
> I'd suggest that instead you create a macro named ha_memcpy_inline() that
> maps to __builtin_memcpy_inline() when present and s is a builtin constant,
> or maps to __builtin_memcpy() for the rest. Please note, by the way, that
> gcc since at least 3.4 has been emitting the same instructions (rep movsb)
> as clang's __builtin_memcpy_inline(), but only when optimizing for size.
> When optimizing for anything else, it can emit ton of inlined garbage^Wvector
> instructions.
>
> Thus I suspect we could achieve the same result by doing a clever
> arrangement of #pragma GCC push_options / #pragma GCC optimize("Os,inline")
> around an inline function definition that does the memcpy, and that is
> called by the macro. I have not tried, but probably something like this
> would do it:
>
> #pragma GCC push_options
> #pragma GCC optimize("Os,inline")
> static inline void _ha_memcpy_inline(void *restrict dest, const void 
> *restrict src, size_t n)
> {
>         __builtin_memcpy(dest, src, n);
> }
> #pragma GCC pop_options
>
> #if defined(clang...)
> #define ha_memcpy_inline(d,s,n) do { if (__builtin_constant_p(n)) 
> __builtin_memcpy_inline(d,s,n); else _ha_memcpy_inline(d,s,n); } while (0)
> #else
> #define ha_memcpy_inline(d,s,n) _ha_memcpy_inline(d,s,n)
> #endif
>
> That's not even compile-tested and I haven't looked at the result, but
> you get the idea.

Alright thanks here a newer version.
>
> Now regarding where to use it... I'd say it depends on a lot of factors,
> size, speed, undesired dependency on external libs. I think that each and
> every call place does warrant an individual commit with a justification
> and a check that the benefit matches expectations. There could be some
> value in using this in various low-level protocol layers (QUIC, HTX, SPOE).

Exactly.

> Maybe more, I don't know.

Over time its value will display more.

>
> Thanks,
> Willy
From 90c951d29c32a345cf38d6c0b43ee904f12b8ac7 Mon Sep 17 00:00:00 2001
From: David Carlier <[email protected]>
Date: Sat, 18 Jun 2022 12:41:11 +0100
Subject: [PATCH] MINOR: compiler __builtin_memcpy_inline usage introduction.

Optimised version of the existing __builtin_memcpy builtin, differences
reside by the fact it works only with constant time sizes and does
 not generate extra calls.
At the moment, is clang exclusive even tough GCC does not seem to
 want to implement it, the comments try not to reflect this current
state.
Usage can be expanded, used purposely only in few places for starter.
---
 include/haproxy/compiler.h | 32 ++++++++++++++++++++++++++++++++
 src/haproxy.c              |  4 ++--
 src/log.c                  |  2 +-
 src/proxy.c                |  2 +-
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/include/haproxy/compiler.h b/include/haproxy/compiler.h
index 66b5f5835..9584b71cb 100644
--- a/include/haproxy/compiler.h
+++ b/include/haproxy/compiler.h
@@ -130,6 +130,38 @@
 #endif
 #endif
 
+#ifdef __GNUC__
+#ifndef __clang__
+#pragma GCC push_options
+#pragma GCC optimize("Os,inline")
+#else
+__attribute__((minsize, always_inline))
+#endif
+static inline void _ha_memcpy_inline(void *restrict dest, const void *restrict src, size_t n)
+{
+	__builtin_memcpy(dest, src, n);
+}
+#ifndef __clang__
+#pragma GCC pop_options
+#endif
+
+/*
+ * __builtin_memcpy_inline works only with compile time lengths unlike __builtin_memcpy.
+ * Also guarantees no external call is generated.
+ */
+#if defined(__clang__) && __clang_major__ >= 11
+#define ha_memcpy_inline(dest, src, n)				\
+	do {							\
+		if (__builtin_constant_p(n))			\
+			__builtin_memcpy_inline(dest, src, n);	\
+		else						\
+			__builtin_memcpy(dest, src, n);		\
+	} while(0)
+#else
+#define ha_memcpy_inline(dest, src, n) _ha_memcpy_inline(dest, src, n)
+#endif
+#endif
+
 /* This macro may be used to block constant propagation that lets the compiler
  * detect a possible NULL dereference on a variable resulting from an explicit
  * assignment in an impossible check. Sometimes a function is called which does
diff --git a/src/haproxy.c b/src/haproxy.c
index 32eb4bdc6..e278f89b3 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -1265,11 +1265,11 @@ static void ha_random_boot(char *const *argv)
 
 	/* stack address (benefit form operating system's ASLR) */
 	l = (unsigned long)&m;
-	memcpy(m, &l, sizeof(l)); m += sizeof(l);
+	ha_memcpy_inline(m, &l, sizeof(l)); m += sizeof(l);
 
 	/* argv address (benefit form operating system's ASLR) */
 	l = (unsigned long)&argv;
-	memcpy(m, &l, sizeof(l)); m += sizeof(l);
+	ha_memcpy_inline(m, &l, sizeof(l)); m += sizeof(l);
 
 	/* use tv_usec again after all the operations above */
 	gettimeofday(&tv, NULL);
diff --git a/src/log.c b/src/log.c
index 304e3cb68..a0ab472a4 100644
--- a/src/log.c
+++ b/src/log.c
@@ -808,7 +808,7 @@ int parse_logsrv(char **args, struct list *logsrvs, int do_del, const char *file
 			}
 
 			node = malloc(sizeof(*node));
-			memcpy(node, logsrv, sizeof(struct logsrv));
+			ha_memcpy_inline(node, logsrv, sizeof(struct logsrv));
 			node->ref = logsrv;
 			LIST_INIT(&node->list);
 			LIST_APPEND(logsrvs, &node->list);
diff --git a/src/proxy.c b/src/proxy.c
index c10a8cf83..3cc669179 100644
--- a/src/proxy.c
+++ b/src/proxy.c
@@ -1818,7 +1818,7 @@ static int proxy_defproxy_cpy(struct proxy *curproxy, const struct proxy *defpro
 			memprintf(errmsg, "proxy '%s': out of memory", curproxy->id);
 			return 1;
 		}
-		memcpy(node, tmplogsrv, sizeof(struct logsrv));
+		ha_memcpy_inline(node, tmplogsrv, sizeof(struct logsrv));
 		node->ref = tmplogsrv->ref;
 		LIST_INIT(&node->list);
 		LIST_APPEND(&curproxy->logsrvs, &node->list);
-- 
2.34.1

Reply via email to