Hi David,

On Sat, Jun 18, 2022 at 12:52:23PM +0100, David CARLIER wrote:
> From 9d7b6448a2407451c3115b701c51f97ab2bf6a59 Mon Sep 17 00:00:00 2001
> From: David Carlier <devne...@gmail.com>
> Date: Sat, 18 Jun 2022 12:41:11 +0100
> Subject: [PATCH] MINOR: compiler __builtin_memcpy_inline usage introduction.
> 
> Optimised version of the existing __builtin_memcpy builtin, differences
> reside by the fact it works only with constant time sizes and does
>  generate extra calls.
> At the moment, is clang exclusive even tough GCC does not seem to
>  want to implement it, the comments try not to reflect this current
> state.
> Usage can be expanded, used purposely only in few places for starter.
> ---
>  include/haproxy/compiler.h | 12 ++++++++++++
>  src/haproxy.c              |  4 ++--
>  src/log.c                  |  2 +-
>  src/proxy.c                |  2 +-
>  4 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/include/haproxy/compiler.h b/include/haproxy/compiler.h
> index 66b5f5835..dca9f6aef 100644
> --- a/include/haproxy/compiler.h
> +++ b/include/haproxy/compiler.h
> @@ -192,6 +192,18 @@
>  #endif
>  #endif
>  
> +/*
> + * This builtin works only with compile time lengths unlike __builtin_memcpy.
> + * Also guarantees no external call is generated.
> + * We could `replicate` the aforementioned constraint with a
> + * _Static_assert/__builtin_constant_p theoretically, but that might be
> + * too much trouble to make it happens (C11 min)
> + * so here we trust the proper usage with other compilers (and the CI 
> infrastructure).
> + */
> +#if !defined(__clang__) || __clang_major__ < 11
> +#define __builtin_memcpy_inline(x, y, s) memcpy(x, y, s)
> +#endif
> +

Hmmm please no, this is not the right way to do it for two reasons:
  - it will encourage use of __builtin_memcpy_inline() making one believe
    it's the expected one when it isn't ;

  - developing on non-clang systems will not report the problem with the
    non-constant size that __builtin_memcpy_inline() expects, resulting
    in breakage from time to time.

I'd suggest that instead you create a macro named ha_memcpy_inline() that
maps to __builtin_memcpy_inline() when present and s is a builtin constant,
or maps to __builtin_memcpy() for the rest. Please note, by the way, that
gcc since at least 3.4 has been emitting the same instructions (rep movsb)
as clang's __builtin_memcpy_inline(), but only when optimizing for size.
When optimizing for anything else, it can emit ton of inlined garbage^Wvector
instructions.

Thus I suspect we could achieve the same result by doing a clever
arrangement of #pragma GCC push_options / #pragma GCC optimize("Os,inline")
around an inline function definition that does the memcpy, and that is
called by the macro. I have not tried, but probably something like this
would do it:

#pragma GCC push_options
#pragma GCC optimize("Os,inline")
static inline void _ha_memcpy_inline(void *restrict dest, const void *restrict 
src, size_t n)
{
        __builtin_memcpy(dest, src, n);
}
#pragma GCC pop_options

#if defined(clang...)
#define ha_memcpy_inline(d,s,n) do { if (__builtin_constant_p(n)) 
__builtin_memcpy_inline(d,s,n); else _ha_memcpy_inline(d,s,n); } while (0)
#else
#define ha_memcpy_inline(d,s,n) _ha_memcpy_inline(d,s,n)
#endif

That's not even compile-tested and I haven't looked at the result, but
you get the idea.

Now regarding where to use it... I'd say it depends on a lot of factors,
size, speed, undesired dependency on external libs. I think that each and
every call place does warrant an individual commit with a justification
and a check that the benefit matches expectations. There could be some
value in using this in various low-level protocol layers (QUIC, HTX, SPOE).
Maybe more, I don't know.

Thanks,
Willy

Reply via email to