On Fri, May 30, 2025 at 04:59:35PM +0200, Alexander Lobakin wrote: > Unfortunately this can hurt performance on my setup. > In XDP, we usually place &xdp_buff on the stack. It's 56 bytes. We > initialize it only during the packet processing, not in advance. > > In libeth_xdp, see [1], I provide a small extension for this struct. > This extension is 64 byte, plus I added a definition (see > `__LIBETH_XDP_ONSTACK_BUFF()`) to be able to define a bigger one in case > a driver might need more fields there. > The same as with &xdp_buff, it shouldn't be initialized in advance, only > during the packet processing. Otherwise, it can really decrease > performance, you might've seen recent Mellanox report that even > CONFIG_INIT_STACK_ZERO provokes this.
FYI, you can use __uninitialized to force CONFIG_INIT_STACK_ZERO to leave an automatic variable uninitialized. > What would be the best option to resolve this? This flex XDP buff on the > stack is fully safe, there are a couple checks that its size does not > exceed the maximum (defined by xdp_buff_xsk) etc. And we really need to > initialize it field-by-field in a loop on a per-packet basis -- we are > sure that there will be no garbage. But yes, this is suddenly not available for _DEFINE_FLEX after the referenced patch. > It's even worse that most drivers will most likely not want to add any > additional fields, i.e. this flex array at the end will be empty, IOW > they just want a plain libeth_xdp_buff, but I made a unified definition, > with which you can declare them on the stack both with and without > additional fields. So, even if the drivers doesn't want any additional > fields and the flex array is empty, the struct will be zero-initialized > and the same perf hit will apply. > [...] > [1] > https://lore.kernel.org/netdev/[email protected] Okay, so it sounds like you need the old behavior of _DEFINE_FLEX, *and* a way to apply attributes (like __uninitialized). How about replacing _DEFINE_FLEX with: #define __DEFINE_FLEX(type, name, member, count, trailer...) \ _Static_assert(__builtin_constant_p(count), \ "onstack flex array members require compile-time const count"); \ union { \ u8 bytes[struct_size_t(type, member, count)]; \ type obj; \ } name##_u trailer; \ type *name = (type *)&name##_u #define _DEFINE_FLEX(type, name, member, count, initializer...) \ __DEFINE_FLEX(type, name, member, count, = { .obj = initializer }) Which would yield this for ___LIBETH_XDP_ONSTACK_BUFF: #define ___LIBETH_XDP_ONSTACK_BUFF(name, ...) \ __DEFINE_FLEX(struct libeth_xdp_buff, name, priv, \ LIBETH_XDP_PRIV_SZ(__VA_ARGS__ + 0), \ __uninitialized); \ LIBETH_XDP_ASSERT_PRIV_SZ(__VA_ARGS__ + 0) Does that look like what you'd want? (Note I didn't actually build this; I want to make sure the concept is workable...) -- Kees Cook
