On Tue, Oct 28, 2025 at 05:29:30PM +0100, Alexander Lobakin wrote:
> From: Nathan Chancellor <[email protected]>
> Date: Mon, 27 Oct 2025 13:54:09 -0700
>
> > On Mon, Oct 27, 2025 at 03:59:51PM +0100, Alexander Lobakin wrote:
> >> Hmmm,
> >>
> >> For this patch:
> >>
> >> Acked-by: Alexander Lobakin <[email protected]>
> >
> > Thanks a lot for taking a look, even if it seems like we might not
> > actually go the route of working around this.
> >
> >> However,
> >>
> >> The XSk metadata infra in the kernel relies on that when we call
> >> xsk_tx_metadata_request(), we pass a static const struct with our
> >> callbacks and then the compiler makes all these calls direct.
> >> This is not limited to libeth (although I realize that it triggered
> >> this build failure due to the way how I pass these callbacks), every
> >> driver which implements XSk Tx metadata and calls
> >> xsk_tx_metadata_request() relies on that these calls will be direct,
> >> otherwise there'll be such performance penalty that is unacceptable
> >> for XSk speeds.
> >
> > Hmmmm, I am not really sure how you could guarantee that these calls are
> > turned direct from indirect aside from placing compile time assertions
> > around like this... when you say "there'll be such performance penalty
>
> You mean in case of CFI or in general? Because currently on both GCC and
> Clang with both OPTIMIZE_FOR_{SIZE,SPEED} they get inlined in every driver.

I mean in general but obviously that sort of optimization is high value
for the compiler to perform so I would only expect it not to occur in
extreme cases like sanitizers being enabled; I would expect no issues
when using a backend CFI implementation.

> > that is unacceptable for XSk speeds", does that mean that everything
> > will function correctly but slower than expected or does the lack of
> > proper speed result in functionality degradation?
>
> Nothing would break, just work way slower than expected.
> xsk_tx_metadata_request() is called for each Tx packet (when Tx metadata
> is enabled). Average XSK Tx perf is ~35-40 Mpps (millions of packets per
> second), often [much] higher. Having an indirect call there would divide
> it by n.

Ah okay.

> >> Maybe xsk_tx_metadata_request() should be __nocfi as well? Or all
> >> the callers of it?
> >
> > I would only expect __nocfi_generic to be useful for avoiding a problem
> > such as this. __nocfi would be too big of a hammer because it would
>
> Yep, sorry, I actually meant __nocfi_generic...

I figured, just wanted to make sure!

This series needs to go to mainline sooner rather than later, so maybe
xsk_tx_metadata_request() could pick up __nocfi_generic as a future
change to net-next since there is no obvious breakage? 32-bit ARM is the
only architecture affected by this change since all other architectures
that support kCFI have a backend-specific lowering and I am guessing
very few people would actually notice this problem in practice.

Thanks again for chiming in and taking a look!

Cheers,
Nathan
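
For anyone following along, here is a minimal userspace sketch (not
kernel code) of the pattern discussed in this thread: an inline helper
receives a pointer to a static const struct of callbacks, so an
optimizing compiler can usually resolve the indirect calls to direct
ones. Every name below (demo_desc, demo_meta_ops, demo_meta_request,
demo_xmit and the DEMO_* flags) is hypothetical and made up for
illustration; it does not reproduce the real xsk_tx_metadata_request()
signature or the __nocfi_generic attribute.

```c
#include <stdint.h>

/* Hypothetical request and descriptor flags, for illustration only. */
#define DEMO_WANT_TSTAMP  (1u << 0)
#define DEMO_WANT_LAUNCH  (1u << 1)
#define DEMO_F_TSTAMP     (1u << 0)
#define DEMO_F_LAUNCH     (1u << 1)

/* Hypothetical stand-in for a driver's Tx descriptor. */
struct demo_desc {
	uint32_t flags;
	uint64_t launch_time;
};

/* Hypothetical callback table, one instance per "driver". */
struct demo_meta_ops {
	void (*request_timestamp)(struct demo_desc *desc);
	void (*request_launch_time)(uint64_t when, struct demo_desc *desc);
};

/*
 * Generic helper, called once per packet. The calls through @ops are
 * indirect in the source; whether they become direct depends on the
 * compiler seeing a constant @ops after inlining.
 */
static inline void demo_meta_request(uint32_t want, uint64_t when,
				     const struct demo_meta_ops *ops,
				     struct demo_desc *desc)
{
	if ((want & DEMO_WANT_TSTAMP) && ops->request_timestamp)
		ops->request_timestamp(desc);
	if ((want & DEMO_WANT_LAUNCH) && ops->request_launch_time)
		ops->request_launch_time(when, desc);
}

/* "Driver" side callbacks. */
static void demo_req_tstamp(struct demo_desc *desc)
{
	desc->flags |= DEMO_F_TSTAMP;
}

static void demo_req_launch(uint64_t when, struct demo_desc *desc)
{
	desc->flags |= DEMO_F_LAUNCH;
	desc->launch_time = when;
}

/* static const, so the function pointers are known at compile time. */
static const struct demo_meta_ops demo_ops = {
	.request_timestamp   = demo_req_tstamp,
	.request_launch_time = demo_req_launch,
};

/* "Driver" Tx path: passes the constant table into the generic helper. */
struct demo_desc demo_xmit(uint32_t want, uint64_t when)
{
	struct demo_desc desc = { 0 };

	demo_meta_request(want, when, &demo_ops, &desc);
	return desc;
}
```

Nothing in the C language guarantees the devirtualization here; it is
an optimization. One way to check a given toolchain is to build with
optimization enabled (for example -O2 -S) and look for indirect
branches in demo_xmit().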
