On Tue, Oct 28, 2025 at 2:54 AM David Laight
<[email protected]> wrote:
>
> On Tue, 28 Oct 2025 05:32:13 +0000
> Kuniyuki Iwashima <[email protected]> wrote:
>
> ....
> > I rebased on 19ab0a22efbd and tested 4 versions on
> > AMD EPYC 7B12 machine:
>
> That is zen5 which I believe has much faster clac/stac than anything else.
> (It might also have a faster lfence - not sure.)

The EPYC 7B12 is a Zen 2 part, not Zen 5, so stac/clac is probably
more expensive here than you would expect on Zen 5.

>
> Getting a 3% change for that diff also seems unlikely.
> Even if you halved the execution time of that code the system would have
> to be spending 6% of the time in that loop.
> Even your original post only shows 1% in ep_try_send_events().

We saw a similar improvement on the same platform with commit
1fb0e471611d ("net: remove one stac/clac pair from
move_addr_to_user()").


>
> An 'interesting' test is to replicate the code you are optimising
> to see how much slower it goes - you can't gain more than the slowdown.
>
> What is more likely is that breathing on the code changes the cache
> line layout and that causes a larger performance change.
>
> A better test for epoll_put_event would be to create 1000 fd
> (pipes or events).
> Then time calls epoll_wait() that return lots of events.
>
>         David
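
For reference, a minimal userspace sketch of the test suggested above,
assuming eventfds rather than pipes. bench_epoll_wait() is a
hypothetical helper name used only for illustration, and the fd and
iteration counts are placeholders. Each eventfd is created with a
non-zero counter and never read, so with level-triggered epoll every
epoll_wait() call should report all of them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: register nfds always-readable eventfds with one
 * epoll instance and time iters calls to epoll_wait().
 * Returns the average nanoseconds per call, or -1.0 on error.
 * *events_per_call (if non-NULL) gets the event count of the last call.
 */
double bench_epoll_wait(int nfds, int iters, int *events_per_call)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event *evs = NULL;
	struct timespec t0, t1;
	double ns_per_call = -1.0;
	int *fds = NULL;
	int epfd, created = 0, n = 0, i;

	assert(nfds > 0 && iters > 0);

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1.0;

	fds = calloc(nfds, sizeof(*fds));
	evs = calloc(nfds, sizeof(*evs));
	if (!fds || !evs)
		goto out;

	for (created = 0; created < nfds; created++) {
		/* Initial counter of 1 keeps the fd readable. */
		fds[created] = eventfd(1, EFD_NONBLOCK);
		if (fds[created] < 0)
			goto out;
		ev.data.fd = fds[created];
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[created], &ev) < 0) {
			close(fds[created]);
			goto out;
		}
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		/* Timeout 0: return immediately with whatever is ready. */
		n = epoll_wait(epfd, evs, nfds, 0);
		if (n < 0)
			goto out;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns_per_call = ((t1.tv_sec - t0.tv_sec) * 1e9 +
		       (t1.tv_nsec - t0.tv_nsec)) / iters;
	if (events_per_call)
		*events_per_call = n;
out:
	while (created > 0)
		close(fds[--created]);
	free(evs);
	free(fds);
	close(epfd);
	return ns_per_call;
}
```

Calling this from main() as bench_epoll_wait(1000, 100000, &n) on
kernels with and without the patch would give a per-call number to
compare. 1000 fds sits just under the common RLIMIT_NOFILE soft limit
of 1024, so the limit may need raising on some setups.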
