On Tue, Jan 13, 2026 at 08:23:29PM +0530, Shrikanth Hegde wrote:
> Hi.
>
> On 1/13/26 8:02 PM, Joel Fernandes wrote:
> > > > > > > > Another way to make it in-kernel would be to make the RCU
> > > > > > > > normal wake from GP optimization enabled for > 16 CPUs by
> > > > > > > > default.
> > > > > > > >
> > > > > > > > I was considering this, but I did not bring it up because I did
> > > > > > > > not know that there are large systems that might benefit from
> > > > > > > > it until now.
> > > > > > > >
> > > > > > > IMO, we can increase that threshold. 512/1024 is not a problem at
> > > > > > > all. But as Paul mentioned, we should consider scalability
> > > > > > > enhancements. On the other hand, it is probably also worth getting
> > > > > > > to the point where we really see them :)
> > > > > > >
> > > > > > Instead of pegging to the number of CPUs, perhaps the optimization
> > > > > > should be dynamic? That is, default to the sr_normal wake-up
> > > > > > optimization unless synchronize_rcu load is high. Of course,
> > > > > > carefully considering all corner cases, adequate testing and all
> > > > > > that ;-)
> > > > > >
> > > > > Honestly, I do not see use cases where we cannot keep up with
> > > > > processing all callbacks in time, keeping in mind that it is a
> > > > > blocking-context call.
> > > > >
> > > > > How many of them would need to be in flight (blocked contexts) to
> > > > > make it starve... :) According to my last evaluation it was ~64K.
> > > > >
> > > > > Note: I am not saying that it should not be scaled.
> > > > >
> > > > But you did not test that on a large system with 1000s of CPUs, right?
> > > >
> > > No, no. I do not have access to such systems.
> > >
> > > > So the options I see are: either default to always using the
> > > > optimization, not just for fewer than 17 CPUs (what you are saying
> > > > above), or do what I said above (safer for systems with 1000s of
> > > > CPUs and less risky).
> > > >
> > > You mean introduce a threshold and count how many nodes are in the
> > > queue?
> >
> > Yes.
> >
> > > To me it sounds suboptimal and looks like a temporary solution.
> >
> > No more suboptimal than the existing hard-coded 16-CPU solution, I
> > suppose.
> >
> > > Long term, it is better to split it, I mean to scale.
> >
> > But the scalable solution is already there: the !synchronize_rcu_normal
> > path, right? And splitting the list won't help this use case anyway.
> >
> > > Do you know who can test it on a ~1000-CPU system, so we have some
> > > figures?
> >
> > I don't have such systems either. The most I can go to is ~200+ CPUs.
> > Perhaps the folks on this thread have such systems, as they mentioned
> > 1900+ CPU systems. They should be happy to test.
> >
>
> Do you have a patch to try out? We can test it on these systems.
>
> Note: It might take a while to test, as those systems are a bit tricky
> to get.
>
Let me prepare something. I will come back.
--
Uladzislau Rezki

