On Thu, Sep 03, 2020 at 03:23:59PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 01, 2020 at 12:46:41PM +0200, Frederic Weisbecker wrote:
> > == Unbound affinity ==
> >
> > Restore kernel threads, workqueue, timers, etc... wide affinity. But take
> > care of cpumasks that have been set through other interfaces: sysfs,
> > procfs, etc...
>
> We were looking at a userspace interface: what would be a proper one
> (unified, similar to the isolcpus= interface), and its implementation:
>
> The simplest idea for the interface seemed to be exposing the integer list
> of CPUs and isolation flags to userspace (probably via sysfs).
>
> The scheme would allow flags to be separately enabled/disabled, with not
> all flags necessarily being toggable (we could for example disallow
> nohz_full= toggling until it is implemented, but allow the other
> isolation features to be toggled).
>
> This would require per-flag housekeeping_masks (instead of a single one).

Right, I think cpusets provide exactly that kind of interface.

> Back to the userspace interface, you mentioned earlier that cpusets
> was a possibility for it. However:
>
> "Cpusets provide a Linux kernel mechanism to constrain which CPUs and
> Memory Nodes are used by a process or set of processes.
>
> The Linux kernel already has a pair of mechanisms to specify on which
> CPUs a task may be scheduled (sched_setaffinity) and on which Memory
> Nodes it may obtain memory (mbind, set_mempolicy).
>
> Cpusets extends these two mechanisms as follows:"
>
> The isolation flags do not necessarily have anything to do with
> tasks, but with CPUs: a given feature is disabled or enabled on a
> given CPU.
>
> No?

When cpusets are set as exclusive, they become strict CPU properties. I
think we'll need to enforce the exclusive property in order to set the
isolation flags. Then you're free to move the tasks you like into any
isolated cpuset.

> Regarding locking of the masks, since the housekeeping_masks can be read
> from hot paths (eg: get_nohz_timer_target) it seems RCU is a natural
> fit, so userspace would:
>
> 1) use the interface to change the cpumask for a given feature:
>
>        -> set_rcu_pointer
>        -> wait for grace period

Yep, that could be a solution.

> 2) proceed to trigger actions that rely on housekeeping_cpumask,
>    to validate that the cpumask set at 1) is being used.

Exactly. I guess we can simply call directly into the subsystems (timers,
workqueue, kthreads, ...) from the isolation code upon cpumask update.
This way we avoid the ordering surprises that would come with a notifier.
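To make this concrete, the whole update path could look roughly like the
sketch below. This is only a sketch of course: the isol_* names and the
*_resync_housekeeping() hooks are made up (they don't exist today), only
the cpumask and RCU helpers are real API.

#include <linux/cpumask.h>
#include <linux/mutex.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Rough sketch: one housekeeping mask per isolation flag. */
enum isol_flag {
        ISOL_FLAG_TIMER,
        ISOL_FLAG_WQ,
        ISOL_FLAG_KTHREAD,
        ISOL_FLAG_NR,
};

static struct cpumask __rcu *isol_hk_masks[ISOL_FLAG_NR];
static DEFINE_MUTEX(isol_lock);

/* Hypothetical per-subsystem hooks, called directly on mask updates */
extern void timers_resync_housekeeping(enum isol_flag flag);
extern void workqueue_resync_housekeeping(enum isol_flag flag);
extern void kthreads_resync_housekeeping(enum isol_flag flag);

/* Read side, usable from hot paths under rcu_read_lock() */
const struct cpumask *isol_hk_mask(enum isol_flag flag)
{
        return rcu_dereference(isol_hk_masks[flag]);
}

/* Write side, driven by the userspace interface (sysfs or cpusets) */
int isol_hk_mask_update(enum isol_flag flag, const struct cpumask *new)
{
        struct cpumask *copy, *old;

        copy = kmalloc(cpumask_size(), GFP_KERNEL);
        if (!copy)
                return -ENOMEM;
        cpumask_copy(copy, new);

        mutex_lock(&isol_lock);
        old = rcu_dereference_protected(isol_hk_masks[flag],
                                        lockdep_is_held(&isol_lock));
        rcu_assign_pointer(isol_hk_masks[flag], copy);
        mutex_unlock(&isol_lock);

        /* Wait for readers that may still see the old mask */
        synchronize_rcu();
        kfree(old);

        /*
         * Now tell the subsystems to re-evaluate their affinities
         * against the new mask, directly rather than via a notifier.
         */
        timers_resync_housekeeping(flag);
        workqueue_resync_housekeeping(flag);
        kthreads_resync_housekeeping(flag);

        return 0;
}

The boot-time isolcpus=/nohz_full= setup could then just become an early
call into that same update path.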
> Regarding nohz_full=, a way to get an immediate implementation
> (without handling the issues you mention above) would be to boot
> with a set of CPUs as "nohz_full toggable" and others not. For
> the nohz_full toggable ones, you'd introduce a per-CPU tick
> dependency that is enabled/disabled at runtime. Probably better
> to avoid this one if possible...

Right, but you would still have all the overhead that comes with nohz_full
(kernel entry/exit tracking, RCU userspace extended grace period, RCU
callbacks offloaded, vtime accounting, ...). It will become really
interesting once we can switch all that overhead off.

Thanks.
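For reference, the runtime part of that per-CPU tick dependency would
probably boil down to something like the sketch below. tick_dep_set_cpu()
and tick_dep_clear_cpu() do exist, but TICK_DEP_BIT_ISOLATION is made up
here: a dedicated bit would have to be added to enum tick_dep_bits.

#include <linux/tick.h>

/*
 * Sketch only: TICK_DEP_BIT_ISOLATION doesn't exist today. Setting the
 * dependency forces the tick to keep running on that CPU, clearing it
 * lets the CPU stop its tick again when it runs a single task.
 */
static void nohz_full_toggle_cpu(int cpu, bool isolate)
{
        if (isolate)
                tick_dep_clear_cpu(cpu, TICK_DEP_BIT_ISOLATION);
        else
                tick_dep_set_cpu(cpu, TICK_DEP_BIT_ISOLATION);
}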