On Thu, Sep 03, 2020 at 03:23:59PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 01, 2020 at 12:46:41PM +0200, Frederic Weisbecker wrote:
> > == Unbound affinity ==
> > 
> > Restore kernel threads, workqueue, timers, etc... wide affinity. But take 
> > care of cpumasks that have been set through other
> > interfaces: sysfs, procfs, etc...
> 
> We were looking at a userspace interface: what a proper one would
> look like (unified, similar to the isolcpus= interface) and its implementation:
> 
> The simplest idea for interface seemed to be exposing the integer list of
> CPUs and isolation flags to userspace (probably via sysfs).
> 
> The scheme would allow flags to be separately enabled/disabled, 
> with not all flags necessarily being toggleable (we could, for example,
> disallow nohz_full= toggling until it is implemented, but allow the
> other isolation features to be toggled).
> 
> This would require per flag housekeeping_masks (instead of a single).

Right, I think cpusets provide exactly that.

> Back to the userspace interface, you mentioned earlier that cpusets
> was a possibility for it. However:
> 
> "Cpusets provide a Linux kernel mechanism to constrain which CPUs and
> Memory Nodes are used by a process or set of processes.
> 
> The Linux kernel already has a pair of mechanisms to specify on which
> CPUs a task may be scheduled (sched_setaffinity) and on which Memory
> Nodes it may obtain memory (mbind, set_mempolicy).
> 
> Cpusets extends these two mechanisms as follows:"
> 
> The isolation flags do not necessarily have anything to do with
> tasks, but with CPUs: a given feature is disabled or enabled on a
> given CPU. 
> No?

When cpusets are set as exclusive, they become strict CPU properties.
I think we'll need to enforce the exclusive property to set the isolated
flags.

Then you're free to move the tasks you like into any isolated cpusets.

> Regarding locking of the masks, since housekeeping_masks can be called
> from hot paths (eg: get_nohz_timer_target) it seems RCU is a natural
> fit, so userspace would:
> 
> 1) use interface to change cpumask for a given feature:
> 
>       -> set_rcu_pointer
>       -> wait for grace period

Yep, could be a solution.

> 2) proceed to trigger actions that rely on housekeeping_cpumask, 
> to validate that the cpumask set at 1) is being used.

Exactly. I guess we can simply call directly to subsystems (timers,
workqueue, kthreads, ...) from the isolation code upon cpumask update.
This way we avoid ordering surprises that would come with a notifier.

> Regarding nohz_full=, a way to get an immediate implementation 
> (without handling the issues you mention above) would be to boot
> with a set of CPUs as "nohz_full toggable" and others not. For 
> the nohz_full toggable ones, you'd introduce a per-CPU tick
> dependency that is enabled/disabled on runtime. Probably better
> to avoid this one if possible...

Right but you would still have all the overhead that comes with nohz full
(kernel entry/exit tracking, RCU userspace extended grace period, RCU callbacks
offloaded, vtime accounting, ...). It will become really interesting once we
can switch all that overhead off.

Thanks.
