Re: [EXT] Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to housekeeping CPUs
On 3/4/21 4:13 PM, Alex Belits wrote:
> On 3/4/21 10:15, Nitesh Narayan Lal wrote:
>> On 2/11/21 10:55 AM, Nitesh Narayan Lal wrote:
>>> On 2/6/21 7:43 PM, Nitesh Narayan Lal wrote:
>>>> On 2/5/21 5:23 PM, Thomas Gleixner wrote:
>>>>> On Thu, Feb 04 2021 at 14:17, Nitesh Narayan Lal wrote:
>>>>>> On 2/4/21 2:06 PM, Marcelo Tosatti wrote:
>>>>>>> How about adding a new flag for isolcpus instead?
>>>>>> Do you mean a flag based on which we can switch the affinity mask
>>>>>> to housekeeping for all the devices at the time of IRQ
>>>>>> distribution?
>>>>>>> Yes, a new flag for isolcpus. HK_FLAG_IRQ_SPREAD or some better
>>>>>>> name.
>>>>>> Does sound like a nice idea to explore, let's see what Thomas
>>>>>> thinks about it.
>>>>> When the affinity mask of the interrupt at the time when it is
>>>>> actually requested contains an isolated CPU, then nothing prevents
>>>>> the kernel from steering it to an isolated CPU. But that has
>>>>> absolutely nothing to do with that spreading thingy.
>>>>>
>>>>> The only difference which this change makes is the fact that the
>>>>> affinity hint changes. Nothing else.
>>>> Thanks for the detailed explanation.
>>>>
>>>> Before I posted this patch, I was doing some debugging on a setup
>>>> where I was observing latency issues caused by iavf IRQs that were
>>>> pinned on the isolated CPUs. Based on some initial traces, I had the
>>>> impression that the affinity hint or cpumask_local_spread was
>>>> somehow playing a role in deciding the affinity mask of these IRQs,
>>>> although that looks incorrect after going through your explanation.
>>>>
>>>> For some reason, with a kernel that had this patch, the iavf IRQs
>>>> always ended up on the HK CPUs when I tried creating VFs. The
>>>> reasoning for this is still not very clear to me; I will
>>>> investigate further to properly understand this behavior.
>>> After a little more digging, I found out why the
>>> cpumask_local_spread change affects the general/initial smp_affinity
>>> for certain device IRQs: it happens since the introduction of the
>>> commit
>>>
>>>   e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
>>>
>> Continuing the conversation about the above commit and adding Jesse.
>>
>> I was trying to understand the problem that the commit message
>> describes: "The default behavior of the kernel is somewhat
>> undesirable as all requested interrupts end up on CPU0 after
>> registration." I have been trying to reproduce this behavior without
>> the patch, but I failed to do so, maybe because I am missing
>> something here.
>>
>> @Jesse Can you please explain? FWIU, IRQ affinity should be decided
>> based on the default affinity mask.
>>
>> The problem with the commit is that when we overwrite the affinity
>> mask based on the hinting mask, we completely ignore the default SMP
>> affinity mask. If we do want to overwrite the affinity based on the
>> hint mask, we should at least consider the default SMP affinity.
>
> cpumask_local_spread() is used by a small number of drivers, mostly
> for Ethernet and cryptographic devices; however, those include the
> Cavium and Marvell devices present in every piece of hardware that I
> and Yury Norov worked on. Without my patch (or Yury's earlier, later
> replaced patch that was developed before there were housekeeping
> CPUs), the driver would completely break any attempt to configure
> task isolation, because it would distribute processing over CPUs
> regardless of any masks related to isolation (and later
> housekeeping). This is why the patch was created, and it just happens
> that it also makes sense for CPU isolation in general.
> Of course, none of this would be experienced on hardware that does
> not include those devices, possibly creating a wrong impression about
> its effect and purpose.
>
> It may be that my patch can be criticized for not accommodating CPU
> hotplug and other runtime changes of masks. Or drivers can be
> criticized for relying on calling cpumask_local_spread() once at
> initialization and then assuming that all CPUs are configured
> forever. However, as far as I can tell, we currently have no other
> means of controlling the behavior of drivers that manage their own
> interrupt or thread to CPU mapping, and no way to communicate any of
> those changes to them while they are running. Drivers may have
> legitimate reasons for maintaining a permanent or semi-permanent CPU
> core to interrupt mapping, especially on devices with very large
> numbers of CPU cores and built-in support for parallel processing of
> network packets.
>
> If we want this to be done in some manner that accommodates current
> demands, we should simply replace cpumask_local_spread() with
> something else or, maybe, add some means that will allow dynamic
> changes. Thankfully, there are very few (IIRC, 19) places
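For reference, the driver pattern at issue looks roughly like the
minimal sketch below. It is not taken from iavf or any real driver;
example_adapter, example_spread_irqs, and the field names are
placeholders, but cpumask_local_spread() and irq_set_affinity_hint()
are the kernel interfaces being discussed.

#include <linux/cpumask.h>
#include <linux/interrupt.h>

/* Placeholder device state; real drivers keep this in their adapter
 * struct. */
struct example_adapter {
	int node;			/* NUMA node of the device */
	unsigned int num_queues;	/* number of queue vectors */
	unsigned int irqs[16];		/* Linux IRQ number per queue */
};

/*
 * Pick one CPU per queue with cpumask_local_spread() and pass it to
 * irq_set_affinity_hint().  Since commit e2e64a932 ("genirq: Set
 * initial affinity in irq_set_affinity_hint()"), the hint call also
 * programs the initial affinity, which is why restricting
 * cpumask_local_spread() to housekeeping CPUs moved these IRQs off
 * the isolated CPUs.
 */
static void example_spread_irqs(struct example_adapter *adapter)
{
	unsigned int i;

	for (i = 0; i < adapter->num_queues; i++) {
		/* i-th CPU, preferring CPUs local to the device's node */
		int cpu = cpumask_local_spread(i, adapter->node);

		irq_set_affinity_hint(adapter->irqs[i], cpumask_of(cpu));
	}
}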
Re: [EXT] Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to housekeeping CPUs
On Fri, Jan 29, 2021 at 07:41:27AM -0800, Alex Belits wrote:
> On 1/28/21 07:56, Thomas Gleixner wrote:
> > On Wed, Jan 27 2021 at 10:09, Marcelo Tosatti wrote:
> > > On Wed, Jan 27, 2021 at 12:36:30PM +0000, Robin Murphy wrote:
> > > > > > > /**
> > > > > > >  * cpumask_next - get the next cpu in a cpumask
> > > > > > > @@ -205,22 +206,27 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
> > > > > > >  */
> > > > > > > unsigned int cpumask_local_spread(unsigned int i, int node)
> > > > > > > {
> > > > > > > -	int cpu;
> > > > > > > +	int cpu, hk_flags;
> > > > > > > +	const struct cpumask *mask;
> > > > > > > +	hk_flags = HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ;
> > > > > > > +	mask = housekeeping_cpumask(hk_flags);
> > > > > >
> > > > > > AFAICS, this generally resolves to something based on
> > > > > > cpu_possible_mask rather than cpu_online_mask as before, so
> > > > > > could now potentially return an offline CPU. Was that an
> > > > > > intentional change?
> > > > >
> > > > > Robin,
> > > > >
> > > > > AFAICS online CPUs should be filtered.
> > > >
> > > > Apologies if I'm being thick, but can you explain how? In the
> > > > case of isolation being disabled or compiled out,
> > > > housekeeping_cpumask() is literally just "return
> > > > cpu_possible_mask;". If we then iterate over that with
> > > > for_each_cpu() and just return the i'th possible CPU (e.g. in
> > > > the NUMA_NO_NODE case), what guarantees that CPU is actually
> > > > online?
> > > >
> > > > Robin.
> > >
> > > Nothing, but that was the situation before
> > > 1abdfe706a579a702799fce465bceb9fb01d407c as well.
> > >
> > > cpumask_local_spread() should probably be disabling CPU hotplug.
> >
> > It can't unless all callers are from preemptible code.
> >
> > Aside of that, this whole frenzy to sprinkle
> > housekeeping_cpumask() all over the kernel is just wrong, really.
> >
> > As I explained several times before, there are very valid reasons
> > for having queues and interrupts on isolated CPUs. Just optimizing
> > for the use cases some people care about is not making anything
> > better.
>
> However, making it mandatory for isolated CPUs to allow interrupts
> is not a good idea either. Providing an environment free of
> disturbances is a valid goal, so we can't do something that will make
> it impossible to achieve. We know both that there is a great amount
> of demand for this feature and that implementing it is doable, so
> cutting off the possibility of development in this direction would
> be bad.
>
> Before there was a housekeeping mask, I had to implement another,
> more cumbersome model that ended up being more intrusive than I
> wanted. That was one of the reasons why I spent some time working on
> it in, please forgive me the pun, isolation.
>
> I was relieved when the housekeeping mask appeared, and I was able
> to remove a large chunk of code that distinguished between CPUs that
> "are there" and CPUs "available to run work". Housekeeping is
> supposed to define the set of CPUs that are intended to run work
> that is not specifically triggered by anything running on those
> CPUs. "CPUs that are there" are CPUs that are being maintained as a
> part of the system, so they are usable for running things on them.
>
> My idea at the time was that we can separate this into two
> directions of development:
>
> 1. Make sure that the housekeeping mask applies to all kinds of work
> that appears on CPUs, so nothing random will end up running there.
Because this is very much in line with what it does. It's easier to
specify "all members of the set" than to specify each individual
member; think of the set as describing the types of activities that
should not have a given CPU as a target.

> 2. Rely on housekeeping mask to exclude anything not specifically
> intended to run on isolated CPUs, and concentrate efforts on making
> sure that things that are intended to [eventually] happen on those
> CPUs are handled properly -- in case of my recent proposals, delayed
> until a synchronization event at the next kernel entry.

Your point makes sense, but the "isolated CPU + one interrupt to this
CPU" use-case is a valid one. The "target CPU" is the CPU on which we
are interested in executing the application (and handling the
interrupt).

So isolation happens today in the following components:

1) Set up kernel boot parameters for isolation (suppose the goal is
   to move these to happen dynamically from userspace, so they would
   move to step 2 in the future).

2) Disable certain activities from happening on the target CPU:
   "exclude ALL activities for the target CPU which can be performed
   by housekeeping CPUs" (which might involve a number of steps in
   userspace such as the RPS mask configuration, network queue
   configuration
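To make the offline-CPU concern raised above concrete, here is a
sketch of the kind of online filtering Robin's question points at. It
is an illustration under stated assumptions, not the actual fix: the
helper name cpumask_local_spread_online() is hypothetical, the
NUMA-node preference of the real function is omitted, and hotplug
races are deliberately ignored, as discussed above.

#include <linux/cpumask.h>
#include <linux/sched/isolation.h>

/*
 * Illustrative variant of cpumask_local_spread() that walks the
 * intersection of the housekeeping mask and cpu_online_mask, so an
 * offline CPU is never returned even when isolation is compiled out
 * and housekeeping_cpumask() degenerates to cpu_possible_mask.
 * The NUMA preference (node) is intentionally ignored in this sketch.
 */
unsigned int cpumask_local_spread_online(unsigned int i, int node)
{
	const struct cpumask *hk_mask =
		housekeeping_cpumask(HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ);
	unsigned int count = 0;
	int cpu;

	/* Count the housekeeping CPUs that are currently online. */
	for_each_cpu_and(cpu, hk_mask, cpu_online_mask)
		count++;
	if (!count)
		return cpumask_first(cpu_online_mask);

	/* Wrap the index and return the i-th online housekeeping CPU. */
	i %= count;
	for_each_cpu_and(cpu, hk_mask, cpu_online_mask)
		if (i-- == 0)
			return cpu;

	return cpumask_first(cpu_online_mask);	/* not reached */
}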
Re: [EXT] Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to housekeeping CPUs
On 1/29/21 06:23, Marcelo Tosatti wrote:
> On Fri, Jan 29, 2021 at 08:55:20AM -0500, Nitesh Narayan Lal wrote:
>> On 1/28/21 3:01 PM, Thomas Gleixner wrote:
>>> On Thu, Jan 28 2021 at 13:59, Marcelo Tosatti wrote:
>>>>> The whole pile wants to be reverted. It's simply broken in
>>>>> several ways.
>>>> I was asking for your comments on interaction with CPU hotplug :-)
>>> Which I answered in a separate mail :) So housekeeping_cpumask has
>>> multiple meanings. In this case: ...
>>>> So as long as the meaning of the flags are respected, seems
>>>> alright.
>>> Yes. Stuff like the managed interrupts preference for housekeeping
>>> CPUs when an affinity mask spans housekeeping and isolated is
>>> perfectly fine. It's well thought out and has no limitations.
>>>> Nitesh, is there anything preventing this from being fixed in
>>>> userspace? (as Thomas suggested previously)
>>> Everything which is not managed can be steered by user space.
>>>
>>> Thanks, tglx
>> So, I think the conclusion here would be to revert the change made
>> in cpumask_local_spread via the patch:
>> - lib: Restrict cpumask_local_spread to housekeeping CPUs
>> Also, a similar case can be made for the rps patch that went in
>> with this:
>> - net: Restrict receive packets queuing to housekeeping CPUs
> Yes, this is the userspace solution:
> https://lkml.org/lkml/2021/1/22/815
> Should have a kernel document with this info and examples (the
> network queue configuration as well). Will send something.
> + net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus
>> I am not sure about the PCI patch as I don't think we can control
>> that from the userspace or maybe I am wrong?
> You mean "lib: Restrict cpumask_local_spread to housekeeping CPUs" ?

If we want to do it from userspace, we should have something that
triggers it in userspace. Should we use udev for this purpose?

-- Alex
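As a concrete (hypothetical) example of the "steered by user space"
approach for non-managed IRQs: a minimal userspace sketch that writes
a housekeeping CPU mask into /proc/irq/<n>/smp_affinity. The IRQ
number and mask values below are placeholders, and a udev rule or
init script would be the natural trigger to run something like this
when the device and its IRQs show up.

#include <stdio.h>

int main(void)
{
	unsigned int irq = 42;		/* placeholder IRQ number */
	const char *mask = "0f";	/* placeholder: CPUs 0-3 as
					 * housekeeping */
	char path[64];
	FILE *f;

	/* Build the per-IRQ affinity path and write the mask to it. */
	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%s\n", mask);
	return fclose(f) ? 1 : 0;
}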