Re: [EXT] Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs

2021-03-04 Thread Nitesh Narayan Lal


On 3/4/21 4:13 PM, Alex Belits wrote:
> On 3/4/21 10:15, Nitesh Narayan Lal wrote:
>>
>> On 2/11/21 10:55 AM, Nitesh Narayan Lal wrote:
>>> On 2/6/21 7:43 PM, Nitesh Narayan Lal wrote:
 On 2/5/21 5:23 PM, Thomas Gleixner wrote:
> On Thu, Feb 04 2021 at 14:17, Nitesh Narayan Lal wrote:
>> On 2/4/21 2:06 PM, Marcelo Tosatti wrote:
> How about adding a new flag for isolcpus instead?
>
 Do you mean a flag based on which we can switch the affinity mask to
 housekeeping for all the devices at the time of IRQ distribution?
>>> Yes, a new flag for isolcpus. HK_FLAG_IRQ_SPREAD or some better name.
>> That does sound like a nice idea to explore; let's see what Thomas thinks
>> about it.
>>> 
>>>
>> When the affinity mask of the interrupt at the time when it is actually
>> requested contains an isolated CPU then nothing prevents the kernel from
>> steering it to an isolated CPU. But that has absolutely nothing to do
>> with that spreading thingy.
>>
>> The only difference which this change makes is the fact that the
>> affinity hint changes. Nothing else.
>>
 Thanks for the detailed explanation.

 Before I posted this patch, I was doing some debugging on a setup where I
 was observing latency issues caused by the iavf IRQs that were pinned to
 the isolated CPUs.

 Based on some initial traces, I was under the impression that the affinity
 hint or cpumask_local_spread was somehow playing a role in deciding the
 affinity mask of these IRQs, although that impression looks incorrect after
 going through your explanation.
 For some reason, with a kernel that had this patch, the iavf IRQs always
 ended up on the HK CPUs when I created VFs.

 The reasoning for the above is still not clear to me. I will investigate
 this further to properly understand this behavior.


>>> After a little more digging, I found out why the cpumask_local_spread
>>> change affects the general/initial smp_affinity for certain device IRQs.
>>>
>>> After the introduction of the commit:
>>>
>>>  e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
>>>
>>
>> Continuing the conversation about the above commit and adding Jesse.
>> I was trying to understand the problem that the commit message describes:
>> "The default behavior of the kernel is somewhat undesirable as all
>> requested interrupts end up on CPU0 after registration." I have also been
>> trying to reproduce this behavior without the patch, but I failed to do
>> so, maybe because I am missing something here.
>>
>> @Jesse, can you please explain? FWIU, the IRQ affinity should be decided
>> based on the default affinity mask.
>>
>> The problem with the commit is that when we overwrite the affinity mask
>> based on the hinting mask, we completely ignore the default SMP affinity
>> mask. If we do want to overwrite the affinity based on the hint mask, we
>> should at least consider the default SMP affinity.
>>
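For context, this is roughly what irq_set_affinity_hint() has done since commit
e2e64a932 (paraphrased from kernel/irq/manage.c of that era, not verbatim): in
addition to storing the hint, it also programs the hint as the interrupt's
affinity, which is why a hint derived from cpumask_local_spread() ends up
steering where the IRQ fires:

    int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
    {
            unsigned long flags;
            struct irq_desc *desc = irq_get_desc_lock(irq, &flags,
                                                      IRQ_GET_DESC_CHECK_GLOBAL);

            if (!desc)
                    return -EINVAL;
            /* Record the hint for /proc/irq/<n>/affinity_hint readers. */
            desc->affinity_hint = m;
            irq_put_desc_unlock(desc, flags);
            /* Added by e2e64a932: apply the hint as the initial affinity. */
            if (m)
                    __irq_set_affinity(irq, m, false);
            return 0;
    }

The default SMP affinity mask is never consulted on this path, which is the
behavior being questioned above.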
>
> cpumask_local_spread() is used by a small number of drivers, mostly for
> Ethernet and cryptographic devices; however, those include the Cavium and
> Marvell devices that were present in every piece of hardware that I and
> Yury Norov worked on. Without my patch (or Yury's earlier patch, later
> replaced, which was developed before there were housekeeping CPUs), those
> drivers would completely break any attempt to configure task isolation,
> because they would distribute processing over CPUs regardless of any masks
> related to isolation (and later housekeeping). This is why the patch was
> created, and it just happens that it also makes sense for CPU isolation in
> general. Of course, none of this would be experienced on hardware that does
> not include those devices, possibly creating a wrong impression about its
> effect and purpose.
>
> It may be that my patch can be criticized for not accommodating CPU hotplug
> and other runtime changes of masks. Or drivers can be criticized for relying
> on calling cpumask_local_spread() once on initialization and then assuming
> that all CPUs stay configured that way forever. However, as far as I can
> tell, we currently have no other means of controlling the behavior of
> drivers that manage their own interrupt-to-CPU or thread-to-CPU mapping, and
> no way to communicate any of those changes to them while they are running.
> Drivers may have legitimate reasons for maintaining permanent or
> semi-permanent CPU-core-to-interrupt mappings, especially on devices with
> very large numbers of CPU cores and built-in support for parallel processing
> of network packets.
>
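To make the pattern being criticized concrete, the per-queue setup in such
drivers tends to look roughly like the sketch below (illustrative only; the
structure and field names are made up, not any specific driver's code):

    /* Sketch: spread queue vectors across CPUs once, at probe/open time. */
    static void example_spread_irqs(struct example_adapter *adap)
    {
            unsigned int i, cpu;

            for (i = 0; i < adap->num_q_vectors; i++) {
                    /* Pick the i'th CPU from the spread once... */
                    cpu = cpumask_local_spread(i, adap->numa_node);
                    /* ...and hint/pin the vector there, never revisited. */
                    irq_set_affinity_hint(adap->msix_entries[i].vector,
                                          cpumask_of(cpu));
            }
    }

Once cpumask_local_spread() is restricted to housekeeping CPUs, this one-shot
mapping lands only on housekeeping CPUs, but it also never reacts to hotplug or
to later changes of the masks, which is the limitation acknowledged above.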
> If we want it to be done in some manner that accommodates current demands,
> we should simply replace cpumask_local_spread() with something else, or,
> maybe, add some means that will allow dynamic changes. Thankfully, there are
> very few (IIRC, 19) pla

Re: [EXT] Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs

2021-02-01 Thread Marcelo Tosatti
On Fri, Jan 29, 2021 at 07:41:27AM -0800, Alex Belits wrote:
> On 1/28/21 07:56, Thomas Gleixner wrote:
> > On Wed, Jan 27 2021 at 10:09, Marcelo Tosatti wrote:
> > > On Wed, Jan 27, 2021 at 12:36:30PM +, Robin Murphy wrote:
> > > > > > > /**
> > > > > > >  * cpumask_next - get the next cpu in a cpumask
> > > > > > > @@ -205,22 +206,27 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
> > > > > > >  */
> > > > > > > unsigned int cpumask_local_spread(unsigned int i, int node)
> > > > > > > {
> > > > > > > - int cpu;
> > > > > > > + int cpu, hk_flags;
> > > > > > > + const struct cpumask *mask;
> > > > > > > + hk_flags = HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ;
> > > > > > > + mask = housekeeping_cpumask(hk_flags);
> > > > > > 
> > > > > > AFAICS, this generally resolves to something based on cpu_possible_mask
> > > > > > rather than cpu_online_mask as before, so could now potentially return an
> > > > > > offline CPU. Was that an intentional change?
> > > > > 
> > > > > Robin,
> > > > > 
> > > > > AFAICS online CPUs should be filtered.
> > > > 
> > > > Apologies if I'm being thick, but can you explain how? In the case of
> > > > isolation being disabled or compiled out, housekeeping_cpumask() is
> > > > literally just "return cpu_possible_mask;". If we then iterate over that
> > > > with for_each_cpu() and just return the i'th possible CPU (e.g. in the
> > > > NUMA_NO_NODE case), what guarantees that CPU is actually online?
> > > > 
> > > > Robin.
> > > 
> > > Nothing, but that was the situation before
> > > 1abdfe706a579a702799fce465bceb9fb01d407c as well.
> > > 
> > > cpumask_local_spread() should probably be disabling CPU hotplug.
> > 
> > It can't unless all callers are from preemptible code.
> > 
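To spell out the offline-CPU concern: with isolation disabled (or not compiled
in), housekeeping_cpumask(HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ) falls back to
cpu_possible_mask, so the i'th CPU picked from it may not be online. A minimal
sketch of a selection restricted to online housekeeping CPUs (illustrative
only, not part of the posted patch, and still racy against hotplug without
further protection, as noted above):

    /* Illustrative sketch only; the NUMA-node preference is omitted. */
    unsigned int cpumask_local_spread(unsigned int i, int node)
    {
            const struct cpumask *hk =
                    housekeeping_cpumask(HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ);
            unsigned int cpu, n = 0;

            /* Count only housekeeping CPUs that are currently online. */
            for_each_cpu_and(cpu, hk, cpu_online_mask)
                    n++;
            /* Wrap: we always want a CPU (assumes n > 0). */
            i %= n;

            for_each_cpu_and(cpu, hk, cpu_online_mask)
                    if (i-- == 0)
                            return cpu;
            BUG();
    }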
> > Aside from that, this whole frenzy to sprinkle housekeeping_cpumask() all
> > over the kernel is just wrong, really.
> > 
> > As I explained several times before there are very valid reasons for
> > having queues and interrupts on isolated CPUs. Just optimizing for the
> > usecases some people care about is not making anything better.
> 
> However, making it mandatory for isolated CPUs to allow interrupts is not a
> good idea either. Providing an environment free of disturbances is a valid
> goal, so we can't do something that will make it impossible to achieve. We
> know both that there is a great amount of demand for this feature and that
> implementing it is doable, so cutting off the possibility of development in
> this direction would be bad.
> 
> Before there was a housekeeping mask, I had to implement another, more
> cumbersome model that ended up being more intrusive than I wanted. That was
> one of the reasons why I spent some time working on it in, please forgive
> the pun, isolation.
> 
> I was relieved when the housekeeping mask appeared, and I was able to remove a
> large chunk of code that distinguished between CPUs that "are there" and
> CPUs "available to run work". Housekeeping is supposed to define the set of
> CPUs that are intended to run work that is not specifically triggered by
> anything running on those CPUs. "CPUs that are there" are CPUs that are
> being maintained as a part of the system, so they are usable for running
> things on them.
> 
> My idea at the time was that we can separate this into two directions of
> development:
> 
> 1. Make sure that the housekeeping mask applies to all kinds of work that
> appears on CPUs, so nothing random will end up running there, because this
> is very much in line with what it already does.

It's easier to specify "all members of a set" than to specify each individual
member, thinking of the set as a description of the types of activities that
should not have a given CPU as a target.

> 2. Rely on the housekeeping mask to exclude anything not specifically
> intended to run on isolated CPUs, and concentrate efforts on making sure
> that things that are intended to [eventually] happen on those CPUs are
> handled properly -- in the case of my recent proposals, delayed until a
> synchronization event at the next kernel entry.

Your point makes sense, but the "isolated CPU + one interrupt to this CPU"
use-case is a valid one.

The "target CPU" is the CPU on which we are interested in executing the
application (and in handling the interrupt). So isolation today happens in
the following components:

1) setup kernel boot parameters for isolation (suppose the goal is to move
these to happen dynamically in userspace, so they would move to step 2 in
the future).

2) disable certain activities from happening at the target CPU: "exclude ALL
activities for the target CPU which can be performed by housekeeping CPUs"
(which might involve a number of steps in userspace, such as the RPS mask
configuration, network queue configuration

Re: [EXT] Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping CPUs

2021-01-29 Thread Alex Belits

On 1/29/21 06:23, Marcelo Tosatti wrote:

On Fri, Jan 29, 2021 at 08:55:20AM -0500, Nitesh Narayan Lal wrote:


On 1/28/21 3:01 PM, Thomas Gleixner wrote:

On Thu, Jan 28 2021 at 13:59, Marcelo Tosatti wrote:

The whole pile wants to be reverted. It's simply broken in several ways.

I was asking for your comments on interaction with CPU hotplug :-)

Which I answered in a separate mail :)


So housekeeping_cpumask has multiple meanings. In this case:

...


So as long as the meanings of the flags are respected, this seems
alright.

Yes. Stuff like the managed interrupts preference for housekeeping CPUs
when an affinity mask spans housekeeping and isolated CPUs is perfectly
fine. It's well thought out and has no limitations.
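For reference, the preference described here comes from the HK_FLAG_MANAGED_IRQ
handling in irq_do_set_affinity(); roughly, condensed and paraphrased rather
than verbatim: when a managed interrupt's mask spans both housekeeping and
isolated CPUs, only the housekeeping part is programmed, and the full mask is
kept only if no housekeeping CPU in it is online:

    /* Condensed paraphrase of the logic in kernel/irq/manage.c. */
    if (irqd_affinity_is_managed(data) &&
        housekeeping_enabled(HK_FLAG_MANAGED_IRQ)) {
            const struct cpumask *hk_mask =
                    housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

            cpumask_and(&tmp_mask, mask, hk_mask);
            if (!cpumask_intersects(&tmp_mask, cpu_online_mask))
                    prog_mask = mask;       /* no online housekeeping CPU */
            else
                    prog_mask = &tmp_mask;  /* drop the isolated CPUs */
    } else {
            prog_mask = mask;
    }
    ret = chip->irq_set_affinity(data, prog_mask, force);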


Nitesh, is there anything preventing this from being fixed
in userspace? (as Thomas suggested previously).

Everything which is not managed can be steered by user space.

Thanks,

 tglx



So, I think the conclusion here would be to revert the change made in
cpumask_local_spread via the patch:
  - lib: Restrict cpumask_local_spread to housekeeping CPUs

Also, a similar case can be made for the rps patch that went in with
this:
 - net: Restrict receive packets queuing to housekeeping CPUs


Yes, this is the userspace solution:

https://lkml.org/lkml/2021/1/22/815

Should have a kernel document with this info and examples
(the network queue configuration as well). Will
send something.


 + net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus

I am not sure about the PCI patch, as I don't think we can control that from
userspace, or maybe I am wrong?


You mean "lib: Restrict cpumask_local_spread to housekeeping CPUs" ?



If we want to do it from userspace, we should have something that 
triggers it in userspace. Should we use udev for this purpose?


--
Alex