On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen <jsoren...@fb.com> wrote: > On 11/01/2017 01:21 PM, Sagi Grimberg wrote: >>> Hi, >> >> Hi Jes, >> >>> The below patch seems to have broken PCI IRQ affinity assignments for >>> mlx5. >> >> I wouldn't call it breaking IRQ affinity assignments. It just makes >> them automatic. > > Well it explicitly breaks the option for an admin to assign them > manually, which contradicts what is written in > Documentation/IRQ-affinity.txt > >>> Prior to this patch I could echo a value to /proc/irq/<x>/smp_affinity >>> and it would get assigned. With this patch applied I get -EIO >> >> Adding Christoph, >> >> Ideally the affinity assignments would be better left alone, but >> I wasn't aware that they are now immutable. Christoph? >> >>> The actual affinity assignments seem to have changed too, but I assume >>> this is a result of the generic allocation? >> >> The affinity assignments should be giving you perfect linear assignment >> of the rx/tx rings completion vectors to CPU cores: >> first -> 0 >> second -> 1 >> third -> 2 >> ... >> >> Before, mlx5 spread affinity starting from the local numa node as it >> relied on that when constructing RSS indirection table only to the local >> numa node (which as a side effect hurt applications running on the far >> node as the traffic was guaranteed to arrive rx rings located in the >> "wrong" node). >> >> Now the RSS indirection table is linear across both numa nodes just like >> the irq affinity settings. Another thing that was improved was the >> pre/post vectors which blacklisted any non rx/tx completion vectors from >> the affinity assignments (like port events completion vectors from >> example). > > I am all in favor of making the automatic setup better, but assuming an > automatic setup is always right seems problematic. There could be > workloads where you may want to assign affinity explicitly. > > Jes
I vaguely remember Nacking Sagi's patch as we knew it would break mlx5e netdev affinity assumptions. Anyway we already submitted the mlx5e patch that removed such assumption https://patchwork.ozlabs.org/patch/809196/ ("net/mlx5e: Distribute RSS table among all RX rings") Jes please confirm you have it. And I agree here that user should be able to read /proc/irq/x/smp_affinity and even modify it if required.