On 10/30/25 10:37, Michal Swiatkowski wrote:
On Thu, Oct 30, 2025 at 10:10:32AM +0100, Paul Menzel wrote:
Dear Michal,


Thank you for your patch. For the summary, I’d add:

ice: Use netif_get_num_default_rss_queues() to decrease queue number

I would instead just say:
ice: cap the default number of queues to 64

as this is exactly what happens. Then the next paragraph could be:
Use netif_get_num_default_rss_queues() as a better base (instead of
the number of CPU cores), but still cap it to 64 to avoid excess IRQs
assigned to the PF (which would leave, in some cases, nothing for VFs).

sorry for such late nitpicks
and, see below too


On 30.10.25 at 09:30, Michal Swiatkowski wrote:
On some high-core systems (like AMD EPYC Bergamo, Intel Clearwater
Forest) loading the ice driver with default values can lead to queue/irq
exhaustion. This leaves no additional resources for SR-IOV.

Could you please elaborate how to make the queue/irq exhaustion visible?


What do you mean? On a high-core system, let's say num_online_cpus()
returns 288; on an 8-port card we have only 256 irqs per PF (2k in
total). The driver will load with 256 queues (and irqs) on each PF.
Any VF creation command will fail because no free irqs are available.

this clearly means this is -net material,
even if this commit will be rather unpleasant to backport to stable

(echo X > /sys/class/net/ethX/device/sriov_numvfs)
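
The exhaustion scenario described above can be sketched as simple arithmetic. This is a simplified model, not the ice driver's code: it assumes the PF takes one irq per default queue, ignores any miscellaneous vectors, and uses hypothetical helper names (min_int, irqs_left_for_vfs).

```c
#include <assert.h>

/* Hypothetical helper: smaller of two ints. */
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Irqs left on one PF for VFs after the PF takes one irq per queue.
 * Assumption: the PF cannot take more irqs than it owns, so the
 * request is clamped to irqs_per_pf. Misc/control vectors are ignored.
 */
static int irqs_left_for_vfs(int irqs_per_pf, int wanted_queues)
{
	assert(irqs_per_pf >= 0 && wanted_queues >= 0);
	return irqs_per_pf - min_int(irqs_per_pf, wanted_queues);
}
```

With the numbers from this thread, irqs_left_for_vfs(256, 288) is 0, so nothing remains for VFs, while a default of 64 queues gives irqs_left_for_vfs(256, 64) = 192 per PF.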

In most cases there is no performance reason for more than half
num_cpus(). Limit the default value to it using generic
netif_get_num_default_rss_queues().

Still, using ethtool the number of queues can be changed up to
num_online_cpus(). It can be done by calling:
$ ethtool -L ethX combined $(nproc)

This change affects only the default queue amount.
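
The default described above can be modelled in user space as follows. This is a sketch under assumptions, not the kernel or driver code: it assumes netif_get_num_default_rss_queues() returns half the number of physical cores, rounded up, when there are more than two, and it applies the cap of 64 suggested earlier in this thread. The names default_rss_queues_model, capped_default_queues and QUEUE_CAP are hypothetical.

```c
#include <assert.h>

#define QUEUE_CAP 64 /* cap suggested in this thread; hypothetical name */

/*
 * Assumed behaviour of the generic helper: half of the physical
 * cores (rounded up) when there are more than two, else all of them.
 */
static int default_rss_queues_model(int physical_cores)
{
	assert(physical_cores > 0);
	return physical_cores > 2 ? (physical_cores + 1) / 2 : physical_cores;
}

/* Default queue count after applying the cap on top of the halving. */
static int capped_default_queues(int physical_cores)
{
	int n = default_rss_queues_model(physical_cores);

	return n < QUEUE_CAP ? n : QUEUE_CAP;
}
```

Under this model a system with 288 physical cores would get 144 queues from the halving alone and 64 after the cap, while a 16-core system would get 8 either way.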

How would you judge the regression potential, that is, for people for whom
the defaults work well enough, and the queue number is now reduced?


You can take a look at the commit that introduced the /2 change in
netif_get_num_default_rss_queues() [1]. There is a good justification
for such a situation. In short, having as many queues as physical cores
is just a waste of CPU resources.

[1] https://lore.kernel.org/netdev/[email protected]/

[...]
