On Wed, Mar 29, 2017 at 08:15:50PM +0300, Sagi Grimberg wrote:
>
> > The number of vectors to assign needs to be adjusted for each node such
> > that it doesn't exceed the number of CPUs in that node. This patch
> > recalculates the vector assignment per-node so that we don't try to
> > assign more vectors than there are CPUs. When that previously happened,
> > the cpus_per_vec was calculated to be 0, so many vectors had no CPUs
> > assigned. This then goes on to fail to allocate descriptors due to
> > empty masks, leading to an unoptimal spread.
>
> Can you give a specific (numeric) example where this happens? I'm having
> a little trouble following the logical change here.

Sure, I have a 2-socket server with 16 threads each. I take one CPU offline
in socket 2, so I have 16 threads on socket 1 and 15 on socket 2. In total
that is 31 threads, so I'm requesting 31 vectors.

Currently, vecs_per_node is calculated in the first iteration as 31 / 2,
so 15. ncpus of socket 1 is 16.

  cpus_per_vec = 16 / 15, so 1 CPU per vector with one extra.

When iterating the second socket, though, vecs_per_node is incremented from
15 to 16 (to account for the "extra" from before). However, the ncpus is
only 15, so that iteration calculates:

  cpus_per_vec = 15 / 16

And since that's zero, the remaining 16 vectors are not assigned to any CPU,
and the second socket has no vectors assigned to its CPUs.
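
For anyone who wants to replay the arithmetic, here is a minimal sketch in
Python rather than kernel C. It models only the integer divisions from the
example above and is not the actual kernel code. The function name
spread_per_node and its tuple layout are made up for illustration, and it
assumes the rounding leftover is handed to the next node in one step, which
matches the single "extra" vector in this example.

```python
def spread_per_node(total_vecs, cpus_per_node):
    """Replay the per-node integer arithmetic from the example.

    Hypothetical helper: returns one (vecs_per_node, ncpus, cpus_per_vec)
    tuple per node. It only models the divisions, not the mask handling.
    """
    nodes = len(cpus_per_node)
    vecs_per_node = total_vecs // nodes            # 31 // 2 == 15
    extra = total_vecs - nodes * vecs_per_node     # rounding leftover, 1 here

    results = []
    for ncpus in cpus_per_node:
        # Integer division, as in C: this is where 15 // 16 becomes 0.
        cpus_per_vec = ncpus // vecs_per_node
        results.append((vecs_per_node, ncpus, cpus_per_vec))

        # Assumption: once the leftover has been consumed on this node,
        # the following node is asked to take one more vector.
        if extra > 0:
            extra -= 1
            if extra == 0:
                vecs_per_node += 1
    return results


if __name__ == "__main__":
    for vecs, ncpus, cpv in spread_per_node(31, [16, 15]):
        print(f"vecs_per_node={vecs} ncpus={ncpus} cpus_per_vec={cpv}")
```

With 31 vectors over nodes of 16 and 15 CPUs, the second tuple comes out
with cpus_per_vec equal to 0, which is the failing case described above.
With both sockets fully online (32 vectors, 16 CPUs each) every node gets
cpus_per_vec equal to 1.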