Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation
Michael Santana writes:
> On 7/4/22 09:46, Ilya Maximets wrote:
>> Hi, Michael.  Thanks for the new version!
> Hi Ilya, thanks for the response
>> On 6/6/22 20:59, Michael Santana wrote:
>>> Additional threads are required to service upcalls when we have CPU
>>> isolation (in per-cpu dispatch mode). The reason additional threads
>>> are required is because it creates a more fair distribution.
>> I think, the description above lacks the information on why we need
>> to create additional threads, and why we choose the number to be a
>> prime number.  Yes, you said that it creates a more fair distribution,
>> but we probably need to describe that more with an example.
>>
>>> With more
>>> threads we decrease the load of each thread as more threads would
>>> decrease the number of cores each threads is assigned.
>> While that is true, it's not obvious why increasing the number of
>> threads beyond the number of available cores is good for us.
>> The key factor is more fair load distribution among threads, so all
>> cores are utilized, but that is not highlighted.
> Yes, agreed.
>
> We want additional threads because we want to minimize the number of
> cores assigned to individual threads.  75 handler threads servicing
> 100 cores is more optimal than 50 threads servicing 100 cores.  This
> is because in the 75-handler case each thread has to service 1.33
> cores on average, whereas in the 50-handler case each thread has to
> service 2 cores.  Fewer cores assigned to an individual thread means
> less work for that thread.  This example ignores the number of actual
> cores available for OVS userspace to use.
>
> On the flip side, we do not wish to create too many threads, as we
> fear we would end up in a case where we have too many threads and not
> enough active cores OVS can use.
>
> Which brings me to my last point.  The prime schema is arbitrary (or
> good enough for now).
> We really just want more threads, and there is no reason why the
> prime schema is better than the next one (at least not without
> knowing how many threads we can get away with adding before kernel
> overhead hurts performance).  I think you had mentioned it several
> times, but I think we need to do testing to figure out exactly how
> many threads we can add before kernel overhead hurts performance.  I
> speculate the prime schema is on the low end, and I think that we can
> add more threads without hurting performance than the prime schema
> will allow.  We can look at how the upcalls/s rate changes based on
> the number of handlers and the number of active cores.  The prime
> schema is an easy solution, but it is a little like driving blindly
> without knowing more stats.

Let's also not forget that a user tells OVS to be bound only to certain
cores, and will be slightly confused when there are more handler
threads than the number of isolated CPUs.  This will be confusing no
matter what schema is chosen - so it will be important to have some
good documentation, and perhaps an INFO log that tells something about
this.

> On the other hand, stats might be different from one system to
> another.  What might be good on my test system doesn't necessarily
> translate to another system.  LMK what you think about this, if this
> is worth the effort.
>
> The prime schema is sufficient as is.  I just think that we can
> improve it a little bit.  But we don't have to go down that road if
> need be.
>
>>> Adding as many
>>> threads are there are cores could have a performance hit when the
>>> number of active cores (which all threads have to share) is very
>>> low. For this reason we avoid creating as many threads as there are
>>> cores and instead meet somewhere in the middle.
>>>
>>> The formula used to calculate the number of handler threads to
>>> create is as follows:
>>>
>>> handlers_n = min(next_prime(active_cores+1), total_cores)
>>>
>>> Where next_prime is a function that returns the next numeric prime
>>> number that is strictly greater than active_cores
>>>
>>> Assume default behavior when total_cores <= 2, that is do not create
>>> additional threads when we have less than 2 total cores on the system
>>>
>>> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
>>> Signed-off-by: Michael Santana
>>> ---
>>> [... quoted diff trimmed ...]
>>> +/*
>>> + * Returns 1 if num is a prime number,
>>> + * Otherwise return 0
>>> + */
>>> +static uint32_t
Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation
On 7/4/22 09:46, Ilya Maximets wrote:
> Hi, Michael.  Thanks for the new version!

Hi Ilya, thanks for the response

> On 6/6/22 20:59, Michael Santana wrote:
>> Additional threads are required to service upcalls when we have CPU
>> isolation (in per-cpu dispatch mode). The reason additional threads
>> are required is because it creates a more fair distribution.
> I think, the description above lacks the information on why we need
> to create additional threads, and why we choose the number to be a
> prime number.  Yes, you said that it creates a more fair
> distribution, but we probably need to describe that more with an
> example.
>
>> With more
>> threads we decrease the load of each thread as more threads would
>> decrease the number of cores each threads is assigned.
> While that is true, it's not obvious why increasing the number of
> threads beyond the number of available cores is good for us.
> The key factor is more fair load distribution among threads, so all
> cores are utilized, but that is not highlighted.

Yes, agreed.

We want additional threads because we want to minimize the number of
cores assigned to individual threads.  75 handler threads servicing
100 cores is more optimal than 50 threads servicing 100 cores.  This
is because in the 75-handler case each thread has to service 1.33
cores on average, whereas in the 50-handler case each thread has to
service 2 cores.  Fewer cores assigned to an individual thread means
less work for that thread.  This example ignores the number of actual
cores available for OVS userspace to use.

On the flip side, we do not wish to create too many threads, as we
fear we would end up in a case where we have too many threads and not
enough active cores OVS can use.

Which brings me to my last point.  The prime schema is arbitrary (or
good enough for now).  We really just want more threads, and there is
no reason why the prime schema is better than the next one (at least
not without knowing how many threads we can get away with adding
before kernel overhead hurts performance).  I think you had mentioned
it several times, but I think we need to do testing to figure out
exactly how many threads we can add before kernel overhead hurts
performance.  I speculate the prime schema is on the low end, and I
think that we can add more threads without hurting performance than
the prime schema will allow.  We can look at how the upcalls/s rate
changes based on the number of handlers and the number of active
cores.  The prime schema is an easy solution, but it is a little like
driving blindly without knowing more stats.

On the other hand, stats might be different from one system to
another.  What might be good on my test system doesn't necessarily
translate to another system.  LMK what you think about this, if this
is worth the effort.

The prime schema is sufficient as is.  I just think that we can
improve it a little bit.  But we don't have to go down that road if
need be.

>> Adding as many
>> threads are there are cores could have a performance hit when the
>> number of active cores (which all threads have to share) is very
>> low. For this reason we avoid creating as many threads as there are
>> cores and instead meet somewhere in the middle.
>>
>> The formula used to calculate the number of handler threads to
>> create is as follows:
>>
>> handlers_n = min(next_prime(active_cores+1), total_cores)
>>
>> Where next_prime is a function that returns the next numeric prime
>> number that is strictly greater than active_cores
>>
>> Assume default behavior when total_cores <= 2, that is do not create
>> additional threads when we have less than 2 total cores on the system
>>
>> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
>> Signed-off-by: Michael Santana
>> ---
>> [... quoted diff trimmed ...]
>> +/*
>> + * Returns 1 if num is a prime number,
>> + * Otherwise return 0
>> + */
>> +static uint32_t
> This should just return bool.

ACK, thanks

>> +is_prime(uint32_t num)
>> +{
>> +    if ((num < 2) || ((num % 2) == 0)) {
> There is no need for so many parenthesis.  And it might make sense to
> just write 3 if blocks here to make the code simpler, i.e. check for
> < 2, == 2 and % 2.

ACK, thanks

>> +        return num == 2;
>> +    }
>> +
>> +    uint32_t i;
>> +    uint32_t limit = sqrt((float) num);
>> +    for (i = 3; i <= limit; i += 2) {
> for (uint64_t i = 3; i * i <= num; i += 2) {
> There is no real need for sqrt.  I don't think we should be concerned
> about 'i * i' performance here.

ACK, I think I got carried away with optimization

>> +        if (n
Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation
Hi, Michael.  Thanks for the new version!

On 6/6/22 20:59, Michael Santana wrote:
> Additional threads are required to service upcalls when we have CPU
> isolation (in per-cpu dispatch mode). The reason additional threads
> are required is because it creates a more fair distribution.

I think, the description above lacks the information on why we need
to create additional threads, and why we choose the number to be a
prime number.  Yes, you said that it creates a more fair distribution,
but we probably need to describe that more with an example.

> With more
> threads we decrease the load of each thread as more threads would
> decrease the number of cores each threads is assigned.

While that is true, it's not obvious why increasing the number of
threads beyond the number of available cores is good for us.
The key factor is more fair load distribution among threads, so all
cores are utilized, but that is not highlighted.

> Adding as many
> threads are there are cores could have a performance hit when the
> number of active cores (which all threads have to share) is very low.
> For this reason we avoid creating as many threads as there are cores
> and instead meet somewhere in the middle.
>
> The formula used to calculate the number of handler threads to create
> is as follows:
>
> handlers_n = min(next_prime(active_cores+1), total_cores)
>
> Where next_prime is a function that returns the next numeric prime
> number that is strictly greater than active_cores
>
> Assume default behavior when total_cores <= 2, that is do not create
> additional threads when we have less than 2 total cores on the system
>
> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
> Signed-off-by: Michael Santana
> ---
>  lib/dpif-netlink.c | 86 --
>  lib/ovs-thread.c   | 16 +
>  lib/ovs-thread.h   |  1 +
>  3 files changed, 101 insertions(+), 2 deletions(-)
>
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 06e1e8ca0..e948a829b 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -31,6 +31,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "bitmap.h"
>  #include "dpif-netlink-rtnl.h"
> @@ -2506,6 +2507,87 @@ dpif_netlink_handler_uninit(struct dpif_handler *handler)
>  }
>  #endif
>
> +/*
> + * Returns 1 if num is a prime number,
> + * Otherwise return 0
> + */
> +static uint32_t

This should just return bool.

> +is_prime(uint32_t num)
> +{
> +    if ((num < 2) || ((num % 2) == 0)) {

There is no need for so many parenthesis.  And it might make sense to
just write 3 if blocks here to make the code simpler, i.e. check for
< 2, == 2 and % 2.

> +        return num == 2;
> +    }
> +
> +    uint32_t i;
> +    uint32_t limit = sqrt((float) num);
> +    for (i = 3; i <= limit; i += 2) {

for (uint64_t i = 3; i * i <= num; i += 2) {

There is no real need for sqrt.  I don't think we should be concerned
about 'i * i' performance here.

> +        if (num % i == 0) {
> +            return 0;
> +        }
> +    }
> +
> +    return 1;
> +}
> +
> +/*
> + * Returns num if num is a prime number. Otherwise returns the next
> + * prime greater than num. Search is limited by UINT32_MAX.

2 spaces between sentences, please.
> + *
> + * Returns 0 if no prime has been found between num and UINT32_MAX
> + */
> +static uint32_t
> +next_prime(uint32_t num)
> +{
> +    if (num < 2) {
> +        return 2;
> +    }
> +    if (num == 2) {
> +        return 3;

This contradicts the description of the function.  So, this check
should probably be done in dpif_netlink_calculate_handlers_num.

> +    }
> +    if (num % 2 == 0) {
> +        num++;
> +    }
> +
> +    uint32_t i;
> +    for (i = num; i < UINT32_MAX; i += 2) {

We may just start this loop from 'num' and increment by 1 instead of 2.
is_prime will return false for even numbers right away, so there is no
real need for +=2 optimization here.  And we can also remove 2 out of
3 'if's at the beginning of this function (except the incorrect one).

> +        if (is_prime(i)) {
> +            return i;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Calcuales and returns the number of handler threads needed based

s/Calcuales/Calculates/

> + * the following formula:
> + *
> + * handlers_n = min(next_prime(active_cores+1), total_cores)
> + *
> + * Where next_prime is a function that returns the next numeric prime
> + * number that is strictly greater than active_cores
> + */
> +static uint32_t
> +dpif_netlink_calculate_handlers_num(void)

Maybe dpif_netlink_calculate_n_handlers() ?

> +{
> +    uint32_t next_prime_num;
> +    uint32_t n_handlers = count_cpu_cores();
> +    uint32_t total_cores = count_total_cores();

Reverse x-mass tree, please.

> +
> +    /*
> +     * If we have isolated cores, add additional handler threads to
> +     * service inactive cores in the unlikely event that traffic goes
> +     * through inactive cores

These core are not
Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation
On Fri, Jun 17, 2022 at 12:10 PM Mike Pattrick wrote:
>
> On Mon, Jun 6, 2022 at 3:00 PM Michael Santana wrote:
> >
> > Additional threads are required to service upcalls when we have CPU
> > isolation (in per-cpu dispatch mode). The reason additional threads
> > are required is because it creates a more fair distribution. With
> > more threads we decrease the load of each thread as more threads
> > would decrease the number of cores each threads is assigned. Adding
> > as many threads are there are cores could have a performance hit
> > when the number of active cores (which all threads have to share)
> > is very low. For this reason we avoid creating as many threads as
> > there are cores and instead meet somewhere in the middle.
> >
> > The formula used to calculate the number of handler threads to
> > create is as follows:
> >
> > handlers_n = min(next_prime(active_cores+1), total_cores)
>
> Why are primes superior to just any odd number in between active and
> total_cores?

They aren't.  Both strategies are valid and sufficient.  The primes do
provide a more or less even distribution.  We don't want too many
threads, but we do want enough additional threads.

I had proposed the following, but again, it is not better or worse
than the prime approach.  It mainly just scales linearly:

handlers_n = min(active_cores/4, (total_cores-active_cores)/4)

> I may just be missing something; but if there were 8 active cores and
> 12 total cores, why would 11 handlers be better than 9?
>
> Cheers,
> M

> > Where next_prime is a function that returns the next numeric prime
> > number that is strictly greater than active_cores
> > [... rest of quoted patch trimmed ...]
Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation
On Mon, Jun 6, 2022 at 3:00 PM Michael Santana wrote:
>
> Additional threads are required to service upcalls when we have CPU
> isolation (in per-cpu dispatch mode). The reason additional threads
> are required is because it creates a more fair distribution. With
> more threads we decrease the load of each thread as more threads
> would decrease the number of cores each threads is assigned. Adding
> as many threads are there are cores could have a performance hit when
> the number of active cores (which all threads have to share) is very
> low. For this reason we avoid creating as many threads as there are
> cores and instead meet somewhere in the middle.
>
> The formula used to calculate the number of handler threads to create
> is as follows:
>
> handlers_n = min(next_prime(active_cores+1), total_cores)

Why are primes superior to just any odd number in between active and
total_cores?

I may just be missing something; but if there were 8 active cores and
12 total cores, why would 11 handlers be better than 9?
Cheers,
M

> Where next_prime is a function that returns the next numeric prime
> number that is strictly greater than active_cores
> [... rest of quoted patch trimmed ...]
[ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation
Additional threads are required to service upcalls when we have CPU
isolation (in per-cpu dispatch mode). The reason additional threads
are required is because it creates a more fair distribution. With more
threads we decrease the load of each thread as more threads would
decrease the number of cores each threads is assigned. Adding as many
threads are there are cores could have a performance hit when the
number of active cores (which all threads have to share) is very low.
For this reason we avoid creating as many threads as there are cores
and instead meet somewhere in the middle.

The formula used to calculate the number of handler threads to create
is as follows:

handlers_n = min(next_prime(active_cores+1), total_cores)

Where next_prime is a function that returns the next numeric prime
number that is strictly greater than active_cores

Assume default behavior when total_cores <= 2, that is do not create
additional threads when we have less than 2 total cores on the system

Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
Signed-off-by: Michael Santana
---
 lib/dpif-netlink.c | 86 --
 lib/ovs-thread.c   | 16 +
 lib/ovs-thread.h   |  1 +
 3 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 06e1e8ca0..e948a829b 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include

 #include "bitmap.h"
 #include "dpif-netlink-rtnl.h"
@@ -2506,6 +2507,87 @@ dpif_netlink_handler_uninit(struct dpif_handler *handler)
 }
 #endif

+/*
+ * Returns 1 if num is a prime number,
+ * Otherwise return 0
+ */
+static uint32_t
+is_prime(uint32_t num)
+{
+    if ((num < 2) || ((num % 2) == 0)) {
+        return num == 2;
+    }
+
+    uint32_t i;
+    uint32_t limit = sqrt((float) num);
+    for (i = 3; i <= limit; i += 2) {
+        if (num % i == 0) {
+            return 0;
+        }
+    }
+
+    return 1;
+}
+
+/*
+ * Returns num if num is a prime number. Otherwise returns the next
+ * prime greater than num. Search is limited by UINT32_MAX.
+ *
+ * Returns 0 if no prime has been found between num and UINT32_MAX
+ */
+static uint32_t
+next_prime(uint32_t num)
+{
+    if (num < 2) {
+        return 2;
+    }
+    if (num == 2) {
+        return 3;
+    }
+    if (num % 2 == 0) {
+        num++;
+    }
+
+    uint32_t i;
+    for (i = num; i < UINT32_MAX; i += 2) {
+        if (is_prime(i)) {
+            return i;
+        }
+    }
+
+    return 0;
+}
+
+/*
+ * Calcuales and returns the number of handler threads needed based
+ * the following formula:
+ *
+ * handlers_n = min(next_prime(active_cores+1), total_cores)
+ *
+ * Where next_prime is a function that returns the next numeric prime
+ * number that is strictly greater than active_cores
+ */
+static uint32_t
+dpif_netlink_calculate_handlers_num(void)
+{
+    uint32_t next_prime_num;
+    uint32_t n_handlers = count_cpu_cores();
+    uint32_t total_cores = count_total_cores();
+
+    /*
+     * If we have isolated cores, add additional handler threads to
+     * service inactive cores in the unlikely event that traffic goes
+     * through inactive cores
+     */
+    if (n_handlers < total_cores && total_cores > 2) {
+        next_prime_num = next_prime(n_handlers + 1);
+        n_handlers = next_prime_num >= total_cores ?
+                     total_cores : next_prime_num;
+    }
+
+    return n_handlers;
+}
+
 static int
 dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *dpif)
     OVS_REQ_WRLOCK(dpif->upcall_lock)
@@ -2515,7 +2597,7 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *dpif)
     uint32_t n_handlers;
     uint32_t *upcall_pids;

-    n_handlers = count_cpu_cores();
+    n_handlers = dpif_netlink_calculate_handlers_num();
     if (dpif->n_handlers != n_handlers) {
         VLOG_DBG("Dispatch mode(per-cpu): initializing %d handlers",
                  n_handlers);
@@ -2755,7 +2837,7 @@ dpif_netlink_number_handlers_required(struct dpif *dpif_, uint32_t *n_handlers)
     struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);

     if (dpif_netlink_upcall_per_cpu(dpif)) {
-        *n_handlers = count_cpu_cores();
+        *n_handlers = dpif_netlink_calculate_handlers_num();
         return true;
     }

diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index 805cba622..2172b3d3f 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -663,6 +663,22 @@ count_cpu_cores(void)
     return n_cores > 0 ? n_cores : 0;
 }

+/* Returns the total number of cores on the system, or 0 if the
+ * number cannot be determined. */
+int
+count_total_cores(void) {
+    long int n_cores;
+
+#ifndef _WIN32
+    n_cores = sysconf(_SC_NPROCESSORS_CONF);
+#else
+    n_cores = 0;
+    errno = ENOTSUP;
+#endif
+
+    return n_cores > 0 ? n_cores : 0;
+}
+
 /* Returns 'true' if current thread is PMD thread. */
 bool
 thread_is_pmd(vo