Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation

2022-07-06 Thread Aaron Conole
Michael Santana  writes:

> On 7/4/22 09:46, Ilya Maximets wrote:
>> Hi, Michael.  Thanks for the new version!
> Hi Ilya, thanks for the response
>> On 6/6/22 20:59, Michael Santana wrote:
>>> Additional threads are required to service upcalls when we have CPU
>>> isolation (in per-cpu dispatch mode). The reason additional threads
>>> are required is because it creates a more fair distribution.
>> I think, the description above lacks the information on why we need
>> to create additional threads, and why we choose the number to be a
>> prime number.  Yes, you said that it creates a more fair distribution,
>> but we probably need to describe that more with an example.
>> 
>>> With more
>>> threads we decrease the load of each thread as more threads would
>>> decrease the number of cores each thread is assigned.
>> While that is true, it's not obvious why increasing the number of
>> threads beyond the number of available cores is good for us.
>> The key factor is more fair load distribution among threads, so all
>> cores are utilized, but that is not highlighted.
> Yes agreed.
>
> We want additional threads because we want to minimize the number of
> cores assigned to individual threads. 75 handler threads servicing 100
> cores is better than 50 threads servicing 100 cores: in the 75-handler
> case each thread services 1.33 cores on average, whereas in the
> 50-handler case each thread services 2 cores. The fewer cores assigned
> to an individual thread, the less work that thread has to do. This
> example ignores the number of actual cores available for OVS userspace.
>
> On the flip side, we do not wish to create too many threads, as we
> fear ending up with too many threads and not enough active cores for
> OVS to use.
>
> Which brings me to my last point. The prime schema is arbitrary (or
> good enough for now). We really just want more threads and there is no
> reason why the prime schema is better than any other (at least not without
> knowing how many threads we can get away with adding before kernel
> overhead hurts performance). I think you had mentioned it several
> times but I think we need to do testing to figure out exactly how many
> threads we can add before kernel overhead hurts performance. I
> speculate the prime schema is on the low end and I think that we can
> add more threads without hurting performance than the prime schema
> will allow. We can look at how the upcalls/s rate changes based on the
> number of handlers and the number of active cores. The prime schema is
> an easy solution, but it is a little like driving blindly without knowing
> more stats.

Let's also not forget that a user tells OVS to be bound only to certain
cores, and will be slightly confused when there are more handler threads
than the number of isolated CPUs.  This will be confusing no matter
what schema is chosen - so it will be important to have some good
documentation, and perhaps an INFO log that tells something about this.

> On the other hand, stats might be different from one system to
> another. What might be good on my test system doesn't necessarily
> translate to another system. LMK what you think about this, and
> whether it is worth the effort.
>
> The prime schema is sufficient as is. I just think that we can improve
> it a little bit. But we don't have to go down that road unless we need to.
>
>
>
>
>> 
>>> Adding as many
>>> threads as there are cores could have a performance hit when the number
>>> of active cores (which all threads have to share) is very low. For this
>>> reason we avoid creating as many threads as there are cores and instead
>>> meet somewhere in the middle.
>>>
>>> The formula used to calculate the number of handler threads to create
>>> is as follows:
>>>
>>> handlers_n = min(next_prime(active_cores+1), total_cores)
>>>
>>> Where next_prime is a function that returns the next numeric prime
>>> number that is strictly greater than active_cores
>>>
>>> Assume default behavior when total_cores <= 2, that is do not create
>>> additional threads when we have 2 or fewer total cores on the system
>>>
>>> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
>>> Signed-off-by: Michael Santana 
>>> ---
>>>   lib/dpif-netlink.c | 86 --
>>>   lib/ovs-thread.c   | 16 +
>>>   lib/ovs-thread.h   |  1 +
>>>   3 files changed, 101 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>>> index 06e1e8ca0..e948a829b 100644
>>> --- a/lib/dpif-netlink.c
>>> +++ b/lib/dpif-netlink.c
>>> @@ -31,6 +31,7 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include <math.h>
>>> #include "bitmap.h"
>>>   #include "dpif-netlink-rtnl.h"
>>> @@ -2506,6 +2507,87 @@ dpif_netlink_handler_uninit(struct dpif_handler 
>>> *handler)
>>>   }
>>>   #endif
>>>   +/*
>>> + * Returns 1 if num is a prime number,
>>> + * Otherwise return 0
>>> + */
>>> +static uint32_t

Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation

2022-07-05 Thread Michael Santana




On 7/4/22 09:46, Ilya Maximets wrote:

Hi, Michael.  Thanks for the new version!

Hi Ilya, thanks for the response


On 6/6/22 20:59, Michael Santana wrote:

Additional threads are required to service upcalls when we have CPU
isolation (in per-cpu dispatch mode). The reason additional threads
are required is because it creates a more fair distribution.


I think, the description above lacks the information on why we need
to create additional threads, and why we choose the number to be a
prime number.  Yes, you said that it creates a more fair distribution,
but we probably need to describe that more with an example.


With more
threads we decrease the load of each thread as more threads would
decrease the number of cores each thread is assigned.


While that is true, it's not obvious why increasing the number of
threads beyond the number of available cores is good for us.
The key factor is more fair load distribution among threads, so all
cores are utilized, but that is not highlighted.

Yes agreed.

We want additional threads because we want to minimize the number of
cores assigned to individual threads. 75 handler threads servicing 100
cores is better than 50 threads servicing 100 cores: in the 75-handler
case each thread services 1.33 cores on average, whereas in the
50-handler case each thread services 2 cores. The fewer cores assigned
to an individual thread, the less work that thread has to do. This
example ignores the number of actual cores available for OVS userspace.


On the flip side, we do not wish to create too many threads, as we fear
ending up with too many threads and not enough active cores for OVS to
use.


Which brings me to my last point. The prime schema is arbitrary (or good 
enough for now). We really just want more threads and there is no reason 
why the prime schema is better than any other (at least not without knowing
how many threads we can get away with adding before kernel overhead 
hurts performance). I think you had mentioned it several times but I 
think we need to do testing to figure out exactly how many threads we 
can add before kernel overhead hurts performance. I speculate the prime 
schema is on the low end and I think that we can add more threads 
without hurting performance than the prime schema will allow. We can 
look at how the upcalls/s rate changes based on the number of handlers 
and the number of active cores. The prime schema is an easy solution,
but it is a little like driving blindly without knowing more stats.


On the other hand, stats might be different from one system to another. 
What might be good on my test system doesn't necessarily translate to 
another system. LMK what you think about this, and whether it is worth the effort.


The prime schema is sufficient as is. I just think that we can improve 
it a little bit. But we don't have to go down that road unless we need to.








Adding as many
threads as there are cores could have a performance hit when the number
of active cores (which all threads have to share) is very low. For this
reason we avoid creating as many threads as there are cores and instead
meet somewhere in the middle.

The formula used to calculate the number of handler threads to create
is as follows:

handlers_n = min(next_prime(active_cores+1), total_cores)

Where next_prime is a function that returns the next numeric prime
number that is strictly greater than active_cores

Assume default behavior when total_cores <= 2, that is do not create
additional threads when we have 2 or fewer total cores on the system

Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
Signed-off-by: Michael Santana 
---
  lib/dpif-netlink.c | 86 --
  lib/ovs-thread.c   | 16 +
  lib/ovs-thread.h   |  1 +
  3 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 06e1e8ca0..e948a829b 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -31,6 +31,7 @@
  #include 
  #include 
  #include 
+#include <math.h>
  
  #include "bitmap.h"

  #include "dpif-netlink-rtnl.h"
@@ -2506,6 +2507,87 @@ dpif_netlink_handler_uninit(struct dpif_handler *handler)
  }
  #endif
  
+/*

+ * Returns 1 if num is a prime number,
+ * Otherwise return 0
+ */
+static uint32_t


This should just return bool.

ACK, thanks



+is_prime(uint32_t num)
+{
+if ((num < 2) || ((num % 2) == 0)) {


There is no need for so many parenthesis.

And it might make sense to just write 3 if blocks
here to make the code simpler, i.e. check for
< 2, == 2 and % 2.

ACK, thanks



+return num == 2;
+}
+
+uint32_t i;
+uint32_t limit = sqrt((float) num);
+for (i = 3; i <= limit; i += 2) {


for (uint64_t i = 3; i * i <= num; i += 2) {

There is no real need for sqrt.  I don't think we should be concerned
about 'i * i' performance here.

ACK, I think I got carried away with the optimization



+if 

Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation

2022-07-04 Thread Ilya Maximets
Hi, Michael.  Thanks for the new version!

On 6/6/22 20:59, Michael Santana wrote:
> Additional threads are required to service upcalls when we have CPU
> isolation (in per-cpu dispatch mode). The reason additional threads
> are required is because it creates a more fair distribution.

I think, the description above lacks the information on why we need
to create additional threads, and why we choose the number to be a
prime number.  Yes, you said that it creates a more fair distribution,
but we probably need to describe that more with an example.

> With more
> threads we decrease the load of each thread as more threads would
> decrease the number of cores each thread is assigned.

While that is true, it's not obvious why increasing the number of
threads beyond the number of available cores is good for us.
The key factor is more fair load distribution among threads, so all
cores are utilized, but that is not highlighted.

> Adding as many
> threads as there are cores could have a performance hit when the number
> of active cores (which all threads have to share) is very low. For this
> reason we avoid creating as many threads as there are cores and instead
> meet somewhere in the middle.
> 
> The formula used to calculate the number of handler threads to create
> is as follows:
> 
> handlers_n = min(next_prime(active_cores+1), total_cores)
> 
> Where next_prime is a function that returns the next numeric prime
> number that is strictly greater than active_cores
> 
> Assume default behavior when total_cores <= 2, that is do not create
> additional threads when we have 2 or fewer total cores on the system
> 
> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
> Signed-off-by: Michael Santana 
> ---
>  lib/dpif-netlink.c | 86 --
>  lib/ovs-thread.c   | 16 +
>  lib/ovs-thread.h   |  1 +
>  3 files changed, 101 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 06e1e8ca0..e948a829b 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include <math.h>
>  
>  #include "bitmap.h"
>  #include "dpif-netlink-rtnl.h"
> @@ -2506,6 +2507,87 @@ dpif_netlink_handler_uninit(struct dpif_handler 
> *handler)
>  }
>  #endif
>  
> +/*
> + * Returns 1 if num is a prime number,
> + * Otherwise return 0
> + */
> +static uint32_t

This should just return bool.

> +is_prime(uint32_t num)
> +{
> +if ((num < 2) || ((num % 2) == 0)) {

There is no need for so many parenthesis.

And it might make sense to just write 3 if blocks
here to make the code simpler, i.e. check for
< 2, == 2 and % 2.

> +return num == 2;
> +}
> +
> +uint32_t i;
> +uint32_t limit = sqrt((float) num);
> +for (i = 3; i <= limit; i += 2) {

for (uint64_t i = 3; i * i <= num; i += 2) {

There is no real need for sqrt.  I don't think we should be concerned
about 'i * i' performance here.

> +if (num % i == 0) {
> +return 0;
> +}
> +}
> +
> +return 1;
> +}
> +
> +/*
> + * Returns num if num is a prime number. Otherwise returns the next
> + * prime greater than num. Search is limited by UINT32_MAX.

2 spaces between sentences, please.

> + *
> + * Returns 0 if no prime has been found between num and UINT32_MAX
> + */
> +static uint32_t
> +next_prime(uint32_t num)
> +{
> +if (num < 2) {
> +return 2;
> +}
> +if (num == 2) {
> +return 3;

This contradicts the description of the function.  So, this check
should probably be done in dpif_netlink_calculate_handlers_num.

> +}
> +if (num % 2 == 0) {
> +num++;
> +}
> +
> +uint32_t i;
> +for (i = num; i < UINT32_MAX; i += 2) {

We may just start this loop from 'num' and increment by 1 instead
of 2.  is_prime will return false for even numbers right away, so
there is no real need for +=2 optimization here.  And we can also
remove 2 out of 3 'if's at the beginning of this function (except
the incorrect one).

> +if (is_prime(i)) {
> +return i;
> +}
> +}
> +
> +return 0;
> +}
> +
> +/*
> + * Calcuales and returns the number of handler threads needed based

s/Calcuales/Calculates/

> + * the following formula:
> + *
> + * handlers_n = min(next_prime(active_cores+1), total_cores)
> + *
> + * Where next_prime is a function that returns the next numeric prime
> + * number that is strictly greater than active_cores
> + */
> +static uint32_t
> +dpif_netlink_calculate_handlers_num(void)

Maybe dpif_netlink_calculate_n_handlers() ?

> +{
> +uint32_t next_prime_num;
> +uint32_t n_handlers = count_cpu_cores();
> +uint32_t total_cores = count_total_cores();

Reverse x-mass tree, please.

> +
> +/*
> + * If we have isolated cores, add additional handler threads to
> + * service inactive cores in the unlikely event that traffic goes
> + * through inactive cores

These core are not 

Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation

2022-06-21 Thread Michael Santana
On Fri, Jun 17, 2022 at 12:10 PM Mike Pattrick  wrote:
>
> On Mon, Jun 6, 2022 at 3:00 PM Michael Santana  wrote:
> >
> > Additional threads are required to service upcalls when we have CPU
> > isolation (in per-cpu dispatch mode). The reason additional threads
> > are required is because it creates a more fair distribution. With more
> > threads we decrease the load of each thread as more threads would
> > decrease the number of cores each thread is assigned. Adding as many
> > threads as there are cores could have a performance hit when the number
> > of active cores (which all threads have to share) is very low. For this
> > reason we avoid creating as many threads as there are cores and instead
> > meet somewhere in the middle.
> >
> > The formula used to calculate the number of handler threads to create
> > is as follows:
> >
> > handlers_n = min(next_prime(active_cores+1), total_cores)
>
> Why are primes superior to just any odd number in between active and
> total_cores?
They aren't. Both strategies are valid and sufficient. The primes do
provide a more or less even distribution. We want to avoid creating too
many threads while still adding enough additional ones.

I had proposed the following, but again it is not better or worse
than the prime approach. It mainly just scales linearly:

handlers_n = min(active_cores/4, (total_cores-active_cores)/4)

>
> I may just be missing something; but if there were 8 active cores and
> 12 total cores, why would 11 handlers be better than 9?
>
> Cheers,
> M
>
> >
> > Where next_prime is a function that returns the next numeric prime
> > number that is strictly greater than active_cores
> >
> > Assume default behavior when total_cores <= 2, that is do not create
> > additional threads when we have 2 or fewer total cores on the system
> >
> > Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
> > Signed-off-by: Michael Santana 
> > ---
> >  lib/dpif-netlink.c | 86 --
> >  lib/ovs-thread.c   | 16 +
> >  lib/ovs-thread.h   |  1 +
> >  3 files changed, 101 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> > index 06e1e8ca0..e948a829b 100644
> > --- a/lib/dpif-netlink.c
> > +++ b/lib/dpif-netlink.c
> > @@ -31,6 +31,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include <math.h>
> >
> >  #include "bitmap.h"
> >  #include "dpif-netlink-rtnl.h"
> > @@ -2506,6 +2507,87 @@ dpif_netlink_handler_uninit(struct dpif_handler 
> > *handler)
> >  }
> >  #endif
> >
> > +/*
> > + * Returns 1 if num is a prime number,
> > + * Otherwise return 0
> > + */
> > +static uint32_t
> > +is_prime(uint32_t num)
> > +{
> > +if ((num < 2) || ((num % 2) == 0)) {
> > +return num == 2;
> > +}
> > +
> > +uint32_t i;
> > +uint32_t limit = sqrt((float) num);
> > +for (i = 3; i <= limit; i += 2) {
> > +if (num % i == 0) {
> > +return 0;
> > +}
> > +}
> > +
> > +return 1;
> > +}
> > +
> > +/*
> > + * Returns num if num is a prime number. Otherwise returns the next
> > + * prime greater than num. Search is limited by UINT32_MAX.
> > + *
> > + * Returns 0 if no prime has been found between num and UINT32_MAX
> > + */
> > +static uint32_t
> > +next_prime(uint32_t num)
> > +{
> > +if (num < 2) {
> > +return 2;
> > +}
> > +if (num == 2) {
> > +return 3;
> > +}
> > +if (num % 2 == 0) {
> > +num++;
> > +}
> > +
> > +uint32_t i;
> > +for (i = num; i < UINT32_MAX; i += 2) {
> > +if (is_prime(i)) {
> > +return i;
> > +}
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +/*
> > + * Calcuales and returns the number of handler threads needed based
> > + * the following formula:
> > + *
> > + * handlers_n = min(next_prime(active_cores+1), total_cores)
> > + *
> > + * Where next_prime is a function that returns the next numeric prime
> > + * number that is strictly greater than active_cores
> > + */
> > +static uint32_t
> > +dpif_netlink_calculate_handlers_num(void)
> > +{
> > +uint32_t next_prime_num;
> > +uint32_t n_handlers = count_cpu_cores();
> > +uint32_t total_cores = count_total_cores();
> > +
> > +/*
> > + * If we have isolated cores, add additional handler threads to
> > + * service inactive cores in the unlikely event that traffic goes
> > + * through inactive cores
> > + */
> > +if (n_handlers < total_cores && total_cores > 2) {
> > +next_prime_num = next_prime(n_handlers + 1);
> > +n_handlers = next_prime_num >= total_cores ?
> > +total_cores : next_prime_num;
> > +}
> > +
> > +return n_handlers;
> > +}
> > +
> >  static int
> >  dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *dpif)
> >  OVS_REQ_WRLOCK(dpif->upcall_lock)
> > @@ -2515,7 +2597,7 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct 
> > dpif_netlink *dpif)
> >  

Re: [ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation

2022-06-17 Thread Mike Pattrick
On Mon, Jun 6, 2022 at 3:00 PM Michael Santana  wrote:
>
> Additional threads are required to service upcalls when we have CPU
> isolation (in per-cpu dispatch mode). The reason additional threads
> are required is because it creates a more fair distribution. With more
> threads we decrease the load of each thread as more threads would
> decrease the number of cores each thread is assigned. Adding as many
> threads as there are cores could have a performance hit when the number
> of active cores (which all threads have to share) is very low. For this
> reason we avoid creating as many threads as there are cores and instead
> meet somewhere in the middle.
>
> The formula used to calculate the number of handler threads to create
> is as follows:
>
> handlers_n = min(next_prime(active_cores+1), total_cores)

Why are primes superior to just any odd number in between active and
total_cores?

I may just be missing something; but if there were 8 active cores and
12 total cores, why would 11 handlers be better than 9?

Cheers,
M

>
> Where next_prime is a function that returns the next numeric prime
> number that is strictly greater than active_cores
>
> Assume default behavior when total_cores <= 2, that is do not create
> additional threads when we have 2 or fewer total cores on the system
>
> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
> Signed-off-by: Michael Santana 
> ---
>  lib/dpif-netlink.c | 86 --
>  lib/ovs-thread.c   | 16 +
>  lib/ovs-thread.h   |  1 +
>  3 files changed, 101 insertions(+), 2 deletions(-)
>
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 06e1e8ca0..e948a829b 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include <math.h>
>
>  #include "bitmap.h"
>  #include "dpif-netlink-rtnl.h"
> @@ -2506,6 +2507,87 @@ dpif_netlink_handler_uninit(struct dpif_handler 
> *handler)
>  }
>  #endif
>
> +/*
> + * Returns 1 if num is a prime number,
> + * Otherwise return 0
> + */
> +static uint32_t
> +is_prime(uint32_t num)
> +{
> +if ((num < 2) || ((num % 2) == 0)) {
> +return num == 2;
> +}
> +
> +uint32_t i;
> +uint32_t limit = sqrt((float) num);
> +for (i = 3; i <= limit; i += 2) {
> +if (num % i == 0) {
> +return 0;
> +}
> +}
> +
> +return 1;
> +}
> +
> +/*
> + * Returns num if num is a prime number. Otherwise returns the next
> + * prime greater than num. Search is limited by UINT32_MAX.
> + *
> + * Returns 0 if no prime has been found between num and UINT32_MAX
> + */
> +static uint32_t
> +next_prime(uint32_t num)
> +{
> +if (num < 2) {
> +return 2;
> +}
> +if (num == 2) {
> +return 3;
> +}
> +if (num % 2 == 0) {
> +num++;
> +}
> +
> +uint32_t i;
> +for (i = num; i < UINT32_MAX; i += 2) {
> +if (is_prime(i)) {
> +return i;
> +}
> +}
> +
> +return 0;
> +}
> +
> +/*
> + * Calcuales and returns the number of handler threads needed based
> + * the following formula:
> + *
> + * handlers_n = min(next_prime(active_cores+1), total_cores)
> + *
> + * Where next_prime is a function that returns the next numeric prime
> + * number that is strictly greater than active_cores
> + */
> +static uint32_t
> +dpif_netlink_calculate_handlers_num(void)
> +{
> +uint32_t next_prime_num;
> +uint32_t n_handlers = count_cpu_cores();
> +uint32_t total_cores = count_total_cores();
> +
> +/*
> + * If we have isolated cores, add additional handler threads to
> + * service inactive cores in the unlikely event that traffic goes
> + * through inactive cores
> + */
> +if (n_handlers < total_cores && total_cores > 2) {
> +next_prime_num = next_prime(n_handlers + 1);
> +n_handlers = next_prime_num >= total_cores ?
> +total_cores : next_prime_num;
> +}
> +
> +return n_handlers;
> +}
> +
>  static int
>  dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *dpif)
>  OVS_REQ_WRLOCK(dpif->upcall_lock)
> @@ -2515,7 +2597,7 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct 
> dpif_netlink *dpif)
>  uint32_t n_handlers;
>  uint32_t *upcall_pids;
>
> -n_handlers = count_cpu_cores();
> +n_handlers = dpif_netlink_calculate_handlers_num();
>  if (dpif->n_handlers != n_handlers) {
>  VLOG_DBG("Dispatch mode(per-cpu): initializing %d handlers",
> n_handlers);
> @@ -2755,7 +2837,7 @@ dpif_netlink_number_handlers_required(struct dpif 
> *dpif_, uint32_t *n_handlers)
>  struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
>
>  if (dpif_netlink_upcall_per_cpu(dpif)) {
> -*n_handlers = count_cpu_cores();
> +*n_handlers = dpif_netlink_calculate_handlers_num();
>  return true;
>  }
>
> diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c

[ovs-dev] [PATCH v5 1/2] handlers: Create additional handler threads when using CPU isolation

2022-06-06 Thread Michael Santana
Additional threads are required to service upcalls when we have CPU
isolation (in per-cpu dispatch mode). The reason additional threads
are required is because it creates a more fair distribution. With more
threads we decrease the load of each thread as more threads would
decrease the number of cores each thread is assigned. Adding as many
threads as there are cores could have a performance hit when the number
of active cores (which all threads have to share) is very low. For this
reason we avoid creating as many threads as there are cores and instead
meet somewhere in the middle.

The formula used to calculate the number of handler threads to create
is as follows:

handlers_n = min(next_prime(active_cores+1), total_cores)

Where next_prime is a function that returns the next numeric prime
number that is strictly greater than active_cores

Assume default behavior when total_cores <= 2, that is do not create
additional threads when we have 2 or fewer total cores on the system

Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
Signed-off-by: Michael Santana 
---
 lib/dpif-netlink.c | 86 --
 lib/ovs-thread.c   | 16 +
 lib/ovs-thread.h   |  1 +
 3 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 06e1e8ca0..e948a829b 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include <math.h>
 
 #include "bitmap.h"
 #include "dpif-netlink-rtnl.h"
@@ -2506,6 +2507,87 @@ dpif_netlink_handler_uninit(struct dpif_handler *handler)
 }
 #endif
 
+/*
+ * Returns 1 if num is a prime number,
+ * Otherwise return 0
+ */
+static uint32_t
+is_prime(uint32_t num)
+{
+if ((num < 2) || ((num % 2) == 0)) {
+return num == 2;
+}
+
+uint32_t i;
+uint32_t limit = sqrt((float) num);
+for (i = 3; i <= limit; i += 2) {
+if (num % i == 0) {
+return 0;
+}
+}
+
+return 1;
+}
+
+/*
+ * Returns num if num is a prime number. Otherwise returns the next
+ * prime greater than num. Search is limited by UINT32_MAX.
+ *
+ * Returns 0 if no prime has been found between num and UINT32_MAX
+ */
+static uint32_t
+next_prime(uint32_t num)
+{
+if (num < 2) {
+return 2;
+}
+if (num == 2) {
+return 3;
+}
+if (num % 2 == 0) {
+num++;
+}
+
+uint32_t i;
+for (i = num; i < UINT32_MAX; i += 2) {
+if (is_prime(i)) {
+return i;
+}
+}
+
+return 0;
+}
+
+/*
+ * Calcuales and returns the number of handler threads needed based
+ * the following formula:
+ *
+ * handlers_n = min(next_prime(active_cores+1), total_cores)
+ *
+ * Where next_prime is a function that returns the next numeric prime
+ * number that is strictly greater than active_cores
+ */
+static uint32_t
+dpif_netlink_calculate_handlers_num(void)
+{
+uint32_t next_prime_num;
+uint32_t n_handlers = count_cpu_cores();
+uint32_t total_cores = count_total_cores();
+
+/*
+ * If we have isolated cores, add additional handler threads to
+ * service inactive cores in the unlikely event that traffic goes
+ * through inactive cores
+ */
+if (n_handlers < total_cores && total_cores > 2) {
+next_prime_num = next_prime(n_handlers + 1);
+n_handlers = next_prime_num >= total_cores ?
+total_cores : next_prime_num;
+}
+
+return n_handlers;
+}
+
 static int
 dpif_netlink_refresh_handlers_cpu_dispatch(struct dpif_netlink *dpif)
 OVS_REQ_WRLOCK(dpif->upcall_lock)
@@ -2515,7 +2597,7 @@ dpif_netlink_refresh_handlers_cpu_dispatch(struct 
dpif_netlink *dpif)
 uint32_t n_handlers;
 uint32_t *upcall_pids;
 
-n_handlers = count_cpu_cores();
+n_handlers = dpif_netlink_calculate_handlers_num();
 if (dpif->n_handlers != n_handlers) {
 VLOG_DBG("Dispatch mode(per-cpu): initializing %d handlers",
n_handlers);
@@ -2755,7 +2837,7 @@ dpif_netlink_number_handlers_required(struct dpif *dpif_, 
uint32_t *n_handlers)
 struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
 
 if (dpif_netlink_upcall_per_cpu(dpif)) {
-*n_handlers = count_cpu_cores();
+*n_handlers = dpif_netlink_calculate_handlers_num();
 return true;
 }
 
diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index 805cba622..2172b3d3f 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -663,6 +663,22 @@ count_cpu_cores(void)
 return n_cores > 0 ? n_cores : 0;
 }
 
+/* Returns the total number of cores on the system, or 0 if the
+ * number cannot be determined. */
+int
+count_total_cores(void) {
+long int n_cores;
+
+#ifndef _WIN32
+n_cores = sysconf(_SC_NPROCESSORS_CONF);
+#else
+n_cores = 0;
+errno = ENOTSUP;
+#endif
+
+return n_cores > 0 ? n_cores : 0;
+}
+
 /* Returns 'true' if current thread is PMD thread. */
 bool