Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-local numa node.

2017-05-04 Thread O Mahony, Billy
Hi Ian,

> -----Original Message-----
> From: Stokes, Ian
> Sent: Thursday, May 4, 2017 2:19 PM
> To: O Mahony, Billy ; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-
> local numa node.
> 
> > Hi Ian,
> >
> > Sorry for the delayed response. Some comments below.
> >
> > > -----Original Message-----
> > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > > boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> > > Sent: Tuesday, April 25, 2017 11:15 AM
> > > To: Stokes, Ian ; d...@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds
> > > on
> > > non- local numa node.
> > >
> > > Thanks Ian,
> > >
> > > I'll incorporate these suggestions in a new patch version.
> > >
> > > Cheers,
> > > Billy.
> > >
> > > > -----Original Message-----
> > > > From: Stokes, Ian
> > > > Sent: Monday, April 24, 2017 5:33 PM
> > > > To: O Mahony, Billy ;
> > > > d...@openvswitch.org
> > > > Subject: RE: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to
> > > > pmds on
> > > > non- local numa node.
> > > >
> > > > > Previously if there is no available (non-isolated) pmd on the
> > > > > numa node for a port then the port is not polled at all. This
> > > > > can result in a non- operational system until such time as nics
> > > > > are physically repositioned. It is preferable to operate with a
> > > > > pmd on
> > the 'wrong'
> > > > > numa node albeit with lower performance. Local pmds are still
> > > > > chosen
> > > > when available.
> > > > >
> > > > > Signed-off-by: Billy O'Mahony 
> > > >
> > > > Thanks for the patch, Billy. A few comments below.
> > > >
> > > > > ---
> > > > >  Documentation/intro/install/dpdk.rst | 10 ++++++++++
> > > > >  lib/dpif-netdev.c                    | 36 ++++++++++++++++++++++++++++----
> > > > >  2 files changed, 42 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/intro/install/dpdk.rst
> > > > > b/Documentation/intro/install/dpdk.rst
> > > > > index b947bd5..0e0b9f0 100644
> > > > > --- a/Documentation/intro/install/dpdk.rst
> > > > > +++ b/Documentation/intro/install/dpdk.rst
> > > > > @@ -450,6 +450,16 @@ affinitized accordingly.
> > > > >  pmd thread on a NUMA node is only created if there is at
> > > > > least one DPDK
> > > > >  interface from that NUMA node added to OVS.
> > > > >
> > > > > +  .. note::
> > > > > +   On NUMA systems PCI devices are also local to a NUMA node.
> > > > > + Rx queues
> > > > > for
> > > > > +   PCI device will assigned to a pmd on it's local NUMA node if
> > > > > + pmd-cpu-
> > > > > mask
> > > > > +   has created a pmd thread on that NUMA node.  If not the
> > > > > + queue will
> > > be
> > > > > +   assigned to a pmd on a remote NUMA node.  This will result
> > > > > + in
> > reduced
> > > > > +   maximum throughput on that device.  In the case such a queue
> > > > > assingment
> > > >
> > > > Typo in assignment.
> > [[BO'M]] Thanks.
> > > >
> > > > > +   is made a warning message will be logged: "There's no available
> > > > > +   (non isolated) pmd thread on numa node N. Queue Q on port P
> > > > > + will be
> > > > > assigned
> > > > > +   to a pmd on numa node M. Expect reduced performance."
> > > > > +
> > > > >  - QEMU vCPU thread Affinity
> > > > >
> > > > >A VM performing simple packet forwarding or running complex
> > > > > packet pipelines diff --git a/lib/dpif-netdev.c
> > > > > b/lib/dpif-netdev.c index a14a2eb..c6570ba 100644
> > > > > --- a/lib/dpif-netdev.c
> > > > > +++ b/lib/dpif-netdev.c
> > > > > @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list
> > > > > *rr, int
> > > > > numa_id)  }
> > > > >
> > > > >  

Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-local numa node.

2017-05-04 Thread Stokes, Ian
> Hi Ian,
> 
> Sorry for the delayed response. Some comments below.
> 
> > -----Original Message-----
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> > Sent: Tuesday, April 25, 2017 11:15 AM
> > To: Stokes, Ian ; d...@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on
> > non- local numa node.
> >
> > Thanks Ian,
> >
> > I'll incorporate these suggestions in a new patch version.
> >
> > Cheers,
> > Billy.
> >
> > > -----Original Message-----
> > > From: Stokes, Ian
> > > Sent: Monday, April 24, 2017 5:33 PM
> > > To: O Mahony, Billy ; d...@openvswitch.org
> > > Subject: RE: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds
> > > on
> > > non- local numa node.
> > >
> > > > Previously if there is no available (non-isolated) pmd on the numa
> > > > node for a port then the port is not polled at all. This can
> > > > result in a non- operational system until such time as nics are
> > > > physically repositioned. It is preferable to operate with a pmd on
> the 'wrong'
> > > > numa node albeit with lower performance. Local pmds are still
> > > > chosen
> > > when available.
> > > >
> > > > Signed-off-by: Billy O'Mahony 
> > >
> > > Thanks for the patch, Billy. A few comments below.
> > >
> > > > ---
> > > >  Documentation/intro/install/dpdk.rst | 10 ++++++++++
> > > >  lib/dpif-netdev.c                    | 36 ++++++++++++++++++++++++++++----
> > > >  2 files changed, 42 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/Documentation/intro/install/dpdk.rst
> > > > b/Documentation/intro/install/dpdk.rst
> > > > index b947bd5..0e0b9f0 100644
> > > > --- a/Documentation/intro/install/dpdk.rst
> > > > +++ b/Documentation/intro/install/dpdk.rst
> > > > @@ -450,6 +450,16 @@ affinitized accordingly.
> > > >  pmd thread on a NUMA node is only created if there is at
> > > > least one DPDK
> > > >  interface from that NUMA node added to OVS.
> > > >
> > > > +  .. note::
> > > > +   On NUMA systems PCI devices are also local to a NUMA node.  Rx
> > > > + queues
> > > > for
> > > > +   PCI device will assigned to a pmd on it's local NUMA node if
> > > > + pmd-cpu-
> > > > mask
> > > > +   has created a pmd thread on that NUMA node.  If not the queue
> > > > + will
> > be
> > > > +   assigned to a pmd on a remote NUMA node.  This will result in
> reduced
> > > > +   maximum throughput on that device.  In the case such a queue
> > > > assingment
> > >
> > > Typo in assignment.
> [[BO'M]] Thanks.
> > >
> > > > +   is made a warning message will be logged: "There's no available
> > > > +   (non isolated) pmd thread on numa node N. Queue Q on port P
> > > > + will be
> > > > assigned
> > > > +   to a pmd on numa node M. Expect reduced performance."
> > > > +
> > > >  - QEMU vCPU thread Affinity
> > > >
> > > >A VM performing simple packet forwarding or running complex
> > > > packet pipelines diff --git a/lib/dpif-netdev.c
> > > > b/lib/dpif-netdev.c index a14a2eb..c6570ba 100644
> > > > --- a/lib/dpif-netdev.c
> > > > +++ b/lib/dpif-netdev.c
> > > > @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list
> > > > *rr, int
> > > > numa_id)  }
> > > >
> > > >  static void
> > > > -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list
> > > > *rr)
> > > > +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list
> *rr,
> > > > +  int *all_numa_ids, unsigned all_numa_ids_sz,
> > > > +  int *num_ids_written)
> > > >  {
> > > >  struct dp_netdev_pmd_thread *pmd;
> > > >  struct rr_numa *numa;
> > > > +unsigned idx = 0;
> > > >
> > > >  hmap_init(&rr->numas);
> > > >
> > > > @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp,
> > > > struct rr_numa_list *rr)
> > > >  numa->n

Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-local numa node.

2017-05-03 Thread O Mahony, Billy
Hi Ian,

Sorry for the delayed response. Some comments below.

> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of O Mahony, Billy
> Sent: Tuesday, April 25, 2017 11:15 AM
> To: Stokes, Ian ; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-
> local numa node.
> 
> Thanks Ian,
> 
> I'll incorporate these suggestions in a new patch version.
> 
> Cheers,
> Billy.
> 
> > -----Original Message-----
> > From: Stokes, Ian
> > Sent: Monday, April 24, 2017 5:33 PM
> > To: O Mahony, Billy ; d...@openvswitch.org
> > Subject: RE: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on
> > non- local numa node.
> >
> > > Previously if there is no available (non-isolated) pmd on the numa
> > > node for a port then the port is not polled at all. This can result
> > > in a non- operational system until such time as nics are physically
> > > repositioned. It is preferable to operate with a pmd on the 'wrong'
> > > numa node albeit with lower performance. Local pmds are still chosen
> > when available.
> > >
> > > Signed-off-by: Billy O'Mahony 
> >
> > Thanks for the patch, Billy. A few comments below.
> >
> > > ---
> > >  Documentation/intro/install/dpdk.rst | 10 ++++++++++
> > >  lib/dpif-netdev.c                    | 36 ++++++++++++++++++++++++++++----
> > >  2 files changed, 42 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/intro/install/dpdk.rst
> > > b/Documentation/intro/install/dpdk.rst
> > > index b947bd5..0e0b9f0 100644
> > > --- a/Documentation/intro/install/dpdk.rst
> > > +++ b/Documentation/intro/install/dpdk.rst
> > > @@ -450,6 +450,16 @@ affinitized accordingly.
> > >  pmd thread on a NUMA node is only created if there is at least
> > > one DPDK
> > >  interface from that NUMA node added to OVS.
> > >
> > > +  .. note::
> > > +   On NUMA systems PCI devices are also local to a NUMA node.  Rx
> > > + queues
> > > for
> > > +   PCI device will assigned to a pmd on it's local NUMA node if
> > > + pmd-cpu-
> > > mask
> > > +   has created a pmd thread on that NUMA node.  If not the queue will
> be
> > > +   assigned to a pmd on a remote NUMA node.  This will result in reduced
> > > +   maximum throughput on that device.  In the case such a queue
> > > assingment
> >
> > Typo in assignment.
[[BO'M]] Thanks.
> >
> > > +   is made a warning message will be logged: "There's no available
> > > +   (non isolated) pmd thread on numa node N. Queue Q on port P will
> > > + be
> > > assigned
> > > +   to a pmd on numa node M. Expect reduced performance."
> > > +
> > >  - QEMU vCPU thread Affinity
> > >
> > >A VM performing simple packet forwarding or running complex
> > > packet pipelines diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index a14a2eb..c6570ba 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr,
> > > int
> > > numa_id)  }
> > >
> > >  static void
> > > -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list
> > > *rr)
> > > +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> > > +  int *all_numa_ids, unsigned all_numa_ids_sz,
> > > +  int *num_ids_written)
> > >  {
> > >  struct dp_netdev_pmd_thread *pmd;
> > >  struct rr_numa *numa;
> > > +unsigned idx = 0;
> > >
> > >  hmap_init(&rr->numas);
> > >
> > > @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp,
> > > struct rr_numa_list *rr)
> > >  numa->n_pmds++;
> > >  numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof
> > > *numa-
> > > >pmds);
> > >  numa->pmds[numa->n_pmds - 1] = pmd;
> > > +
> > > +all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> > > +idx++;
> > >  }
> > > +*num_ids_written = idx;
> > >  }
> > >
> > >  static struct dp_netdev_pmd_thread * @@ -3202,8 +3209,15 @@
> > > rxq_scheduling(struct dp_netdev *dp, b

Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-local numa node.

2017-04-25 Thread O Mahony, Billy
Thanks Ian,

I'll incorporate these suggestions in a new patch version. 

Cheers,
Billy. 

> -----Original Message-----
> From: Stokes, Ian
> Sent: Monday, April 24, 2017 5:33 PM
> To: O Mahony, Billy ; d...@openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-
> local numa node.
> 
> > Previously if there is no available (non-isolated) pmd on the numa
> > node for a port then the port is not polled at all. This can result in
> > a non- operational system until such time as nics are physically
> > repositioned. It is preferable to operate with a pmd on the 'wrong'
> > numa node albeit with lower performance. Local pmds are still chosen
> when available.
> >
> > Signed-off-by: Billy O'Mahony 
> 
> Thanks for the patch, Billy. A few comments below.
> 
> > ---
> >  Documentation/intro/install/dpdk.rst | 10 ++++++++++
> >  lib/dpif-netdev.c                    | 36 ++++++++++++++++++++++++++++----
> >  2 files changed, 42 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/intro/install/dpdk.rst
> > b/Documentation/intro/install/dpdk.rst
> > index b947bd5..0e0b9f0 100644
> > --- a/Documentation/intro/install/dpdk.rst
> > +++ b/Documentation/intro/install/dpdk.rst
> > @@ -450,6 +450,16 @@ affinitized accordingly.
> >  pmd thread on a NUMA node is only created if there is at least
> > one DPDK
> >  interface from that NUMA node added to OVS.
> >
> > +  .. note::
> > +   On NUMA systems PCI devices are also local to a NUMA node.  Rx
> > + queues
> > for
> > +   PCI device will assigned to a pmd on it's local NUMA node if
> > + pmd-cpu-
> > mask
> > +   has created a pmd thread on that NUMA node.  If not the queue will be
> > +   assigned to a pmd on a remote NUMA node.  This will result in reduced
> > +   maximum throughput on that device.  In the case such a queue
> > assingment
> 
> Typo in assignment.
> 
> > +   is made a warning message will be logged: "There's no available
> > +   (non isolated) pmd thread on numa node N. Queue Q on port P will
> > + be
> > assigned
> > +   to a pmd on numa node M. Expect reduced performance."
> > +
> >  - QEMU vCPU thread Affinity
> >
> >A VM performing simple packet forwarding or running complex packet
> > pipelines diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> > a14a2eb..c6570ba 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr,
> > int
> > numa_id)  }
> >
> >  static void
> > -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> > +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> > +  int *all_numa_ids, unsigned all_numa_ids_sz,
> > +  int *num_ids_written)
> >  {
> >  struct dp_netdev_pmd_thread *pmd;
> >  struct rr_numa *numa;
> > +unsigned idx = 0;
> >
> >  hmap_init(&rr->numas);
> >
> > @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp,
> > struct rr_numa_list *rr)
> >  numa->n_pmds++;
> >  numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof
> > *numa-
> > >pmds);
> >  numa->pmds[numa->n_pmds - 1] = pmd;
> > +
> > +all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> > +idx++;
> >  }
> > +*num_ids_written = idx;
> >  }
> >
> >  static struct dp_netdev_pmd_thread *
> > @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool
> > pinned)
> > OVS_REQUIRES(dp->port_mutex)  {
> >  struct dp_netdev_port *port;
> >  struct rr_numa_list rr;
> > +int all_numa_ids [64];
> > +int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
> > +unsigned all_numa_ids_idx = 0;
> > +int all_numa_ids_max_idx = 0;
> > +int num_numa_ids = 0;
> >
> > -rr_numa_list_populate(dp, &rr);
> > +rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
> > +  &num_numa_ids);
> > +all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz -
> > + 1);
> >
> >  HMAP_FOR_EACH (port, node, &dp->ports) {
> >  struct rr_numa *numa;
> > @@ -3234,10 +3248,24 @@ rxq_scheduling(struct dp_netdev *dp, bool
> > pinned)
> > OVS_REQUIRES(dp->port_mutex)
> > 

Re: [ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-local numa node.

2017-04-24 Thread Stokes, Ian
> Previously if there is no available (non-isolated) pmd on the numa node
> for a port then the port is not polled at all. This can result in a non-
> operational system until such time as nics are physically repositioned. It
> is preferable to operate with a pmd on the 'wrong' numa node albeit with
> lower performance. Local pmds are still chosen when available.
> 
> Signed-off-by: Billy O'Mahony 

Thanks for the patch, Billy. A few comments below.

> ---
>  Documentation/intro/install/dpdk.rst | 10 ++++++++++
>  lib/dpif-netdev.c                    | 36 ++++++++++++++++++++++++++++----
>  2 files changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/intro/install/dpdk.rst
> b/Documentation/intro/install/dpdk.rst
> index b947bd5..0e0b9f0 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -450,6 +450,16 @@ affinitized accordingly.
>  pmd thread on a NUMA node is only created if there is at least one
> DPDK
>  interface from that NUMA node added to OVS.
> 
> +  .. note::
> +   On NUMA systems PCI devices are also local to a NUMA node.  Rx queues
> for
> +   PCI device will assigned to a pmd on it's local NUMA node if pmd-cpu-
> mask
> +   has created a pmd thread on that NUMA node.  If not the queue will be
> +   assigned to a pmd on a remote NUMA node.  This will result in reduced
> +   maximum throughput on that device.  In the case such a queue
> assingment

Typo in assignment.

> +   is made a warning message will be logged: "There's no available
> +   (non isolated) pmd thread on numa node N. Queue Q on port P will be
> assigned
> +   to a pmd on numa node M. Expect reduced performance."
> +
>  - QEMU vCPU thread Affinity
> 
>A VM performing simple packet forwarding or running complex packet
> pipelines diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> a14a2eb..c6570ba 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int
> numa_id)  }
> 
>  static void
> -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
> +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
> +  int *all_numa_ids, unsigned all_numa_ids_sz,
> +  int *num_ids_written)
>  {
>  struct dp_netdev_pmd_thread *pmd;
>  struct rr_numa *numa;
> +unsigned idx = 0;
> 
>  hmap_init(&rr->numas);
> 
> @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct
> rr_numa_list *rr)
>  numa->n_pmds++;
>  numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa-
> >pmds);
>  numa->pmds[numa->n_pmds - 1] = pmd;
> +
> +all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
> +idx++;
>  }
> +*num_ids_written = idx;
>  }
> 
>  static struct dp_netdev_pmd_thread *
> @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned)
> OVS_REQUIRES(dp->port_mutex)  {
>  struct dp_netdev_port *port;
>  struct rr_numa_list rr;
> +int all_numa_ids [64];
> +int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
> +unsigned all_numa_ids_idx = 0;
> +int all_numa_ids_max_idx = 0;
> +int num_numa_ids = 0;
> 
> -rr_numa_list_populate(dp, &rr);
> +rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
> +  &num_numa_ids);
> +all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
> 
>  HMAP_FOR_EACH (port, node, &dp->ports) {
>  struct rr_numa *numa;
> @@ -3234,10 +3248,24 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned)
> OVS_REQUIRES(dp->port_mutex)
>  }
>  } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
>  if (!numa) {
> +if (all_numa_ids_max_idx < 0) {
> +VLOG_ERR("There are no pmd threads. "
> + "Is pmd-cpu-mask set to zero?");

The VLOG_ERR is a bit general and misleading. A user could set a PMD on a numa 
node, add a DPDK device local to that node, isolate the PMD for that DPDK port, 
then add a second DPDK port that is local to the same numa node. This will 
trigger the ERR above.

Strictly speaking there is a PMD thread on that numa node, but it has been 
isolated. I would change the message above to mention isolation. I would also 
add the name of the port searching for a PMD, so it is clear which port could 
not find a thread. Something like below:

if (all_numa_ids_max_idx < 0) {
    VLOG_ERR("There are no available pmd threads for %s. "
             "Is pmd-cpu-mask set to zero or isolated?",
             netdev_get_name(port->netdev));
    continue;
}
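
With the port name included, the failure is tied to a specific device. For 
example, a hypothetical port dpdk1 that cannot find a thread would then log: 
"There are no available pmd threads for dpdk1. Is pmd-cpu-mask set to zero or 
isolated?"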

> +continue;
> +}
>  VLOG_WARN("There's no available (non isolated) pmd
> thread "
>"on numa node %d. Queue %d on port \'%s\'
> will "
> -  

[ovs-dev] [PATCH v2] dpif-netdev: Assign ports to pmds on non-local numa node.

2017-03-30 Thread Billy O'Mahony
From: billyom 

Previously if there is no available (non-isolated) pmd on the numa node
for a port then the port is not polled at all. This can result in a
non-operational system until such time as nics are physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node albeit with lower performance. Local pmds are still chosen when
available.

Signed-off-by: Billy O'Mahony 
---
 Documentation/intro/install/dpdk.rst | 10 ++++++++++
 lib/dpif-netdev.c                    | 36 ++++++++++++++++++++++++++++----
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index b947bd5..0e0b9f0 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -450,6 +450,16 @@ affinitized accordingly.
 pmd thread on a NUMA node is only created if there is at least one DPDK
 interface from that NUMA node added to OVS.
 
+  .. note::
+   On NUMA systems PCI devices are also local to a NUMA node.  Rx queues for
+   PCI device will assigned to a pmd on it's local NUMA node if pmd-cpu-mask
+   has created a pmd thread on that NUMA node.  If not the queue will be
+   assigned to a pmd on a remote NUMA node.  This will result in reduced
+   maximum throughput on that device.  In the case such a queue assingment
+   is made a warning message will be logged: "There's no available
+   (non isolated) pmd thread on numa node N. Queue Q on port P will be assigned
+   to a pmd on numa node M. Expect reduced performance."
+
 - QEMU vCPU thread Affinity
 
   A VM performing simple packet forwarding or running complex packet pipelines
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a14a2eb..c6570ba 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
 }
 
 static void
-rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
+rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
+  int *all_numa_ids, unsigned all_numa_ids_sz,
+  int *num_ids_written)
 {
 struct dp_netdev_pmd_thread *pmd;
 struct rr_numa *numa;
+unsigned idx = 0;
 
 hmap_init(&rr->numas);
 
@@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
 numa->n_pmds++;
 numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
 numa->pmds[numa->n_pmds - 1] = pmd;
+
+all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
+idx++;
 }
+*num_ids_written = idx;
 }
 
 static struct dp_netdev_pmd_thread *
@@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
 {
 struct dp_netdev_port *port;
 struct rr_numa_list rr;
+int all_numa_ids [64];
+int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
+unsigned all_numa_ids_idx = 0;
+int all_numa_ids_max_idx = 0;
+int num_numa_ids = 0;
 
-rr_numa_list_populate(dp, &rr);
+rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
+  &num_numa_ids);
+all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
 
 HMAP_FOR_EACH (port, node, &dp->ports) {
 struct rr_numa *numa;
@@ -3234,10 +3248,24 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
 }
 } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
 if (!numa) {
+if (all_numa_ids_max_idx < 0) {
+VLOG_ERR("There are no pmd threads. "
+ "Is pmd-cpu-mask set to zero?");
+continue;
+}
 VLOG_WARN("There's no available (non isolated) pmd thread "
   "on numa node %d. Queue %d on port \'%s\' will "
-  "not be polled.",
-  numa_id, qid, netdev_get_name(port->netdev));
+  "be assigned to a pmd on numa node %d. Expect "
+  "reduced performance.",
+  numa_id, qid, netdev_get_name(port->netdev),
+  all_numa_ids[all_numa_ids_idx]);
+numa_id = all_numa_ids[all_numa_ids_idx];
+numa = rr_numa_list_lookup(&rr, numa_id);
+q->pmd = rr_numa_get_pmd(numa);
+all_numa_ids_idx++;
+if (all_numa_ids_idx > all_numa_ids_max_idx) {
+all_numa_ids_idx = 0;
+}
 } else {
 q->pmd = rr_numa_get_pmd(numa);
 }
-- 
2.7.4
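
To make the fallback easier to follow in isolation, here is a minimal,
self-contained C sketch of the policy the patch implements: record the NUMA
id of every pmd thread found during population, and when a queue's local node
has no usable pmd, hand the queue to the recorded nodes in round-robin order.
The names (pmd_numa_ids, pick_fallback_numa) and the sample node layout are
hypothetical stand-ins for the dpif-netdev structures, not OVS API; only the
index arithmetic mirrors rxq_scheduling() above.

    #include <stdio.h>

    #define ARRAY_SIZE(a) (sizeof (a) / sizeof (a)[0])
    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* Filled the way rr_numa_list_populate() fills all_numa_ids:
     * one entry per pmd thread, holding the NUMA node it runs on. */
    static int pmd_numa_ids[64];
    static int num_pmd_numa_ids;    /* Entries actually written. */
    static unsigned fallback_idx;   /* Round-robin cursor. */

    /* Pick a node for a queue whose local node has no non-isolated pmd.
     * Returns -1 when no pmd threads exist at all (pmd-cpu-mask zero). */
    static int
    pick_fallback_numa(void)
    {
        int max_idx = MIN(num_pmd_numa_ids - 1,
                          (int) ARRAY_SIZE(pmd_numa_ids) - 1);

        if (max_idx < 0) {
            return -1;
        }
        int numa_id = pmd_numa_ids[fallback_idx++];
        if (fallback_idx > (unsigned) max_idx) {
            fallback_idx = 0;       /* Wrap, as the patch does. */
        }
        return numa_id;
    }

    int
    main(void)
    {
        /* Suppose two pmds were found on node 0 and one on node 1. */
        pmd_numa_ids[0] = 0;
        pmd_numa_ids[1] = 0;
        pmd_numa_ids[2] = 1;
        num_pmd_numa_ids = 3;

        /* Four orphaned queues land on nodes 0, 0, 1, 0 in turn. */
        for (int q = 0; q < 4; q++) {
            printf("orphan queue %d -> numa node %d\n", q,
                   pick_fallback_numa());
        }
        return 0;
    }

One property is visible here: because one id is recorded per pmd thread, a
node with more pmds appears more often in the array, so it absorbs
proportionally more of the orphaned queues as the cursor wraps.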

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev