Re: [ovs-dev] [PATCH] netdev-dpdk: Delay vhost mempool creation on multi-NUMA.
On 17/05/2022 10:12, David Marchand wrote:
> On Fri, May 13, 2022 at 5:35 PM Kevin Traynor wrote:
>> A mempool is currently created for a vhost port when it is added.
>>
>> The NUMA info of the vhost port is not known until a device is added to
>> the port, so on multi-NUMA systems the initial NUMA node for the mempool
>> is a best guess based on vswitchd affinity.
>>
>> When a device is added to the vhost port, the NUMA info can be checked
>> and if the guess was incorrect a mempool on the correct NUMA node
>> created.
>>
>> The current scheme can have the effect of creating a mempool on a NUMA
>> node that will not be needed and at least for a certain time period
>> requires memory.
>>
>> It is also difficult for a user trying to provision memory on different
>> NUMA nodes, if they are not sure which NUMA node the initial mempool
>> for a vhost port will be on.
>>
>> This patch delays the creation of the mempool for a vhost port on
>> multi-NUMA systems until the vhost NUMA info is known.
>
> I prefer having a single behavior for mono and multi numa (=> no question
> about which behavior resulted in mempool creation/attribution).
> Though I don't have a strong opinion against having this difference in
> behavior.

Originally, I had the same behaviour for both single and multi-NUMA. I
changed it to multi-NUMA only, as the multi-NUMA issue of switching
mempools was what I mostly wanted to remove.

I don't see any downsides to having the same behaviour on single NUMA.
It might even be considered a small improvement now that the socket-limit
defaults have been removed, as the mempool is not created until it is
needed, and if a device is never added, it is not created at all.

So, while that wasn't the main focus of the patch, yes, it seems like it
can be an improvement for single NUMA too. I will change it to do this
(unless I find some issue during testing).

>> Signed-off-by: Kevin Traynor
>
> Otherwise, this new behavior for multi numa and the patch lgtm.
> I tested with a dual numa system, ovs running with pmd threads on numa 1,
> one bridge with vhost-user ports serving one vm in numa0 and one vm in
> numa1.
>
> Running ovs-appctl netdev-dpdk/get-mempool-info | grep ovs,
>
> - before patch, no vm started
> mempool @0x17f703180
> ^^ We can notice that a numa0 mempool was created even if PMD threads
> are on numa1 and no port uses this mempool.
>
> - before patch, once vm in numa1 is started,
> mempool @0x17f703180
> mempool @0x11ffe01e40
>
> - before patch, once vm in numa0 is started too,
> mempool @0x17f703180
> mempool @0x11ffe01e40
>
> - after patch, no vm started
>
> - after patch, once vm in numa1 is started,
> mempool @0x11ffe01e40
> ^^ Only one mempool in numa1
>
> - after patch, once vm in numa0 is started too,
> mempool @0x11ffe01e40
> mempool @0x17f51dcc0

This is a nice test, having multiple VMs using both NUMAs and making sure
there is no confusion between them. Thanks for reviewing and testing.

> Reviewed-by: David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] netdev-dpdk: Delay vhost mempool creation on multi-NUMA.
On Fri, May 13, 2022 at 5:35 PM Kevin Traynor wrote:
>
> A mempool is currently created for a vhost port when it is added.
>
> The NUMA info of the vhost port is not known until a device is added to
> the port, so on multi-NUMA systems the initial NUMA node for the mempool
> is a best guess based on vswitchd affinity.
>
> When a device is added to the vhost port, the NUMA info can be checked
> and if the guess was incorrect a mempool on the correct NUMA node
> created.
>
> The current scheme can have the effect of creating a mempool on a NUMA
> node that will not be needed and at least for a certain time period
> requires memory.
>
> It is also difficult for a user trying to provision memory on different
> NUMA nodes, if they are not sure which NUMA node the initial mempool
> for a vhost port will be on.
>
> This patch delays the creation of the mempool for a vhost port on
> multi-NUMA systems until the vhost NUMA info is known.

I prefer having a single behavior for mono and multi numa (=> no question
about which behavior resulted in mempool creation/attribution).
Though I don't have a strong opinion against having this difference in
behavior.

> Signed-off-by: Kevin Traynor

Otherwise, this new behavior for multi numa and the patch lgtm.

I tested with a dual numa system, ovs running with pmd threads on numa 1,
one bridge with vhost-user ports serving one vm in numa0 and one vm in
numa1.

Running ovs-appctl netdev-dpdk/get-mempool-info | grep ovs,

- before patch, no vm started
mempool @0x17f703180
^^ We can notice that a numa0 mempool was created even if PMD threads
are on numa1 and no port uses this mempool.
- before patch, once vm in numa1 is started,
mempool @0x17f703180
mempool @0x11ffe01e40

- before patch, once vm in numa0 is started too,
mempool @0x17f703180
mempool @0x11ffe01e40

- after patch, no vm started

- after patch, once vm in numa1 is started,
mempool @0x11ffe01e40
^^ Only one mempool in numa1

- after patch, once vm in numa0 is started too,
mempool @0x11ffe01e40
mempool @0x17f51dcc0

Reviewed-by: David Marchand

--
David Marchand
[ovs-dev] [PATCH] netdev-dpdk: Delay vhost mempool creation on multi-NUMA.
A mempool is currently created for a vhost port when it is added.

The NUMA info of the vhost port is not known until a device is added to
the port, so on multi-NUMA systems the initial NUMA node for the mempool
is a best guess based on vswitchd affinity.

When a device is added to the vhost port, the NUMA info can be checked
and if the guess was incorrect a mempool on the correct NUMA node
created.

The current scheme can have the effect of creating a mempool on a NUMA
node that will not be needed and at least for a certain time period
requires memory.

It is also difficult for a user trying to provision memory on different
NUMA nodes, if they are not sure which NUMA node the initial mempool
for a vhost port will be on.

This patch delays the creation of the mempool for a vhost port on
multi-NUMA systems until the vhost NUMA info is known.

Signed-off-by: Kevin Traynor
---
 lib/netdev-dpdk.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f9535bfb4..95065060c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -3927,5 +3927,6 @@ new_device(int vid)
     if (dev->requested_n_txq < qp_num
         || dev->requested_n_rxq < qp_num
-        || dev->requested_socket_id != newnode) {
+        || dev->requested_socket_id != newnode
+        || dev->dpdk_mp == NULL) {
         dev->requested_socket_id = newnode;
         dev->requested_n_rxq = qp_num;
@@ -4975,7 +4976,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
 {
+    int vid, n_numas;
+
     dev->up.n_txq = dev->requested_n_txq;
     dev->up.n_rxq = dev->requested_n_rxq;
-    int err;

     /* Always keep RX queue 0 enabled for implementations that won't
@@ -4995,12 +4997,22 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
     netdev_dpdk_remap_txqs(dev);

-    err = netdev_dpdk_mempool_configure(dev);
-    if (!err) {
-        /* A new mempool was created or re-used. */
-        netdev_change_seq_changed(&dev->up);
-    } else if (err != EEXIST) {
-        return err;
+    vid = netdev_dpdk_get_vid(dev);
+    n_numas = ovs_numa_get_n_numas();
+
+    if (n_numas == 1
+        || (n_numas > 1 && vid >= 0)) {
+
+        int err;
+
+        err = netdev_dpdk_mempool_configure(dev);
+        if (!err) {
+            /* A new mempool was created or re-used. */
+            netdev_change_seq_changed(&dev->up);
+        } else if (err != EEXIST) {
+            return err;
+        }
     }
-    if (netdev_dpdk_get_vid(dev) >= 0) {
+
+    if (vid >= 0) {
         if (dev->vhost_reconfigured == false) {
             dev->vhost_reconfigured = true;
--
2.34.1