On 30.05.2017 16:53, Kevin Traynor wrote:
> On 05/29/2017 04:37 PM, Ilya Maximets wrote:
>> On 29.05.2017 18:00, Kevin Traynor wrote:
>>> On 05/29/2017 02:31 PM, Ilya Maximets wrote:
>>>> On 29.05.2017 16:26, Kevin Traynor wrote:
>>>>> On 05/29/2017 01:22 PM, Ilya Maximets wrote:
>>>>>> On 26.05.2017 20:14, Kevin Traynor wrote:
>>>>>>> On 05/26/2017 03:55 PM, Ilya Maximets wrote:
>>>>>>>> On 10.03.2017 07:27, Daniele Di Proietto wrote:
>>>>>>>>> 2017-02-21 6:49 GMT-08:00 Ilya Maximets <i.maxim...@samsung.com>:
>>>>>>>>>> Reconfiguration of HW NICs may lead to packet drops.
>>>>>>>>>> In the current model, all physical ports are reconfigured each
>>>>>>>>>> time the number of PMD threads changes.  Since we are not
>>>>>>>>>> stopping threads on pmd-cpu-mask changes, this patch helps to
>>>>>>>>>> further decrease a port's downtime by setting the maximum
>>>>>>>>>> possible number of wanted tx queues to avoid unnecessary
>>>>>>>>>> reconfigurations.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>>>>>>>>>
>>>>>>>>> I haven't thought this through a lot, but the last big series we
>>>>>>>>> pushed on master went exactly in the opposite direction, i.e. it
>>>>>>>>> created one txq for each thread in the datapath.
>>>>>>>>>
>>>>>>>>> I thought this was a good idea because:
>>>>>>>>>
>>>>>>>>> * On some systems with hyperthreading we can have a lot of cpus
>>>>>>>>>   (we received reports of systems with 72 cores).  If you want
>>>>>>>>>   to use only a couple of cores, you're still forced to have a
>>>>>>>>>   lot of unused txqs, which prevents you from having lockless tx.
>>>>>>>>> * We thought that reconfiguring the number of pmds would not be
>>>>>>>>>   a frequent operation.
>>>>>>>>>
>>>>>>>>> Why do you want to reconfigure the threads that often?  Is it to
>>>>>>>>> be able to support more throughput quickly?
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> In this case I think we shouldn't use the number of cpus, but
>>>>>>>>> something else coming from the user, so that the administrator
>>>>>>>>> can balance how quickly pmd threads can be reconfigured vs how
>>>>>>>>> many txqs should be on the system.
>>>>>>>>> I'm not sure how to make this user friendly though.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> Right now I'm thinking about a combined solution which respects
>>>>>>>> both issues (too big a number of TXQs and frequent device
>>>>>>>> reconfiguration).
>>>>>>>> I think we can implement an additional function to get the port's
>>>>>>>> limitations.  For now we can request the maximum number of TX
>>>>>>>> queues from the netdev and use it while the number of cores is
>>>>>>>> less than the number of queues.
>>>>>>>> Something like this:
>>>>>>>>
>>>>>>>> lib/netdev-dpdk.c:
>>>>>>>>
>>>>>>>> uint32_t netdev_dpdk_get_max_txqs(struct netdev *netdev)
>>>>>>>> {
>>>>>>>>     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>>>>>     struct rte_eth_dev_info dev_info;
>>>>>>>>
>>>>>>>>     ovs_mutex_lock(&dev->mutex);
>>>>>>>>     rte_eth_dev_info_get(dev->port_id, &dev_info);
>>>>>>>>     ovs_mutex_unlock(&dev->mutex);
>>>>>>>>
>>>>>>>>     return dev_info.max_tx_queues;
>>>>>>>> }
>>>>>>>>
>>>>>>>> lib/dpif-netdev.c:reconfigure_datapath():
>>>>>>>>
>>>>>>>> <----->
>>>>>>>>     max_tx_queues = netdev_get_max_txqs(port->netdev);
>>>>>>>>     number_of_threads = cmap_count(&dp->poll_threads);
>>>>>>>>     wanted_txqs = MAX(max_tx_queues, number_of_threads);
>>>>>>>> <----->
>>>>>>>>
>>>>>>>> In this implementation there will be no additional locking if the
>>>>>>>> number of threads is less than or equal to the maximum possible
>>>>>>>> number of tx queues in HW.
>>>>>>>>
>>>>>>>> What do you think?  Ian?  Darrell?
>>>>>>>>
>>>>>>>
>>>>>>> I'm not sure about using rte_eth_dev_info_get() as IIRC there were
>>>>>>> problems previously with it reporting a number, but then when you
>>>>>>> went to configure them they were not all available depending on
>>>>>>> mode.  That was why the trial-and-error queue configuration was
>>>>>>> put in.
>>>>>>>
>>>>>>> What about replacing 'max_tx_queues' above with a number like 16
>>>>>>> that is likely to be supported by the ports and unlikely to be
>>>>>>> exceeded by number_of_threads?
>>>>>>>
>>>>>>> Kevin.
>>>>>>
>>>>>> Hi Kevin.  Thank you for reminding me of this issue.
>>>>>>
>>>>>> But I think that magic numbers are not a good solution.
>>>>>>
>>>>>> One issue in my first implementation is that the desired number of
>>>>>> queues is not actually the same as the required number.
>>>>>>
>>>>>> How about something like this:
>>>>>> <----------------------------------------------------------------->
>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>> index 97f72d3..1a18e4f 100644
>>>>>> --- a/lib/dpif-netdev.c
>>>>>> +++ b/lib/dpif-netdev.c
>>>>>> @@ -3448,7 +3448,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>  {
>>>>>>      struct dp_netdev_pmd_thread *pmd;
>>>>>>      struct dp_netdev_port *port;
>>>>>> -    int wanted_txqs;
>>>>>> +    int needed_txqs, wanted_txqs;
>>>>>>
>>>>>>      dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
>>>>>>
>>>>>> @@ -3456,7 +3456,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>       * on the system and the user configuration. */
>>>>>>      reconfigure_pmd_threads(dp);
>>>>>>
>>>>>> -    wanted_txqs = cmap_count(&dp->poll_threads);
>>>>>> +    /* We need 1 Tx queue for each thread to avoid locking, but we will try
>>>>>> +     * to allocate the maximum possible value to minimize the number of port
>>>>>> +     * reconfigurations. */
>>>>>> +    needed_txqs = cmap_count(&dp->poll_threads);
>>>>>> +    /* (n_cores + 1) is the maximum that we might need to have.
>>>>>> +     * Additional queue is for non-PMD threads. */
>>>>>> +    wanted_txqs = ovs_numa_get_n_cores();
>>>>>> +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>>>>>> +    wanted_txqs++;
>>>>>>
>>>>>>      /* The number of pmd threads might have changed, or a port can be new:
>>>>>>       * adjust the txqs. */
>>>>>> @@ -3469,9 +3477,13 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>
>>>>>>      /* Check for all the ports that need reconfiguration. We cache this in
>>>>>>       * 'port->reconfigure', because netdev_is_reconf_required() can change at
>>>>>> -     * any time. */
>>>>>> +     * any time. Also mark for reconfiguration all ports which will likely
>>>>>> +     * change their 'dynamic_txqs' parameter. It's required to stop using
>>>>>> +     * them before changing this setting. */
>>>>>>      HMAP_FOR_EACH (port, node, &dp->ports) {
>>>>>> -        if (netdev_is_reconf_required(port->netdev)) {
>>>>>> +        if (netdev_is_reconf_required(port->netdev)
>>>>>> +            || (port->dynamic_txqs
>>>>>> +                != netdev_n_txq(port->netdev) < needed_txqs)) {
>>>>>
>>>>> I'm not quite sure why there needs to be a reconfigure here.  It will
>>>>> again try with the same wanted_txqs.  Is it not enough to just update
>>>>> the dynamic_txqs flag as is done later?
>>>>
>>>> It's not enough because reconfiguration will not be requested if we're
>>>> trying to set up the same number of queues a second time.
>>>>
>>>
>>> My question is: why request a reconfigure at all based on needed_txqs?
>>>
>>> We already configured requesting wanted_txqs (= ovs_numa_get_n_cores())
>>> and now we would configure again requesting the same num wanted_txqs,
>>> regardless of the new needed_txqs value.
>>>
>>> Could we just update the dynamic_txqs flag (as you have below) to
>>> reflect any new difference between configured txqs and needed_txqs?
>>> Maybe I missed some part?
>>>
>>> Kevin.
>>
>> Ok.  I got it.  The reason is in fact that we need to remove these
>> ports from the PMD threads.  It's simpler to mark the ports here and
>> allow 'pmd_remove_stale_ports' to remove them from the threads.
>>
>> After that there will be no actual reconfiguration, because
>> 'port_reconfigure' will just check that nothing changed (the branch
>> /* Reconfiguration is unnecessary */).
>>
>> If we want to avoid calling 'port_reconfigure' we'll have to remove the
>> ports from the threads using an additional loop and handle them
>> separately in the reconfiguration loop.  I think my solution is just
>> simpler and not really slower, because there will be no actual
>> reconfiguration.
>>
>> What do you think?
>>
>
> Thanks, I understand now.  Seems ok in that case, maybe you can add an
> even more gratuitous comment?

Ok.  Please check v2 for the updated comment.
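To make the marking logic concrete, the check boils down to something like
the following self-contained sketch.  The toy 'struct port' and the
'needs_reconfigure' helper here are simplified stand-ins for the real
dp_netdev structures, not actual OVS code:

    #include <stdbool.h>
    #include <stdio.h>

    struct port {
        int n_txq;          /* Tx queues currently configured on the netdev. */
        bool dynamic_txqs;  /* True if Tx queue ids are shared (locked). */
    };

    /* A port must be marked for reconfiguration if the netdev itself
     * requests it, or if 'dynamic_txqs' is about to flip: marking it lets
     * pmd_remove_stale_ports() pull the port from the PMD threads before
     * the setting changes. */
    static bool
    needs_reconfigure(const struct port *p, bool netdev_reconf,
                      int needed_txqs)
    {
        return netdev_reconf
               || p->dynamic_txqs != (p->n_txq < needed_txqs);
    }

    int
    main(void)
    {
        struct port p = { .n_txq = 4, .dynamic_txqs = false };

        /* A 5th PMD thread appeared but the port has only 4 queues, so
         * 'dynamic_txqs' must flip to true: the port is marked even though
         * the netdev itself did not request reconfiguration. */
        printf("%d\n", needs_reconfigure(&p, false, 5));   /* prints 1 */
        return 0;
    }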
>
> I'm thinking that rxqs don't need to be removed from the pmd, but maybe
> it's not a good idea to be asynchronous and anyway that could be an
> optimization for a different time.
>
>>>
>>>> See lib/netdev-dpdk.c:netdev_dpdk_set_tx_multiq() for details:
>>>>
>>>>     if (dev->requested_n_txq == n_txq) {
>>>>         goto out;
>>>>     }
>>>>
>>>>>>              port->need_reconfigure = true;
>>>>>>          }
>>>>>>      }
>>>>>> @@ -3505,7 +3517,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>              seq_change(dp->port_seq);
>>>>>>              port_destroy(port);
>>>>>>          } else {
>>>>>> -            port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
>>>>>> +            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
>>>>>>          }
>>>>>>      }
>>>>>>
>>>>>> <----------------------------------------------------------------->
>>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Daniele
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>  lib/dpif-netdev.c | 6 +++++-
>>>>>>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>>>>> index 6e575ab..e2b4f39 100644
>>>>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>>>>> @@ -3324,7 +3324,11 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>>>>>       * on the system and the user configuration. */
>>>>>>>>>>      reconfigure_pmd_threads(dp);
>>>>>>>>>>
>>>>>>>>>> -    wanted_txqs = cmap_count(&dp->poll_threads);
>>>>>>>>>> +    /* We need 1 Tx queue for each possible cpu core. */
>>>>>>>>>> +    wanted_txqs = ovs_numa_get_n_cores();
>>>>>>>>>> +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>>>>>>>>>> +    /* And 1 Tx queue for non-PMD threads. */
>>>>>>>>>> +    wanted_txqs++;
>>>>>>>>>>
>>>>>>>>>>      /* The number of pmd threads might have changed, or a port can be new:
>>>>>>>>>>       * adjust the txqs. */
>>>>>>>>>> --
>>>>>>>>>> 2.7.4
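As an aside on the lockless-Tx point raised earlier in the thread: with at
least one Tx queue per thread, each PMD thread owns its own queue id and
transmits without locking; once queues are scarcer than threads, queue ids
are shared and every send has to take a per-queue lock.  A simplified
standalone sketch (illustrative only; 'send_batch' and the modulo mapping
below are not the actual OVS XPS logic):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct txq {
        pthread_mutex_t lock;   /* Only needed when the queue is shared. */
    };

    static void
    send_batch(struct txq *txqs, int n_txq, int thread_id, bool dynamic_txqs)
    {
        /* With enough queues, thread_id maps 1:1 to a private queue;
         * otherwise several threads fold onto the same queue id. */
        int qid = dynamic_txqs ? thread_id % n_txq : thread_id;

        if (dynamic_txqs) {
            pthread_mutex_lock(&txqs[qid].lock);
        }
        printf("thread %d -> txq %d (%s)\n", thread_id, qid,
               dynamic_txqs ? "locked" : "lockless");
        if (dynamic_txqs) {
            pthread_mutex_unlock(&txqs[qid].lock);
        }
    }

    int
    main(void)
    {
        struct txq txqs[4] = {
            { PTHREAD_MUTEX_INITIALIZER }, { PTHREAD_MUTEX_INITIALIZER },
            { PTHREAD_MUTEX_INITIALIZER }, { PTHREAD_MUTEX_INITIALIZER },
        };
        bool dynamic_txqs = true;   /* 5 threads but only 4 queues. */

        for (int tid = 0; tid < 5; tid++) {
            send_batch(txqs, 4, tid, dynamic_txqs);
        }
        return 0;
    }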