On 05/29/2017 04:37 PM, Ilya Maximets wrote:
> On 29.05.2017 18:00, Kevin Traynor wrote:
>> On 05/29/2017 02:31 PM, Ilya Maximets wrote:
>>> On 29.05.2017 16:26, Kevin Traynor wrote:
>>>> On 05/29/2017 01:22 PM, Ilya Maximets wrote:
>>>>> On 26.05.2017 20:14, Kevin Traynor wrote:
>>>>>> On 05/26/2017 03:55 PM, Ilya Maximets wrote:
>>>>>>> On 10.03.2017 07:27, Daniele Di Proietto wrote:
>>>>>>>> 2017-02-21 6:49 GMT-08:00 Ilya Maximets <i.maxim...@samsung.com>:
>>>>>>>>> Reconfiguration of HW NICs may lead to packet drops.
>>>>>>>>> In current model all physical ports will be reconfigured each
>>>>>>>>> time number of PMD threads changed. Since we not stopping
>>>>>>>>> threads on pmd-cpu-mask changes, this patch will help to further
>>>>>>>>> decrease port's downtime by setting the maximum possible number
>>>>>>>>> of wanted tx queues to avoid unnecessary reconfigurations.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
>>>>>>>>
>>>>>>>> I haven't thought this through a lot, but the last big series we pushed
>>>>>>>> on master went exactly in the opposite direction, i.e. created one txq
>>>>>>>> for each thread in the datapath.
>>>>>>>>
>>>>>>>> I thought this was a good idea because:
>>>>>>>>
>>>>>>>> * On some systems with hyperthreading we can have a lot of cpus (we 
>>>>>>>> received
>>>>>>>>    reports of systems with 72 cores). If you want to use only a couple 
>>>>>>>> of cores
>>>>>>>>    you're still forced to have a lot of unused txqs, which prevent you
>>>>>>>> from having
>>>>>>>>    lockless tx.
>>>>>>>> * We thought that reconfiguring the number of pmds would not be a 
>>>>>>>> frequent
>>>>>>>>    operation.
>>>>>>>>
>>>>>>>> Why do you want to reconfigure the threads that often?  Is it to be
>>>>>>>> able to support more throughput quickly?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> In this case I think we shouldn't use the number of cpus,
>>>>>>>> but something else coming from the user, so that the administrator can
>>>>>>>> balance how
>>>>>>>> quickly pmd threads can be reconfigured vs how many txqs should be on
>>>>>>>> the system.
>>>>>>>> I'm not sure how to make this user friendly though.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>
>>>>>>> Right now I'm thinking about combined solution which will respect
>>>>>>> both issues (too big number of TXQs and frequent device 
>>>>>>> reconfiguration).
>>>>>>> I think, we can implement additional function to get port's limitations.
>>>>>>> For now we can request the maximum number of TX queues from netdev and
>>>>>>> use it while number of cores less then number of queues.
>>>>>>> Something like this:
>>>>>>>         
>>>>>>> lib/netdev-dpdk.c:
>>>>>>> uint32_t netdev_dpdk_get_max_txqs(struct netdev *netdev)
>>>>>>> {
>>>>>>>     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>>>>     struct rte_eth_dev_info dev_info;
>>>>>>>
>>>>>>>     ovs_mutex_lock(&dev->mutex);
>>>>>>>     rte_eth_dev_info_get(dev->port_id, &dev_info);
>>>>>>>     ovs_mutex_unlock(&dev->mutex);
>>>>>>>
>>>>>>>     return dev_info.max_tx_queues;
>>>>>>> }
>>>>>>>
>>>>>>> lib/dpif-netdev.c:reconfigure_datapath():
>>>>>>>
>>>>>>>     <----->
>>>>>>>     max_tx_queues = netdev_get_max_txqs(port->netdev);
>>>>>>>     number_of_threads = cmap_count(&dp->poll_threads);
>>>>>>>     wanted_txqs = MAX(max_tx_queues, number_of_threads);
>>>>>>>     <----->
>>>>>>>
>>>>>>> In this implementation there will be no additional locking if number of
>>>>>>> threads less or equal to maximum possible number of tx queues in HW.
>>>>>>>
>>>>>>> What do you think? Ian? Darrell?
>>>>>>>
>>>>>>
>>>>>> I'm not sure about using rte_eth_dev_info_get() as IIRC there was
>>>>>> problems previously with it reporting a number, but then when you went
>>>>>> to configure them they were not all available depending on mode. That
>>>>>> was why the trial and error queue configure was put in.
>>>>>>
>>>>>> What about replacing 'max_tx_queues' above with a number like 16 that is
>>>>>> likely to be supported by the ports and unlikely be exceeded by
>>>>>> number_of_threads?
>>>>>>
>>>>>> Kevin.
>>>>>
>>>>> Hi Kevin. Thank you for reminding me of this issue.
>>>>>
>>>>> But I think that magic numbers is not a good solution.
>>>>>
>>>>> One issue in my first implementation is that desired number of queues is
>>>>> not actually the same as required number.
>>>>>
>>>>> How about something like this:
>>>>> <----------------------------------------------------------------->
>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>> index 97f72d3..1a18e4f 100644
>>>>> --- a/lib/dpif-netdev.c
>>>>> +++ b/lib/dpif-netdev.c
>>>>> @@ -3448,7 +3448,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>  {
>>>>>      struct dp_netdev_pmd_thread *pmd;
>>>>>      struct dp_netdev_port *port;
>>>>> -    int wanted_txqs;
>>>>> +    int needed_txqs, wanted_txqs;
>>>>>  
>>>>>      dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
>>>>>  
>>>>> @@ -3456,7 +3456,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>       * on the system and the user configuration. */
>>>>>      reconfigure_pmd_threads(dp);
>>>>>  
>>>>> -    wanted_txqs = cmap_count(&dp->poll_threads);
>>>>> +    /* We need 1 Tx queue for each thread to avoid locking, but we will 
>>>>> try
>>>>> +     * to allocate the maximum possible value to minimize the number of 
>>>>> port
>>>>> +     * reconfigurations. */
>>>>> +    needed_txqs = cmap_count(&dp->poll_threads);
>>>>> +    /* (n_cores + 1) is the maximum that we might need to have.
>>>>> +     * Additional queue is for non-PMD threads. */
>>>>> +    wanted_txqs = ovs_numa_get_n_cores();
>>>>> +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>>>>> +    wanted_txqs++;
>>>>>  
>>>>>      /* The number of pmd threads might have changed, or a port can be 
>>>>> new:
>>>>>       * adjust the txqs. */
>>>>> @@ -3469,9 +3477,13 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>  
>>>>>      /* Check for all the ports that need reconfiguration.  We cache this 
>>>>> in
>>>>>       * 'port->reconfigure', because netdev_is_reconf_required() can 
>>>>> change at
>>>>> -     * any time. */
>>>>> +     * any time.  Also mark for reconfiguration all ports which will 
>>>>> likely
>>>>> +     * change their 'dynamic_txqs' parameter. It's required to stop using
>>>>> +     * them before changing this setting. */
>>>>>      HMAP_FOR_EACH (port, node, &dp->ports) {
>>>>> -        if (netdev_is_reconf_required(port->netdev)) {
>>>>> +        if (netdev_is_reconf_required(port->netdev)
>>>>> +            || (port->dynamic_txqs
>>>>> +                !=  netdev_n_txq(port->netdev) < needed_txqs)) {
>>>>
>>>> I'm not quite sure why there needs to be a reconfigure here. It will
>>>> again try with the same wanted_txqs. Is it not enough to just update the
>>>> dynamic_txqs flag as is done later ?
>>>
>>> It's not enough because reconfiguration will not be requested if we're
>>> trying to set up same number of queues second time.
>>>
>>
>> My question is - why request a reconfigure at all based on needed_txqs?
>>
>> We already configured requesting wanted_txqs (=ovs_numa_get_n_cores())
>> and now we would configure again requesting the same num wanted_txqs,
>> regardless of the new needed_txqs value.
>>
>> Could we just update the dynamic_txqs flag (as you have below) to
>> reflect any new difference between configured txqs and needed_txqs.
>> Maybe I missed some part?
>>
>> Kevin.
> 
> Ok. I got it. The reason is in fact that we need to remove these ports
> from PMD threads. It's simpler to mark ports here and allow
> 'pmd_remove_stale_ports' to remove them from threads.
> 
> After that where will be no actual reconfiguration because 'port_reconfigure'
> will just check that nothing changed (branch /* Reconfiguration is 
> unnecessary */).
> 
> If we want to avoid calling 'port_reconfigure' we'll have to remove
> ports from threads using additional loop and handle them separately
> in reconfiguration loop. I think, my solution just simpler and not
> really slower because there will be no actual reconfiguration.
> 
> What do you think?
> 

Thanks, I understand now. Seems ok in that case, maybe you can add an
even more gratuitous comment?

I'm thinking that rxqs don't need to be removed from the pmd, but maybe
it's not a good idea to be asynchronous and anyway that could be an
optimization for a different time.

>>
>>> See lib/netdev-dpdk.c:netdev_dpdk_set_tx_multiq() for details:
>>>
>>>      if (dev->requested_n_txq == n_txq) {                                   
>>>     
>>>          goto out;                                                          
>>>     
>>>      }
>>>
>>>
>>>  
>>>>>              port->need_reconfigure = true;
>>>>>          }
>>>>>      }
>>>>> @@ -3505,7 +3517,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>              seq_change(dp->port_seq);
>>>>>              port_destroy(port);
>>>>>          } else {
>>>>> -            port->dynamic_txqs = netdev_n_txq(port->netdev) < 
>>>>> wanted_txqs;
>>>>> +            port->dynamic_txqs = netdev_n_txq(port->netdev) < 
>>>>> needed_txqs;
>>>>>          }
>>>>>      }
>>>>>  
>>>>> <----------------------------------------------------------------->
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Daniele
>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>  lib/dpif-netdev.c | 6 +++++-
>>>>>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>>>> index 6e575ab..e2b4f39 100644
>>>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>>>> @@ -3324,7 +3324,11 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>>>>       * on the system and the user configuration. */
>>>>>>>>>      reconfigure_pmd_threads(dp);
>>>>>>>>>
>>>>>>>>> -    wanted_txqs = cmap_count(&dp->poll_threads);
>>>>>>>>> +    /* We need 1 Tx queue for each possible cpu core. */
>>>>>>>>> +    wanted_txqs = ovs_numa_get_n_cores();
>>>>>>>>> +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>>>>>>>>> +    /* And 1 Tx queue for non-PMD threads. */
>>>>>>>>> +    wanted_txqs++;
>>>>>>>>>
>>>>>>>>>      /* The number of pmd threads might have changed, or a port can 
>>>>>>>>> be new:
>>>>>>>>>       * adjust the txqs. */
>>>>>>>>> --
>>>>>>>>> 2.7.4
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> d...@openvswitch.org
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to