On 7/10/17, 12:41 AM, "Ilya Maximets" <i.maxim...@samsung.com> wrote:

    On 07.07.2017 21:09, Darrell Ball wrote:
    > 
    > 
    > On 7/6/17, 11:11 PM, "Ilya Maximets" <i.maxim...@samsung.com> wrote:
    > 
    >     On 07.07.2017 08:08, Darrell Ball wrote:
    >     > 
    >     > 
    >     > On 5/30/17, 7:12 AM, "Ilya Maximets" <i.maxim...@samsung.com> wrote:
    >     > 
    >     >     Reconfiguration of HW NICs may lead to packet drops.
    >     >     In the current model, all physical ports will be reconfigured each
    >     >     time the number of PMD threads changes. Since we're not stopping
    >     >     threads on pmd-cpu-mask changes, this patch will help to further
    >     >     decrease port downtime by setting the maximum possible number
    >     >     of wanted tx queues to avoid unnecessary reconfigurations.
    >     >     
    >     >     Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
    >     >     ---
    >     >      lib/dpif-netdev.c | 26 +++++++++++++++++++++-----
    >     >      1 file changed, 21 insertions(+), 5 deletions(-)
    >     >     
    >     >     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    >     >     index 596d133..79db770 100644
    >     >     --- a/lib/dpif-netdev.c
    >     >     +++ b/lib/dpif-netdev.c
    >     >     @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)
    >     >      {
    >     >          struct dp_netdev_pmd_thread *pmd;
    >     >          struct dp_netdev_port *port;
    >     >     -    int wanted_txqs;
    >     >     +    int needed_txqs, wanted_txqs;
    >     >      
    >     >          dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
    >     >      
    >     >     @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
    >     >           * on the system and the user configuration. */
    >     >          reconfigure_pmd_threads(dp);
    >     >      
    >     >     -    wanted_txqs = cmap_count(&dp->poll_threads);
    >     >     +    /* We need 1 Tx queue for each thread to avoid locking, but we will try
    >     >     +     * to allocate the maximum possible value to minimize the number of port
    >     >     +     * reconfigurations. */
    >     >     +    needed_txqs = cmap_count(&dp->poll_threads);
    >     >     +    /* (n_cores + 1) is the maximum that we might need to have.
    >     >     +     * Additional queue is for non-PMD threads. */
    >     >     +    wanted_txqs = ovs_numa_get_n_cores();
    >     >     +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
    >     >     +    wanted_txqs++;
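
For concreteness, here is a small standalone sketch of the arithmetic above,
with purely illustrative numbers (4 PMD threads, 8 cores); none of these
values come from the patch itself:

    #include <stdio.h>

    int
    main(void)
    {
        int n_pmd_threads = 4;           /* cmap_count(&dp->poll_threads) */
        int n_cores = 8;                 /* ovs_numa_get_n_cores()        */

        int needed_txqs = n_pmd_threads; /* one lockless txq per thread   */
        int wanted_txqs = n_cores + 1;   /* plus one for non-PMD threads  */

        /* A NIC that accepts >= wanted_txqs (9) queues is configured once
         * and later pmd-cpu-mask changes need no port reconfiguration; a
         * NIC providing fewer than needed_txqs (4) queues falls back to
         * dynamic (locked) txqs, exactly as on current master. */
        printf("needed_txqs=%d wanted_txqs=%d\n", needed_txqs, wanted_txqs);
        return 0;
    }
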
    >     > 
    >     > I don’t think PMD mask changes are common, so this patch is trying to optimize a
    >     > rare disruptive event that can/will be scheduled by the administrator.
    >     > 
    >     > Based on the actual number of queues supported and the number of cores present,
    >     > this optimization may or may not work. It is unpredictable whether there will be benefit
    >     > in a particular case from the user POV.
    >     > If I were the administrator, I would try to err on the conservative side anyway if I could
    >     > not predict the result.
    >     > 
    >     > Did I miss something ?
    >     
    >     In an NFV environment, if you want to add one more VM to your hosts you will have to
    >     choose between:
    >     
    >           * not creating a new PMD thread
    >                   -> performance degradation of networking for other working VMs
    >     
    >           * adding a new PMD thread
    >                   -> disruption of the networking for the whole host for the time
    >                      of OVS reconfiguration.
    >     
    >     This patch removes the cost of the second option.
    > 
    > In general, adding a PMD thread per VM may not always (or even usually) make sense.
    
    Not per VM. Let's assume that all existing threads are already overloaded.

That would be less common; it is not the usual case.
    
    > There are use cases for sure, but using a PMD core per VM is often viewed as dpdk using too many
    > cpu resources and limiting the adoption of dpdk.
    > 
    > Furthermore, for dpdk gateways it is irrelevant.
    
    Disagree with that statement.

dpdk gateways should not usually require changing pmd-cpu-mask when a VM is
added to a hypervisor somewhere. Sometimes you need to deploy more resources
for sure, but this should be much less frequent than adding or removing VMs.
The point is that the user should rarely be changing the pmd-cpu-mask, and
this can be done during a maintenance window when needed.


    >     
    >     I don't understand what you mean by saying 'unpredictable benefit'. The benefit is clear
    >     and this optimization will work, except for HW NICs with a very low number of HW queues.
    > 
    > HW nic interfaces carry aggregated traffic for multiple VMs etc., so these cases are most important.
    
    And that is exactly why this patch was implemented. This patch tries to
    minimize packet drops on the HW NICs "shared" between VMs. It allows
    adding new threads without breaking networking for others, by not
    reconfiguring a device that handles traffic for other VMs at that moment.

I understand what this patch wants to do and I understand what the last
designs wanted to do as well.

    
    > It is the ratio of the number of cores to hw queues that matters. If there are fewer queues than
    > cores, then the benefit may not occur.
    
    That's right. But are you saying that we should not apply an optimization if
    it covers only 50% of cases? (The percentage is actually higher, because
    modern NICs support quite a lot of tx queues.)

Nobody knows what the number is, but I know that the number of cpus is
increasing, and making assumptions about the relative numbers of cores vs
queues can only work by luck. It is inherently unpredictable, and
unpredictability for a user-initiated/controlled change is ugly.
This is even worse when the user-initiated change is rare, because the user
does not remember the full context of the command.

    
    > This patch makes assumptions about the relative number of cores vs the number of queues.
    > These two parameters should be treated as independent – this part really bothers me.
    
    It doesn't make any assumptions, it just optimizes one of 2 possible situations.
    If the number of queues is lower, the behaviour will be exactly the same as in
    current master.

We are repeating the same thing.

    
    > I think we could solve this in a coordinated, generic fashion with a user management command
    > that would also give feedback to the user as to the possible success and maybe provide options.
    > I wonder if we will eventually go in that direction in the future anyway?
    
    I don't think so. We don't need to have a user management command because the
    only sane values for the number of queues right now are the number of threads
    or the maximum number of available queues in hardware. We're using the
    first one only because HW is not able to say how many queues it is able to
    successfully configure.

The fact that hardware cannot always report the correct number of queues is an
issue that will get solved over time and should be part of what a user command
reports (e.g. “unknown queue number, therefore it is unknown whether a
pmd-cpu-mask change will be hitless; issue the following command ‘X’ when you
would like to make the change”).

A wrapper user command (or commands) would provide visibility and
predictability to the user, i.e. we can do better.
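
To make the idea of such a wrapper command concrete, here is a rough,
standalone sketch of the check it could perform. The per-port 'max_hw_txqs'
value and the whole helper are hypothetical: no such query exists today,
which is exactly the limitation being discussed.

    #include <stdbool.h>
    #include <stdio.h>

    struct port_info {
        const char *name;
        int max_hw_txqs;    /* hypothetical: what the device could report */
    };

    /* A pmd-cpu-mask change is hitless for a port only if the device can
     * provide one txq per possible PMD thread plus one for non-PMD
     * threads, i.e. n_cores + 1 queues. */
    static bool
    mask_change_is_hitless(const struct port_info *ports, int n_ports,
                           int n_cores)
    {
        for (int i = 0; i < n_ports; i++) {
            if (ports[i].max_hw_txqs < n_cores + 1) {
                printf("%s: only %d txqs, reconfiguration (and packet "
                       "drops) likely on pmd-cpu-mask change\n",
                       ports[i].name, ports[i].max_hw_txqs);
                return false;
            }
        }
        return true;
    }

    int
    main(void)
    {
        struct port_info ports[] = { { "dpdk0", 16 }, { "dpdk1", 4 } };
        printf("hitless: %s\n",
               mask_change_is_hitless(ports, 2, 8) ? "yes" : "no");
        return 0;
    }
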


    This patch allows us to use the second value in all cases where possible.
    
    > 
    >     But the situation will never be worse than without this patch anyway.
    >     
    >     About PMD mask changes:
    >     Even one change a week can be disruptive in a highly loaded NFV environment.
    >     Also, these changes can be performed automatically by monitoring tools during
    >     VM migrations or while rebalancing the load between PMD threads to free the CPU
    >     cores for other applications/VMs.
    >     
    >     > 
    >     >          /* The number of pmd threads might have changed, or a port can be new:
    >     >           * adjust the txqs. */
    >     >     @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
    >     >      
    >     >          /* Check for all the ports that need reconfiguration.  We cache this in
    >     >           * 'port->reconfigure', because netdev_is_reconf_required() can change at
    >     >     -     * any time. */
    >     >     +     * any time.
    >     >     +     * Also mark for reconfiguration all ports which will likely change their
    >     >     +     * 'dynamic_txqs' parameter. It's required to stop using them before
    >     >     +     * changing this setting and it's simpler to mark ports here and allow
    >     >     +     * 'pmd_remove_stale_ports' to remove them from threads. There will be
    >     >     +     * no actual reconfiguration in 'port_reconfigure' because it's
    >     >     +     * unnecessary.  */
    >     >          HMAP_FOR_EACH (port, node, &dp->ports) {
    >     >     -        if (netdev_is_reconf_required(port->netdev)) {
    >     >     +        if (netdev_is_reconf_required(port->netdev)
    >     >     +            || (port->dynamic_txqs
    >     >     +                !=  netdev_n_txq(port->netdev) < needed_txqs)) {
    >     >                  port->need_reconfigure = true;
    >     >              }
    >     >          }
    >     >     @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
    >     >                  seq_change(dp->port_seq);
    >     >                  port_destroy(port);
    >     >              } else {
    >     >     -            port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
    >     >     +            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
    >     >              }
    >     >          }
    >     >      
    >     >     -- 
    >     >     2.7.4
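
As a side note on the 'dynamic_txqs' condition in the hunks above: '<' binds
tighter than '!=', so the patch's expression reads as
dynamic_txqs != (n_txq < needed_txqs). A minimal standalone sketch of that
flip check, with illustrative values only:

    #include <stdbool.h>
    #include <stdio.h>

    int
    main(void)
    {
        bool dynamic_txqs = false;  /* port currently uses lockless txqs  */
        int n_txq = 4;              /* queues the NIC actually provides   */
        int needed_txqs = 6;        /* PMD threads after the mask change  */

        if (dynamic_txqs != (n_txq < needed_txqs)) {
            /* The locking mode would change, so the port is marked for
             * reconfiguration and removed from the threads first. */
            printf("mark port for reconfiguration\n");
        }
        return 0;
    }
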
    >     >     
    >     >     
    >     > 
    >     
    > 
    

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
