Hi Kevin,

Thanks a lot for addressing this very important limitation of OVS-DPDK 
multi-core scalability in cloud contexts such as OpenStack. This is highly 
appreciated!

We have not started testing this, so for the time being just some high-level 
comments:

I would really like to see a new command, e.g. ovs-appctl 
dpif-netdev/pmd-rxq-rebalance, to manually trigger a redistribution for 
testing without having to change the configuration.

Any redistribution of rx queues across PMDs under high load is critical, as 
the service interruption during the PMD reload can easily cause rx queue 
overruns and packet drops. Independently of this patch, which optimizes the 
load balance of PMDs after redistribution, we should try to make the actual 
reconfiguration hitless (i.e. not requiring a reload of PMDs).

In an OpenStack context we really need automatic re-balancing of rx queues 
over PMDs when the load balance becomes so skewed that OVS unnecessarily 
drops packets because some PMDs are overloaded while others are not fully 
loaded. Without such a function this patch does not really solve the 
scalability issue. Starting a new VM forces a re-balance, but that cannot 
take the load on the just-added ports into account, so the result will 
typically be sub-optimal. OVS would also have no means to adapt to load 
shifting over time in a stable configuration.

If re-balancing were hitless (see above), it could be triggered at any time. 
As long as it is not, it should probably only be triggered if a) there is 
overload on some PMD and b) a re-balance would improve the situation such 
that there is zero (or at least less) loss. Because of a), the additional 
short service interruption should not matter.

A final note: when experimenting with a similar in-house prototype for rx 
queue rebalancing, we saw strange effects with vhost-user tx queues locking 
up as a result of frequent reconfiguration. These might have been caused by 
weaknesses internal to the complex DPDK application we were testing in the 
guest, but I would suggest paying close attention to the thread safety of 
shared DPDK and virtio data structures in host and guest when testing and 
reviewing this.

BR, Jan


> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org 
> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Kevin Traynor
> Sent: Friday, 05 May, 2017 18:34
> To: d...@openvswitch.org
> Subject: [ovs-dev] [RFC PATCH 0/6] Change dpdk rxq scheduling to incorporate 
> rxq processing cycles.
> 
> Rxqs are scheduled to be handled across available pmds in round robin
> order with no weight or priority.
> 
> It can happen that some very busy queues are handled by one pmd that
> does not have enough cycles to prevent packets being dropped on them,
> while at the same time another pmd that handles queues with no traffic
> is essentially idling.
> 
> Rxq scheduling happens as a result of a number of events and when it does,
> the same unweighted round robin approach is applied each time.
> 
> This patchset proposes to augment the round robin nature of rxq scheduling
> by counting the processing cycles used by the rxqs during their operation
> and incorporate it into the rxq scheduling.
> 
> Before distributing in a round robin manner, the rxqs will be sorted in
> order of the processing cycles they have been consuming. Assuming multiple
> pmds, this ensures that the measured rxqs using most processing cycles will
> be distributed to different cores.
> 
> To try out:
> This patchset requires the updated pmd counting patch applied as a
> prerequisite. https://patchwork.ozlabs.org/patch/729970/
> 
> Alternatively the series with dependencies can be cloned from here:
> https://github.com/kevintraynor/ovs-rxq.git
> 
> Simple way to test is add some dpdk ports, add multiple pmds, vary traffic
> rates and rxqs on ports and trigger reschedules e.g. by changing rxqs or
> the pmd-cpu-mask.
> 
> Check rxq distribution with ovs-appctl dpif-netdev/pmd-rxq-show and see
> if it matches expected.
> 
> todo:
> -possibly add a dedicated reschedule trigger command
> -use consistent type names
> -update docs
> -more testing, especially for dual numa
> 
> thanks,
> Kevin.
> 
> Kevin Traynor (6):
>   dpif-netdev: Add rxq processing cycle counters.
>   dpif-netdev: Update rxq processing cycles from
>     cycles_count_intermediate.
>   dpif-netdev: Change polled_queue to use dp_netdev_rxq.
>   dpif-netdev: Make dpcls optimization interval more generic.
>   dpif-netdev: Count the rxq processing cycles for an rxq.
>   dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
> 
>  lib/dpif-netdev.c | 163 
> ++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 133 insertions(+), 30 deletions(-)
> 
> --
> 1.8.3.1
> 
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev