Hi Kevin,

Thanks a lot for addressing this important limitation of OVS-DPDK multi-core scalability in cloud contexts such as OpenStack. It is highly appreciated!

We have not started testing this yet, so for the time being just some high-level comments.

I would really like to see a new command, ovs-appctl dpif-netdev/pmd-rxq-rebalance or similar, to manually trigger a redistribution for testing, without having to reconfigure anything.
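Roughly what I have in mind, as a minimal and untested sketch: only unixctl_command_register() and unixctl_command_reply() are existing OVS APIs here; dp_netdevs, dp_netdev_mutex and dp_netdev_request_reconfigure() are my guesses at the internals of lib/dpif-netdev.c.

    /* Sketch only, not a tested patch: force a new rxq distribution on
     * demand.  dp_netdevs, dp_netdev_mutex and
     * dp_netdev_request_reconfigure() are guesses at existing internals
     * of lib/dpif-netdev.c. */
    static void
    dpif_netdev_pmd_rxq_rebalance(struct unixctl_conn *conn,
                                  int argc OVS_UNUSED,
                                  const char *argv[] OVS_UNUSED,
                                  void *aux OVS_UNUSED)
    {
        struct dp_netdev *dp = NULL;

        ovs_mutex_lock(&dp_netdev_mutex);
        if (shash_count(&dp_netdevs) == 1) {
            dp = shash_first(&dp_netdevs)->data;
            dp_netdev_request_reconfigure(dp); /* re-runs rxq scheduling */
        }
        ovs_mutex_unlock(&dp_netdev_mutex);

        unixctl_command_reply(conn, dp ? "rebalance requested"
                                       : "expected exactly one datapath");
    }

    /* Registered once at init, next to dpif-netdev/pmd-rxq-show: */
    unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "", 0, 0,
                             dpif_netdev_pmd_rxq_rebalance, NULL);

The interesting part is of course what the handler calls into; anything that marks the datapath for reconfiguration, so that rxq scheduling runs again on the latest measured cycles, would do.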
Any redistribution of rx queues across PMDs under high load is critical, because the service interruption during the PMD reload can easily cause rx queue overruns and packet drops. Independently of this patch, which optimizes the load balance across PMDs after a redistribution, we should try to make the actual reconfiguration hitless (i.e. not requiring a reload of the PMDs).

In an OpenStack context we really need automatic re-balancing of rx queues over PMDs when the load balance becomes so skewed that OVS unnecessarily drops packets because some PMDs are overloaded while others are not fully loaded. Without such a function this patch does not really solve the scalability issue. Starting a new VM forces a re-balance, but that re-balance cannot take the load on the just-added ports into account, so the result will typically be sub-optimal. OVS would also have no means of adapting to load that shifts over time in an otherwise stable configuration.

If re-balancing were hitless (see above), it could be triggered at any time. As long as it is not, it should probably only be triggered if a) there is overload on some PMD, and b) a re-balancing would improve the situation such that there is zero (or at least less) packet loss. Because of a), the additional short service interruption should not matter.
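To make b) more concrete, the guard I am imagining looks roughly like this. All names are hypothetical; the per-PMD load percentages would be derived from the rxq cycle counters this series introduces, and the estimate function would dry-run the new distribution on those counters.

    /* Hypothetical auto-rebalance guard: trigger only if a) some PMD is
     * overloaded, and b) a dry run of the new rxq distribution predicts
     * the overload would shrink or disappear. */
    #define OVERLOAD_THRESHOLD_PCT 95

    /* Hypothetical: simulate the sorted round-robin assignment on the
     * measured rxq cycles and return the predicted busiest PMD load. */
    unsigned estimate_max_load_after_rebalance(void);

    static bool
    should_rebalance(const unsigned *pmd_load_pct, size_t n_pmds)
    {
        unsigned max_load = 0;

        for (size_t i = 0; i < n_pmds; i++) {
            if (pmd_load_pct[i] > max_load) {
                max_load = pmd_load_pct[i];
            }
        }
        if (max_load < OVERLOAD_THRESHOLD_PCT) {
            return false;  /* a) no PMD is overloaded: leave things alone. */
        }
        /* b) accept the short interruption only if it is predicted to help. */
        return estimate_max_load_after_rebalance() < max_load;
    }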
A final note: when experimenting with a similar in-house prototype for rx queue re-balancing, we saw strange effects where vhostuser tx queues locked up as a result of frequent reconfiguration. These may have been caused by internal weaknesses of the complex DPDK application under test in the guest, but I would suggest we pay very close attention to the thread safety of the shared DPDK and virtio data structures in host and guest when testing and reviewing this.

BR, Jan

> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org
> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Kevin Traynor
> Sent: Friday, 05 May, 2017 18:34
> To: d...@openvswitch.org
> Subject: [ovs-dev] [RFC PATCH 0/6] Change dpdk rxq scheduling to incorporate
> rxq processing cycles.
>
> Rxqs are scheduled to be handled across available pmds in round robin
> order with no weight or priority.
>
> It can happen that some very busy queues are handled by one pmd which
> does not have enough cycles to prevent packets being dropped on them,
> while at the same time another pmd, which handles queues with no
> traffic on them, is essentially idling.
>
> Rxq scheduling happens as a result of a number of events, and when it
> does, the same unweighted round robin approach is applied each time.
>
> This patchset proposes to augment the round robin nature of rxq
> scheduling by counting the processing cycles used by the rxqs during
> their operation and incorporating them into the rxq scheduling.
>
> Before distributing in a round robin manner, the rxqs will be sorted in
> order of the processing cycles they have been consuming. Assuming
> multiple pmds, this ensures that the rxqs measured as using the most
> processing cycles will be distributed to different cores.
>
> To try out:
> This patchset requires the updated pmd counting patch applied as a
> prerequisite.
> https://patchwork.ozlabs.org/patch/729970/
>
> Alternatively the series with dependencies can be cloned from here:
> https://github.com/kevintraynor/ovs-rxq.git
>
> A simple way to test is to add some dpdk ports, add multiple pmds, vary
> traffic rates and rxqs on ports, and trigger reschedules, e.g. by
> changing rxqs or the pmd-cpu-mask.
>
> Check the rxq distribution with ovs-appctl dpif-netdev/pmd-rxq-show and
> see if it matches what is expected.
>
> todo:
> -possibly add a dedicated reschedule trigger command
> -use consistent type names
> -update docs
> -more testing, especially for dual numa
>
> thanks,
> Kevin.
>
> Kevin Traynor (6):
>   dpif-netdev: Add rxq processing cycle counters.
>   dpif-netdev: Update rxq processing cycles from
>     cycles_count_intermediate.
>   dpif-netdev: Change polled_queue to use dp_netdev_rxq.
>   dpif-netdev: Make dpcls optimization interval more generic.
>   dpif-netdev: Count the rxq processing cycles for an rxq.
>   dpif-netdev: Change rxq_scheduling to use rxq processing cycles.
>
>  lib/dpif-netdev.c | 163 ++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 133 insertions(+), 30 deletions(-)
>
> --
> 1.8.3.1
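PS: To check my own understanding of the "sort by measured cycles, then round robin" step described in the cover letter, here is a self-contained toy model (standalone C, not the actual OVS code):

    /* Toy model of the proposed rxq scheduling: sort rxqs by measured
     * processing cycles (descending), then assign round-robin so the
     * heaviest rxqs land on different PMDs. */
    #include <stdio.h>
    #include <stdlib.h>

    struct rxq { const char *name; unsigned long long cycles; };

    static int
    cmp_cycles_desc(const void *a_, const void *b_)
    {
        const struct rxq *a = a_, *b = b_;
        return (a->cycles < b->cycles) - (a->cycles > b->cycles);
    }

    int
    main(void)
    {
        struct rxq rxqs[] = {
            { "dpdk0-rxq0", 900 }, { "dpdk0-rxq1", 100 },
            { "dpdk1-rxq0", 800 }, { "vhost0-rxq0", 50 },
        };
        const int n_rxqs = 4, n_pmds = 2;

        qsort(rxqs, n_rxqs, sizeof rxqs[0], cmp_cycles_desc);
        for (int i = 0; i < n_rxqs; i++) {
            printf("pmd%d <- %s (%llu cycles)\n", i % n_pmds,
                   rxqs[i].name, rxqs[i].cycles);
        }
        return 0;
    }

With two PMDs this puts the two heaviest rxqs (900 and 800 cycles) on different cores, which is exactly the property the cover letter describes.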