On Thu, Aug 8, 2019 at 10:57 AM Vishal Deep Ajmera <vishal.deep.ajm...@ericsson.com> wrote:
>
> Problem:
> --------
> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
> (using a hash on the TCP/UDP 5-tuple) get translated by the ofproto layer
> into "HASH" and "RECIRC" datapath actions. After recirculation, the packet
> is forwarded to the bond member port based on 8 bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
>
> 1. The recirculation of the packet implies another lookup of the packet’s
>    flow key in the exact match cache (EMC) and potentially the megaflow
>    classifier (DPCLS). This is the biggest cost factor.
>
> 2. The recirculated packets have a new “RSS” hash and compete with the
>    original packets for the scarce number of EMC slots. This implies more
>    EMC misses and potentially EMC thrashing, causing costly DPCLS lookups.
>
> 3. The 256 extra megaflow entries per bond for dp_hash bond selection put
>    additional load on the revalidation threads.
>
> Owing to this performance degradation, deployments stick to the
> “balance-slb” bond mode even though it does not do active-active load
> balancing for VXLAN- and GRE-tunnelled traffic, because all tunnel packets
> have the same source MAC address.
>
> Proposed optimization:
> ----------------------
> This proposal introduces a new load-balancing output action instead of
> recirculation.
>
> Maintain one table per bond (it could just be an array of uint16's) and
> program it from the ofproto layer the same way the internal flows are
> created today, with one entry per possible hash value (256 entries). Use
> this table to load-balance flows as part of output action processing.
>
> Currently xlate_normal() -> output_normal() ->
> bond_update_post_recirc_rules() -> bond_may_recirc() and
> compose_output_action__() generate the “dp_hash(hash_l4(0))” and
> “recirc(<RecircID>)” actions. In this case the RecircID identifies the
> bond.
> For the recirculated packets the ofproto layer installs megaflow entries
> that match on the RecircID and the masked dp_hash, and sends them to the
> corresponding output port.
>
> Instead, we will now generate the actions as
> "hash(l4(0)),lb_output(bond,<bond id>)"
>
> This combines hash computation (only if needed, else the RSS hash is
> re-used) with inline load balancing over the bond. This action is used
> *only* for balance-tcp bonds in the OVS-DPDK datapath (the OVS kernel
> datapath remains unchanged).
>
> Example:
> --------
> Current scheme:
> ---------------
> With 1 IP-UDP flow:
>
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>   packets:2828969, bytes:181054016, used:0.000s,
>   actions:hash(hash_l4(0)),recirc(0x1)
>
> recirc_id(0x1),dp_hash(0x113683bd/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>   packets:2828937, bytes:181051968, used:0.000s, actions:2
>
> With 8 IP-UDP flows (with random UDP src ports, all hitting the same
> DPCLS):
>
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>   packets:2674009, bytes:171136576, used:0.000s,
>   actions:hash(hash_l4(0)),recirc(0x1)
>
> recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>   packets:377395, bytes:24153280, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb236c260/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>   packets:333486, bytes:21343104, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0x7d89eb18/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>   packets:348461, bytes:22301504, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0xa78d75df/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>   packets:633353, bytes:40534592, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb58d846f/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>   packets:319901, bytes:20473664, used:0.001s, actions:2
> recirc_id(0x1),dp_hash(0x24534406/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>   packets:334985, bytes:21439040, used:0.001s, actions:1
> recirc_id(0x1),dp_hash(0x3cf32550/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>   packets:326404, bytes:20889856, used:0.001s, actions:1
>
> New scheme:
> -----------
> We can now do with a single flow entry (for any number of new flows):
>
> in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>   packets:2674009, bytes:171136576, used:0.000s,
>   actions:hash(l4(0)),lb_output(bond,1)
>
> A new CLI command has been added to dump the per-PMD bond cache:
>
> “sudo ovs-appctl dpif-netdev/pmd-bond-show”
>
> root@ubuntu-190:performance_scripts # sudo ovs-appctl dpif-netdev/pmd-bond-show
> pmd thread numa_id 0 core_id 4:
>   Bond cache:
>     bond-id 1 :
>       bucket 0 - slave 2
>       bucket 1 - slave 1
>       bucket 2 - slave 2
>       bucket 3 - slave 1
>
> Performance improvement:
> ------------------------
> With a prototype of the proposed idea, the following performance
> improvement is seen with Phy-VM-Phy UDP traffic and a single flow. With
> multiple flows the improvement is even greater (due to the reduced number
> of megaflow entries).
>
> 1 VM:
> *****
> +------------------------------------------+
> |                   mpps                   |
> +------------------------------------------+
> | Flows    master    with-opt.    %delta   |
> +------------------------------------------+
> |     1      4.53       5.89      29.96    |
> |    10      4.16       5.89      41.51    |
> |   400      3.55       5.55      56.22    |
> |    1k      3.44       5.45      58.30    |
> |   10k      2.50       4.63      85.34    |
> |  100k      2.29       4.27      86.15    |
> |  500k      2.25       4.27      89.23    |
> +------------------------------------------+
> mpps: million packets per second.
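[Editor's note: the per-bond table and inline output selection described in
the proposal can be sketched roughly as below. This is a minimal
illustration, not the actual patch code; the names `struct bond_cache`,
`member_port`, and `lb_output_port` are hypothetical.]

```c
#include <stdint.h>

#define BOND_BUCKETS 256  /* one bucket per 8-bit dp_hash value */

/* Hypothetical per-PMD bond cache: a flat bucket-to-member-port map,
 * programmed by the ofproto layer the same way the 256 dp_hash
 * megaflow entries are set up today. */
struct bond_cache {
    uint32_t bond_id;
    uint16_t member_port[BOND_BUCKETS];
};

/* Inline load balancing for the lb_output action: pick the output
 * port directly from the packet hash, avoiding recirculation and the
 * second EMC/DPCLS lookup. */
static inline uint16_t
lb_output_port(const struct bond_cache *bond, uint32_t pkt_hash)
{
    return bond->member_port[pkt_hash & (BOND_BUCKETS - 1)];
}
```

With such a table, the datapath flow only needs the hash action plus a
table index, which is why a single megaflow entry suffices regardless of
the number of 5-tuple flows.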
>
> Signed-off-by: Manohar Krishnappa Chidambaraswamy <man...@gmail.com>
> Co-authored-by: Manohar Krishnappa Chidambaraswamy <man...@gmail.com>
> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajm...@ericsson.com>
>
> CC: Jan Scheurich <jan.scheur...@ericsson.com>
> CC: Venkatesan Pradeep <venkatesan.prad...@ericsson.com>
> CC: Nitin Katiyar <nitin.kati...@ericsson.com>
Hi Vishal,

I quickly tested your patch on two servers with two ixgbe cards each,
linked via a Juniper switch with LACP. With testpmd using a single core,
the switching rate rose from ~2.0 Mpps to ~2.4+ Mpps, so I see at least a
+20% gain.

Please add an example command for enabling it to the commit message, e.g.

    ovs-vsctl set port bond0 other_config:opt-bond-tcp=true

Thanks,
--
Matteo Croce
per aspera ad upstream

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev