> -----Original Message-----
> From: Darrell Ball [mailto:db...@vmware.com]
> Sent: Monday, August 14, 2017 7:27 AM
> To: Fischetti, Antonio <antonio.fische...@intel.com>; d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
>
>
>
> -----Original Message-----
> From: <ovs-dev-boun...@openvswitch.org> on behalf of
> "antonio.fische...@intel.com" <antonio.fische...@intel.com>
> Date: Friday, August 11, 2017 at 8:52 AM
> To: "d...@openvswitch.org" <d...@openvswitch.org>
> Subject: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
>
> When OVS is configured as a firewall, with thousands of active
> concurrent connections, the EMC gets quickly saturated and may
> come under heavy thrashing for the reason that original and
> recirculated packets keep overwriting the existing active EMC
> entries due to its limited size (8k).
>
>
> The recirculated packet could have been modified, in which case, maybe we
> still want to do the emc lookup/insert ?

[Antonio] IMHO we should still skip the EMC anyway, because the purpose is
to mitigate thrashing when the EMC is full. So any recirculated packet
should be classified at the dpcls/ofproto layers. Am I missing something
in your question?

We can expect that a recirculated packet that has been modified, like any
other recirculated packet, could result in a miss when the EMC is full.
We would then do an EMC insertion that is likely to overwrite some active
entry. In turn, this new entry could itself be overwritten, due to the
shortage of locations, even before it is hit again.

This proposal mitigates the thrashing by reserving EMC usage for original
packets only, so that a limited resource like the EMC can hopefully be
used more efficiently, especially when there is more than 1 recirculation.

I guess that adding an exception for modified recirculated packets could
also reduce the throughput a bit, as we would have to add another if
statement inside emc_processing.

>
>
> This thrashing causes the EMC to be less efficient than the dpcls
> in terms of lookups and insertions.
>
> This patch allows to use the EMC efficiently by allowing only
> the 'original' packets to hit EMC. All recirculated packets are
> sent to the classifier directly.
> An empirical threshold EMC_RECIRCT_NO_INSERT_THRESHOLD - of 50% -
> for EMC occupancy is set to trigger this logic. By doing so when
> EMC utilization exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD:
> - EMC Insertions are allowed just for original packets.
>   EMC insertion and look up are skipped for recirculated packets.
> - Recirculated packets are sent to the classifier.
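
For readers following the threshold description above, here is a minimal
sketch of how the mask test relates to the 50% figure. It assumes an EMC of
8192 entries (the "8k" in the commit message); EMC_ENTRIES, emc_bypass() and
emc_threshold_self_check() are hypothetical names used only for
illustration and are not part of the patch. Only the mask value
0xFFFFF000 is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the EMC holds 8192 entries (the "8k" of the commit message). */
#define EMC_ENTRIES 8192u

/* Mask taken from the patch: bits 12 and above of the entry count. */
#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000u

/* Hypothetical helper, not in the patch: true when the EMC should be
 * bypassed (no lookup, no insert) for this packet. */
static bool
emc_bypass(uint32_t n_entries, bool recirculated)
{
    return recirculated
           && (n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) != 0;
}

/* Checks that the mask test agrees with a plain comparison against half
 * of the cache size, for every possible occupancy. */
static void
emc_threshold_self_check(void)
{
    for (uint32_t n = 0; n <= EMC_ENTRIES; n++) {
        bool by_mask = (n & EMC_RECIRCT_NO_INSERT_THRESHOLD) != 0;
        bool by_compare = n >= EMC_ENTRIES / 2;

        assert(by_mask == by_compare);
    }
}
```

Under these assumptions the mask is non-zero exactly when the entry count
is 4096 or more, so the bypass starts once occupancy reaches 50% rather
than strictly above it. Original packets are never bypassed, whatever the
occupancy.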
>
> This patch is based on patch
> "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
>
> CC: Jan Scheurich <jan.scheur...@ericsson.com>
> Signed-off-by: Antonio Fischetti <antonio.fische...@intel.com>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
> Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
> ---
> Connection Tracker testbench set up with
>
> table=0, priority=1 actions=drop
> table=0, priority=10,arp actions=NORMAL
> table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
> table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
> table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
> table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
>
> 2 PMDs, 3 Tx queues.
>
> I measured packet Rx rate (regardless of packet loss). Bidirectional
> test with 64B UDP packets.
> Each row is a test with a different number of traffic streams. The traffic
> generator is set so that each stream establishes one UDP connection.
> The Mpps columns report the Rx rates on the 2 sides.
>
> I set up the generator to loop on the dest IP addr on one side,
> and loop instead on the source IP addr on the other side.
>
> For example to generate 10 different flows, I was sending to phy port #1
> UDP, IPsrc: 10.10.10.10, IPdest: 20.20.20.[20-29], PortSrc: 63, PortDest: 63
>
> Instead to phy port #2 (source and dest IPs are now swapped):
> UDP, IPsrc: 20.20.20.[20-29], IPdest: 10.10.10.10, PortSrc: 63, PortDest: 63
>
> I saw the following performance improvement.
>
> Original OvS-DPDK means at Commit ID:
> 6b1babacc3ca0488e07596bf822fe356c9bab646
>
>          +----------------------+-----------------------+
>          |  Original OvS-DPDK   |   Original OvS-DPDK   |
>          |                      |     + this patch      |
> ---------+------------+---------+------------+----------+
>  Traffic |     Rx     |   EMC   |     Rx     |   EMC    |
>  Streams |   [Mpps]   | entries |   [Mpps]   | entries  |
> ---------+------------+---------+------------+----------+
>      100 | 2.43, 2.49 |     200 | 2.55, 2.57 |      201 |
>    1,000 | 2.01, 2.02 |    2007 | 2.12, 2.12 |     2006 |
>    2,000 | 1.93, 1.95 |    3868 | 1.98, 1.96 |     3884 |
>    3,000 | 1.87, 1.91 |    5086 | 1.97, 1.97 |     4757 |
>    4,000 | 1.83, 1.82 |    6173 | 1.94, 1.93 |     5280 |
>   10,000 | 1.67, 1.69 |    7826 | 1.82, 1.81 |     7090 |
>   30,000 | 1.57, 1.59 |    8192 | 1.66, 1.67 |     8192 |
> ---------+------------+---------+------------+----------+
>
> This test setup implies 1 recirculation on each received packet.
> We didn't check this patch in a test scenario where more than 1
> recirculation is occurring per packet.
> ---
>  lib/dpif-netdev.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bea1c3f..8f6b96b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4663,6 +4663,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
>      packet_batch_per_flow_update(batch, pkt, mf);
>  }
>
> +/* Threshold to skip EMC for recirculated packets. */
> +#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
> +
>  /* Try to process all ('cnt') the 'packets' using only the exact match cache
>   * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>   * miniflow is copied into 'keys' and the packet pointer is moved at the
> @@ -4714,8 +4717,36 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>          key->len = 0; /* Not computed yet. */
>          key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
>
> -        /* If EMC is disabled skip emc_lookup */
> -        flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
> +        /*
> +         * EMC lookup is skipped when one or both of the following
> +         * two cases occurs:
> +         *
> +         * - EMC is disabled. This is detected from cur_min.
> +         *
> +         * - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
> +         *   the packet to be classified is being recirculated. When this
> +         *   happens also EMC insertions are skipped for recirculated
> +         *   packets. So that EMC is used just to store entries which
> +         *   are hit from the 'original' packets. This way the EMC
> +         *   thrashing is mitigated with a benefit on performance.
> +         */
> +        if (OVS_LIKELY(cur_min)) {
> +            if (!md_is_valid) {
> +                flow = emc_lookup(flow_cache, key);
> +            } else {
> +                /* Recirculated packet. */
> +                if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
> +                    /* EMC occupancy is over the threshold. We skip EMC
> +                     * lookup for recirculated packets. */
> +                    flow = NULL;
> +                } else {
> +                    flow = emc_lookup(flow_cache, key);
> +                }
> +            }
> +        } else {
> +            flow = NULL;
> +        }
> +
>          if (OVS_LIKELY(flow)) {
>              dp_netdev_queue_batches(packet, flow, &key->mf, batches,
>                                      n_batches);
> @@ -4800,7 +4831,20 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                                               add_actions->size);
>          }
>          ovs_mutex_unlock(&pmd->flow_mutex);
> -        emc_probabilistic_insert(pmd, key, netdev_flow);
> +        /* EMC insertion can be skipped by a probabilistic criteria or
> +         * - in case of recirculated packets - depending on the number of
> +         * EMC entries. */
> +        if (!packet->md.recirc_id) {
> +            emc_probabilistic_insert(pmd, key, netdev_flow);
> +        } else {
> +            /* Recirculated packets. When EMC occupancy goes over
> +             * a threshold we avoid inserting new entries. */
> +            if (!(pmd->flow_cache.n_entries &
> +                  EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
> +                /* Still under the threshold. */
> +                emc_probabilistic_insert(pmd, key, netdev_flow);
> +            }
> +        }
>      }
>  }
>
> @@ -4893,7 +4937,20 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>
>          flow = dp_netdev_flow_cast(rules[i]);
>
> -        emc_probabilistic_insert(pmd, &keys[i], flow);
> +        /* EMC insertion can be skipped by a probabilistic criteria or
> +         * - in case of recirculated packets - depending on the number of
> +         * EMC entries. */
> +        if (!packet->md.recirc_id) {
> +            emc_probabilistic_insert(pmd, &keys[i], flow);
> +        } else {
> +            /* Recirculated packets. When EMC occupancy goes over
> +             * a threshold we avoid inserting new entries. */
> +            if (!(pmd->flow_cache.n_entries &
> +                  EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
> +                /* Still under the threshold. */
> +                emc_probabilistic_insert(pmd, &keys[i], flow);
> +            }
> +        }
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
>      }
>
> --
> 2.4.11
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev