-----Original Message-----
From: "Fischetti, Antonio" <[email protected]>
Date: Tuesday, August 15, 2017 at 6:55 AM
To: Darrell Ball <[email protected]>, "[email protected]" 
<[email protected]>
Subject: RE: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for 
recirc packets

    
    
    > -----Original Message-----
    > From: Darrell Ball [mailto:[email protected]]
    > Sent: Monday, August 14, 2017 7:27 AM
    > To: Fischetti, Antonio <[email protected]>; [email protected]
    > Subject: Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
    > recirc packets
    > 
    > 
    > 
    > -----Original Message-----
    > From: <[email protected]> on behalf of
    > "[email protected]" <[email protected]>
    > Date: Friday, August 11, 2017 at 8:52 AM
    > To: "[email protected]" <[email protected]>
    > Subject: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
    >   recirc packets
    > 
    >     When OVS is configured as a firewall, with thousands of active
    >     concurrent connections, the EMC gets quickly saturated and may
    >     come under heavy thrashing for the reason that original and
    >     recirculated packets keep overwriting the existing active EMC
    >     entries due to its limited size (8k).
    > 
    > 
    > The recirculated packet could have been modified, in which case, maybe we
    > still want to do the emc lookup/insert ?
    
    [Antonio] 
    IMPO I'd say we should still skip emc anyway, because the purpose is to 
    mitigate thrashing when emc is full. So any recirculated packet should
    be classified at the dpcls/ofproto layers.
    I don't know if I'm missing something from your question?
    
    We can expect that a recirc pkt that has been modified - similarly to all 
    other recirculated pkts - could result in a miss when emc is full. 
    Later we should do an emc insertion that is likely to overwrite some 
    active entry. And recursively, this new insertion itself could be 
    overwritten - due to the shortage of locations - even before it is hit 
    again. This proposal is to mitigate the thrashing with the criterion of
    reserving emc usage for original packets only.
    So a limited resource like emc hopefully could be used more efficiently, 
    especially when there is more than 1 recirculation.
    I guess that adding an exception for modified recirc pkts could also
    reduce the throughput a bit, as we would have to add another if statement
    inside emc_processing.
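
As a rough illustration of the criterion described above, the decision of
whether a packet may use the EMC can be written as a small predicate. This is
a minimal sketch, not the patch itself: emc_allowed() and
EMC_RECIRC_THRESHOLD_MASK are hypothetical names, and the 8192-entry cache
size is assumed from the "8k" figure in the commit message. For a 32-bit
entry count, a non-zero result from masking with 0xFFFFF000 is the same as
n_entries >= 4096, i.e. 50% of 8192.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 8192-entry EMC, so the 50% mark is 4096 entries.  Every bit
 * from bit 12 upwards is set in the mask, so (n & mask) != 0 exactly when
 * n >= 4096. */
#define EMC_RECIRC_THRESHOLD_MASK 0xFFFFF000u

/* Hypothetical helper (not an OVS function): returns true if the packet may
 * be looked up in, and inserted into, the EMC. */
static bool
emc_allowed(bool emc_enabled, bool recirculated, uint32_t n_entries)
{
    if (!emc_enabled) {
        return false;               /* EMC disabled: always skip. */
    }
    if (!recirculated) {
        return true;                /* Original packets always use the EMC. */
    }
    /* Recirculated packets use the EMC only below the occupancy threshold. */
    return !(n_entries & EMC_RECIRC_THRESHOLD_MASK);
}
```

Note that with this mask the cutoff takes effect at exactly 4096 entries, so
"exceeds 50%" in the commit message reads as "reaches 50%" under these
assumptions.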

[Darrell]
I can drop the edited-packet case, as my concern was really more general.
The concern is that recirculated packets should still be forwarded quickly if
possible, and using the emc should help with that. The first time through, emc
is used for the packet; the second time through, emc is not used, so it is
slower. But possibly the argument could be made that since the packet is
recirculated, it is already slower, in which case maybe a penalty for
recirculated packets is reasonable.
Instead of having a simple 50% black-and-white cutoff, maybe a penalty to the
insertion probability could be used?
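
One way such a penalty might look, purely as a sketch: keep the existing
scheme of comparing a 32-bit random number against an insertion threshold,
but scale the threshold down for recirculated packets. The names
emc_insert_min(), emc_should_insert() and the penalty_shift knob are
hypothetical and are not taken from the patch or from OVS; base_min stands
for whatever threshold the probabilistic insertion already uses, with 0
meaning the EMC is disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: threshold to compare a uniformly distributed 32-bit
 * random value against.  Recirculated packets get the base threshold divided
 * by 2^penalty_shift, i.e. a proportionally lower insertion probability. */
static uint32_t
emc_insert_min(uint32_t base_min, bool recirculated, unsigned penalty_shift)
{
    if (!recirculated) {
        return base_min;
    }
    /* Shifting a 32-bit value by 32 or more is undefined in C, so treat a
     * large penalty as "never insert". */
    return penalty_shift < 32 ? base_min >> penalty_shift : 0;
}

/* Hypothetical helper: insert only if the EMC is enabled (min != 0) and the
 * random draw falls at or below the threshold. */
static bool
emc_should_insert(uint32_t min, uint32_t rnd)
{
    return min && rnd <= min;
}
```

With penalty_shift = 2, for example, a recirculated packet would be inserted
about four times less often than an original packet, instead of being cut off
entirely once the cache is half full. The shift could also be made a function
of the current occupancy; that is left out here to keep the sketch short.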

    
    > 
    > 
    >     This thrashing causes the EMC to be less efficient than the dpcls
    >     in terms of lookups and insertions.
    > 
    >     This patch makes more efficient use of the EMC by allowing only
    >     the 'original' packets to hit EMC. All recirculated packets are
    >     sent to the classifier directly.
    >     An empirical threshold EMC_RECIRCT_NO_INSERT_THRESHOLD - of 50% -
    >     for EMC occupancy is set to trigger this logic. By doing so when
    >     EMC utilization exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD:
    >      - EMC Insertions are allowed just for original packets.
    >        EMC insertion and look up are skipped for recirculated packets.
    >      - Recirculated packets are sent to the classifier.
    > 
    >     This patch is based on patch
    >     "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
    >     https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
    > 
    >     CC: Jan Scheurich <[email protected]>
    >     Signed-off-by: Antonio Fischetti <[email protected]>
    >     Signed-off-by: Bhanuprakash Bodireddy <[email protected]>
    >     Co-authored-by: Bhanuprakash Bodireddy <[email protected]>
    >     ---
    >     Connection Tracker testbench set up with
    > 
    >      table=0, priority=1 actions=drop
    >      table=0, priority=10,arp actions=NORMAL
    >      table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
    >      table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
    >      table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
    >      table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
    >      table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
    > 
    >     2 PMDs, 3 Tx queues.
    > 
    >     I measured packet Rx rate (regardless of packet loss). Bidirectional
    >     test with 64B UDP packets.
    >     Each row is a test with a different number of traffic streams. The
    >     traffic generator is set so that each stream establishes one UDP
    >     connection. The Mpps columns report the Rx rates on the 2 sides.
    > 
    >     I set up the generator to loop on the dest IP addr on one side,
    >     and loop instead on the source IP addr on the other side.
    > 
    >     For example, to generate 10 different flows, I was sending to phy port #1:
    >     UDP, IPsrc: 10.10.10.10, IPdest: 20.20.20.[20-29], PortSrc: 63, PortDest: 63
    > 
    >     Instead, to phy port #2 (source and dest IPs are now swapped):
    >     UDP, IPsrc: 20.20.20.[20-29], IPdest: 10.10.10.10, PortSrc: 63, PortDest: 63
    > 
    >     I saw the following performance improvement.
    > 
    >     Original OvS-DPDK means at Commit ID:
    >       6b1babacc3ca0488e07596bf822fe356c9bab646
    > 
    >               +----------------------+-----------------------+
    >               |  Original OvS-DPDK   |   Original OvS-DPDK   |
    >               |                      |    + this patch       |
    >      ---------+------------+---------+------------+----------+
    >       Traffic |     Rx     |   EMC   |     Rx     |   EMC    |
    >       Streams |   [Mpps]   | entries |   [Mpps]   | entries  |
    >      ---------+------------+---------+------------+----------+
    >          100  | 2.43, 2.49 |   200   | 2.55, 2.57 |   201    |
    >        1,000  | 2.01, 2.02 |  2007   | 2.12, 2.12 |  2006    |
    >        2,000  | 1.93, 1.95 |  3868   | 1.98, 1.96 |  3884    |
    >        3,000  | 1.87, 1.91 |  5086   | 1.97, 1.97 |  4757    |
    >        4,000  | 1.83, 1.82 |  6173   | 1.94, 1.93 |  5280    |
    >       10,000  | 1.67, 1.69 |  7826   | 1.82, 1.81 |  7090    |
    >       30,000  | 1.57, 1.59 |  8192   | 1.66, 1.67 |  8192    |
    >      ---------+------------+---------+------------+----------+
    > 
    >     This test setup implies 1 recirculation on each received packet.
    >     We didn't check this patch in a test scenario where more than 1
    >     recirculation is occurring per packet.
    >     ---
    >      lib/dpif-netdev.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
    >      1 file changed, 61 insertions(+), 4 deletions(-)
    > 
    >     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    >     index bea1c3f..8f6b96b 100644
    >     --- a/lib/dpif-netdev.c
    >     +++ b/lib/dpif-netdev.c
    >     @@ -4663,6 +4663,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
    >          packet_batch_per_flow_update(batch, pkt, mf);
    >      }
    > 
    >     +/* Threshold to skip EMC for recirculated packets. */
    >     +#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
    >     +
    >      /* Try to process all ('cnt') the 'packets' using only the exact match cache
    >       * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
    >       * miniflow is copied into 'keys' and the packet pointer is moved at the
    >     @@ -4714,8 +4717,36 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
    >              key->len = 0; /* Not computed yet. */
    >              key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
    > 
    >     -        /* If EMC is disabled skip emc_lookup */
    >     -        flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
    >     +        /*
    >     +         * EMC lookup is skipped when one or both of the following
    >     +         * two cases occurs:
    >     +         *
    >     +         *    - EMC is disabled.  This is detected from cur_min.
    >     +         *
    >     +         *    - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
    >     +         *      the packet to be classified is being recirculated.  When this
    >     +         *      happens also EMC insertions are skipped for recirculated
    >     +         *      packets.  So that EMC is used just to store entries which
    >     +         *      are hit from the 'original' packets.  This way the EMC
    >     +         *      thrashing is mitigated with a benefit on performance.
    >     +         */
    >     +        if (OVS_LIKELY(cur_min)) {
    >     +            if (!md_is_valid) {
    >     +                flow = emc_lookup(flow_cache, key);
    >     +            } else {
    >     +                /* Recirculated packet. */
    >     +                if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
    >     +                    /* EMC occupancy is over the threshold.  We skip EMC
    >     +                     * lookup for recirculated packets. */
    >     +                    flow = NULL;
    >     +                } else {
    >     +                    flow = emc_lookup(flow_cache, key);
    >     +                }
    >     +            }
    >     +        } else {
    >     +            flow = NULL;
    >     +        }
    >     +
    >              if (OVS_LIKELY(flow)) {
    >                  dp_netdev_queue_batches(packet, flow, &key->mf, batches,
    >                                          n_batches);
    >     @@ -4800,7 +4831,20 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
    >                                                   add_actions->size);
    >              }
    >              ovs_mutex_unlock(&pmd->flow_mutex);
    >     -        emc_probabilistic_insert(pmd, key, netdev_flow);
    >     +        /* EMC insertion can be skipped by a probabilistic criteria or
    >     +         * - in case of recirculated packets - depending on the number of
    >     +         * EMC entries. */
    >     +        if (!packet->md.recirc_id) {
    >     +            emc_probabilistic_insert(pmd, key, netdev_flow);
    >     +        } else {
    >     +            /* Recirculated packets.  When EMC occupancy goes over
    >     +             * a threshold we avoid inserting new entries. */
    >     +            if (!(pmd->flow_cache.n_entries &
    >     +                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
    >     +                /* Still under the threshold. */
    >     +                emc_probabilistic_insert(pmd, key, netdev_flow);
    >     +            }
    >     +        }
    >          }
    >      }
    > 
    >     @@ -4893,7 +4937,20 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
    > 
    >              flow = dp_netdev_flow_cast(rules[i]);
    > 
    >     -        emc_probabilistic_insert(pmd, &keys[i], flow);
    >     +        /* EMC insertion can be skipped by a probabilistic criteria or
    >     +         * - in case of recirculated packets - depending on the number of
    >     +         * EMC entries. */
    >     +        if (!packet->md.recirc_id) {
    >     +            emc_probabilistic_insert(pmd, &keys[i], flow);
    >     +        } else {
    >     +            /* Recirculated packets.  When EMC occupancy goes over
    >     +             * a threshold we avoid inserting new entries. */
    >     +            if (!(pmd->flow_cache.n_entries &
    >     +                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
    >     +                /* Still under the threshold. */
    >     +                emc_probabilistic_insert(pmd, &keys[i], flow);
    >     +            }
    >     +        }
    >              dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
    >          }
    > 
    >     --
    >     2.4.11
    > 
    >     _______________________________________________
    >     dev mailing list
    >     [email protected]
    >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
    > 
    
    

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
