>-----Original Message-----
>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jan Scheurich
>Sent: Thursday, June 16, 2016 2:56 PM
>To: dev@openvswitch.org
>Subject: [ovs-dev] [RFC Patch] dpif-netdev: Sorted subtable vectors per
>in_port in dpcls
>
>The user-space datapath (dpif-netdev) consists of a first-level "exact match
>cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With
>many parallel packet flows (e.g. TCP connections) the EMC becomes
>inefficient and the OVS forwarding performance is determined by the
>megaflow classifier.
>
>The megaflow classifier (dpcls) consists of a variable number of hash tables
>(aka subtables), each containing megaflow entries with the same mask of
>packet header and metadata fields to match upon. A dpcls lookup matches a
>given packet against all subtables in sequence until it hits a match. As
>megaflow cache entries are by construction non-overlapping, the first match
>is the only match.
>
>Today the order of the subtables in the dpcls is essentially random, so on
>average a dpcls lookup has to visit N/2 subtables for a hit, where N is the
>total number of subtables. Even though every single hash-table lookup is
>fast, the performance of the current dpcls degrades when there are many
>subtables.
>
>How the patch addresses this issue:
>
>In reality there is often a strong correlation between the ingress port and
>a small subset of subtables that have hits. The entire megaflow cache
>typically decomposes nicely into partitions that are hit only by packets
>entering from a range of similar ports (e.g. traffic from Phy -> VM vs.
>traffic from VM -> Phy).
>
>Therefore, keeping a separate list of subtables per ingress port, sorted by
>frequency of hits, reduces the average number of subtable lookups in the
>dpcls to a minimum, even if the total number of subtables gets large.
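Just to check my understanding of the lookup path with this patch (picking up
the 32-vector split and the periodic re-sort described further down in the
mail): each dpcls keeps per-in_port vectors of subtables sorted by hit
frequency, and a lookup only walks the vector selected by hashing the ingress
port. Below is a rough sketch of how I picture it, in illustrative C only --
the struct layout, constants, names and hash helpers are mine for discussion,
not taken from the patch:

/* Illustrative sketch only, not the actual patch code. */

#include <stddef.h>
#include <stdint.h>

#define NUM_PORT_VECTORS 32   /* Subtable vectors per dpcls (in_port hash). */
#define NUM_COUNT_SLOTS  32   /* Hit-count slots per vector. */

struct flow;                  /* Parsed packet headers and metadata. */
struct dpcls_rule;            /* A megaflow entry. */

/* One hash table per distinct megaflow mask. */
struct subtable {
    struct dpcls_rule *(*lookup)(struct subtable *, const struct flow *);
    /* mask, hash map of megaflow entries, ... */
};

struct subtable_vector {
    struct subtable **subtables;        /* Sorted by descending hit count. */
    size_t n_subtables;
    uint64_t hits[NUM_COUNT_SLOTS];     /* Indexed by subtable pointer hash. */
};

struct dpcls {
    struct subtable_vector vectors[NUM_PORT_VECTORS];
    uint64_t lookup_cnt;      /* Subtables visited, for the stats line. */
    uint64_t hit_cnt;         /* Successful megaflow matches. */
};

static inline uint32_t hash_u32(uint32_t x) { return x * 0x9e3779b1u; }

static inline uint32_t hash_ptr(const void *p)
{
    return hash_u32((uint32_t) ((uintptr_t) p >> 4));
}

/* Walk only the vector selected by the ingress port, in sorted order, so a
 * hit in a "hot" subtable is found after very few probes. */
struct dpcls_rule *
dpcls_lookup_port(struct dpcls *cls, uint32_t in_port, const struct flow *flow)
{
    struct subtable_vector *vec =
        &cls->vectors[hash_u32(in_port) % NUM_PORT_VECTORS];
    size_t i;

    for (i = 0; i < vec->n_subtables; i++) {
        struct subtable *st = vec->subtables[i];
        struct dpcls_rule *rule = st->lookup(st, flow);

        cls->lookup_cnt++;
        if (rule) {
            /* Count the hit; a periodic pass (e.g. once per second) would
             * re-sort vec->subtables by these counters.  "avg. subtable
             * lookups per hit" would then be lookup_cnt / hit_cnt. */
            vec->hits[hash_ptr(st) % NUM_COUNT_SLOTS]++;
            cls->hit_cnt++;
            return rule;   /* Megaflows don't overlap: first hit is final. */
        }
    }
    return NULL;
}

Is that roughly the structure you have in mind?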
I like the proposed approach of subtable prioritization for each ingress
port, thereby reducing the lookup time. +1 on this approach.

>
>The patch introduces 32 subtable vectors per dpcls and hashes the ingress
>port to select the subtable vector. The patch also counts matches per 32
>slots in each vector (hashing the subtable pointer to obtain the slot) and
>sorts the vectors according to match frequency every second.
>
>To monitor the effectiveness of the patch we have enhanced the ovs-appctl
>dpif-netdev/pmd-stats-show command with an extra line "avg. subtable
>lookups per hit" to report the average number of subtable lookups needed
>for a megaflow match. Ideally, this should be close to 1 and much smaller
>than N/2.
>
>I have benchmarked a cloud L3 overlay pipeline with a VXLAN overlay mesh.
>With pure L3 tenant traffic between VMs on different nodes the resulting
>netdev dpcls contains N=4 subtables.
>
>Disabling the EMC, I have measured a baseline performance (in+out) of ~1.32
>Mpps (64 bytes, 1000 L4 flows). The average number of subtable lookups per
>dpcls match is 2.5.
>
>With the patch the average number of subtable lookups per dpcls match goes
>down to 1.25 (apparently there are still two ports of different nature
>hashed to the same vector, otherwise it should be exactly one). Even so the
>forwarding performance grows by ~30% to 1.72 Mpps.

I ran some benchmarks and observed that the patch improves performance even
with multiple subtables around. The EMC was disabled and I had 5 VMs doing
packet forwarding. The flow rules were set up so that 8 subtables were
created, and a performance improvement of 16% was observed in this case. I
would like to try some more complex test scenarios when I get time.

Regards,
Bhanu Prakash.

>
>As the number of subtables will often be higher in reality, we can assume
>that this is at the lower end of the speed-up one can expect from this
>optimization. Just running a parallel ping between the VXLAN tunnel
>endpoints increases the number of subtables and hence the average number
>of subtable lookups from 2.5 to 3.5, with a corresponding decrease of
>throughput to 1.14 Mpps. With the patch the parallel ping has no impact on
>the average number of subtable lookups and performance. The performance
>gain is then ~50%.
>
>Signed-off-by: Jan Scheurich <jan.scheur...@ericsson.com>
>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev