I just submitted a v3 version of the patch. No need to review this one.

Jan
> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jan Scheurich
> Sent: Friday, 15 July, 2016 18:35
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v2] dpif-netdev: dpcls per in_port with sorted subtables
>
> This turns the previous RFC patch "dpif-netdev: dpcls per in_port with
> sorted subtables" into a non-RFC patch v2.
>
> The user-space datapath (dpif-netdev) consists of a first-level "exact
> match cache" (EMC) matching on 5-tuples and the normal megaflow
> classifier. With many parallel packet flows (e.g. TCP connections) the
> EMC becomes inefficient and the OVS forwarding performance is
> determined by the megaflow classifier.
>
> The megaflow classifier (dpcls) consists of a variable number of hash
> tables (aka subtables), each containing megaflow entries with the same
> mask of packet header and metadata fields to match upon. A dpcls
> lookup matches a given packet against all subtables in sequence until
> it hits a match. As megaflow cache entries are by construction
> non-overlapping, the first match is the only match.
>
> Today the order of the subtables in the dpcls is essentially random,
> so that on average a dpcls lookup has to visit N/2 subtables for a
> hit, where N is the total number of subtables. Even though every
> single hash-table lookup is fast, the performance of the current dpcls
> degrades when there are many subtables.
>
> How does the patch address this issue:
>
> In reality there is often a strong correlation between the ingress
> port and a small subset of subtables that have hits. The entire
> megaflow cache typically decomposes nicely into partitions that are
> hit only by packets entering from a range of similar ports (e.g.
> traffic from Phy -> VM vs. traffic from VM -> Phy).
>
> Therefore, maintaining a separate dpcls instance per ingress port with
> its subtable vector sorted by frequency of hits reduces the average
> number of subtable lookups in the dpcls to a minimum, even if the
> total number of subtables gets large. This is possible because
> megaflows always have an exact match on in_port, so every megaflow
> belongs to a unique dpcls instance.
>
> For thread safety, the PMD thread needs to block out revalidators
> during the periodic optimization. We use ovs_mutex_trylock() to avoid
> blocking the PMD.
>
> To monitor the effectiveness of the patch we have enhanced the
> ovs-appctl dpif-netdev/pmd-stats-show command with an extra line
> "avg. subtable lookups per hit" to report the average number of
> subtable lookups needed for a megaflow match. Ideally, this should be
> close to 1 and in almost all cases much smaller than N/2.
>
> I have benchmarked a cloud L3 overlay pipeline with a VXLAN overlay
> mesh. With pure L3 tenant traffic between VMs on different nodes the
> resulting netdev dpcls contains N=4 subtables.
>
> Disabling the EMC, I have measured a baseline performance (in+out) of
> ~1.32 Mpps (64 bytes, 1000 L4 flows). The average number of subtable
> lookups per dpcls match is 2.5.
>
> With the patch the average number of subtable lookups per dpcls match
> is reduced to 1 and the forwarding performance grows by ~30% to
> 1.72 Mpps.
>
> As the actual number of subtables will often be higher in reality, we
> can assume that this is at the lower end of the speed-up one can
> expect from this optimization.
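[ Side note for archive readers: the lookup being optimized here is, in
simplified form, the loop sketched below. All names are illustrative
stand-ins, not the actual dpif-netdev definitions; the real code lives
in lib/dpif-netdev.c, and subtable_find() is a hypothetical helper
standing in for the per-subtable hash lookup. ]

    #include <stddef.h>
    #include <stdint.h>

    /* Simplified stand-ins for the real dpif-netdev structures. */
    struct subtable;                    /* one hash table per distinct mask */
    struct dpcls_rule;                  /* a megaflow entry */

    struct dpcls {
        uint32_t in_port;               /* one dpcls instance per ingress port */
        struct subtable **subtables;    /* vector, kept sorted by hit frequency */
        size_t n_subtables;
    };

    /* Assumed helper: hash lookup within a single subtable. */
    struct dpcls_rule *subtable_find(struct subtable *, const void *key);

    /* Megaflow entries are non-overlapping by construction, so the
     * first match is the only match and terminates the search. */
    struct dpcls_rule *
    dpcls_lookup_sketch(struct dpcls *cls, const void *key,
                        uint64_t *n_lookups)
    {
        for (size_t i = 0; i < cls->n_subtables; i++) {
            ++*n_lookups;               /* feeds "avg. subtable lookups per hit" */
            struct dpcls_rule *rule = subtable_find(cls->subtables[i], key);
            if (rule) {
                return rule;            /* hit after i+1 lookups */
            }
        }
        return NULL;                    /* miss: all N subtables were visited */
    }

With the subtable vector sorted so that the most frequently hit
subtables come first, this loop typically returns on the first
iteration, which is exactly what the "avg. subtable lookups per hit"
statistic is meant to show.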
> Just running a parallel ping between the VXLAN tunnel endpoints
> increases the number of subtables and hence the average number of
> subtable lookups from 2.5 to 3.5, with a corresponding decrease of
> throughput to 1.14 Mpps. With the patch the parallel ping has no
> impact on the average number of subtable lookups or on performance.
> The performance gain is then ~50%.
>
> The main change to the previous patch is that instead of having a
> subtable vector per in_port in a single dpcls instance, we now have
> one dpcls instance per ingress port, each with a single sorted
> subtable vector. This is better aligned with the design of the base
> code and also improves the number of subtable lookups in a miss case.
>
> The PMD tests have been adjusted to the additional line in
> pmd-stats-show.
>
> Signed-off-by: Jan Scheurich <jan.scheur...@ericsson.com>
>
>
> Changes in v2:
> - Rebased to master (commit 3041e1fc9638)
> - Take the pmd->flow_mutex during optimization to block out
>   revalidators. Use trylock in order to not block the PMD thread.
> - Made in_port an explicit input parameter to fast_path_processing()
> - Fixed coding style issues
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
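PS: For anyone curious how the trylock-guarded periodic optimization
described above might look, here is a rough sketch using the same
simplified structures as the earlier side note. Only ovs_mutex_trylock()
and pmd->flow_mutex are names taken from the patch itself; hit_cnt,
compare_hit_cnt() and try_optimize_sketch() are hypothetical, and the
sketch substitutes plain pthread mutexes for OVS's ovs_mutex wrappers so
it stands alone.

    #include <pthread.h>
    #include <stdint.h>
    #include <stdlib.h>

    struct subtable {
        uint64_t hit_cnt;               /* hits accumulated since last sort */
        /* ... mask and hash table omitted ... */
    };

    struct dpcls {
        struct subtable **subtables;    /* vector to be kept sorted */
        size_t n_subtables;
    };

    struct dp_netdev_pmd_thread {
        pthread_mutex_t flow_mutex;     /* ovs_mutex in the real code */
    };

    /* Descending order: most frequently hit subtable first. */
    static int
    compare_hit_cnt(const void *a_, const void *b_)
    {
        const struct subtable *a = *(const struct subtable **) a_;
        const struct subtable *b = *(const struct subtable **) b_;
        return (a->hit_cnt < b->hit_cnt) - (a->hit_cnt > b->hit_cnt);
    }

    /* Periodic optimization: sort by hit frequency, but never stall
     * the PMD thread if a revalidator currently holds flow_mutex. */
    static void
    try_optimize_sketch(struct dp_netdev_pmd_thread *pmd, struct dpcls *cls)
    {
        if (!pthread_mutex_trylock(&pmd->flow_mutex)) { /* 0 == acquired */
            qsort(cls->subtables, cls->n_subtables,
                  sizeof cls->subtables[0], compare_hit_cnt);
            for (size_t i = 0; i < cls->n_subtables; i++) {
                cls->subtables[i]->hit_cnt = 0;         /* new counting epoch */
            }
            pthread_mutex_unlock(&pmd->flow_mutex);
        }
    }

The trylock is the key design point: if the lock is contended, the PMD
simply skips this optimization round instead of blocking packet
processing, and retries at the next periodic interval.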