Re: [ovs-dev] [PATCH v1 0/6] Memory access optimization for flow scalability of userspace datapath.

William Tu Sun, 05 Jul 2020 06:24:26 -0700

On Tue, Jun 30, 2020 at 2:26 AM Yanqin Wei <yanqin....@arm.com> wrote:
>
> Hi, every contributor
>
> These patches could significantly improve multi-flow throughput of the
> userspace datapath.  If reviewing all of the patches would take too much
> time, I suggest looking at the 2nd and 3rd first, since they provide most
> of the improvement:
> [ovs-dev][PATCH v1 2/6] dpif-netdev: add tunnel_valid flag to skip ip/ipv6
> address comparison
> [ovs-dev][PATCH v1 3/6] dpif-netdev: improve emc lookup performance by
> contiguous storage of hash value.
>
> Any comments from anyone are appreciated.
>
> Best Regards,
> Wei Yanqin
>
> > -----Original Message-----
> > From: Yanqin Wei <yanqin....@arm.com>
> > Sent: Tuesday, June 2, 2020 3:10 PM
> > To: d...@openvswitch.org
> > Cc: nd <n...@arm.com>; i.maxim...@ovn.org; u9012...@gmail.com; Malvika
> > Gupta <malvika.gu...@arm.com>; Lijian Zhang <lijian.zh...@arm.com>;
> > Ruifeng Wang <ruifeng.w...@arm.com>; Lance Yang
> > <lance.y...@arm.com>; Yanqin Wei <yanqin....@arm.com>
> > Subject: [ovs-dev][PATCH v1 0/6] Memory access optimization for flow
> > scalability of userspace datapath.
> >
> > The OVS userspace datapath is a memory-intensive program. It needs to
> > load and store a large amount of memory, including packet headers,
> > metadata, EMC/SMC/DPCLS tables and so on. This causes many cache line
> > misses and refills, which has a great impact on flow scalability. In some
> > cases, EMC has a negative impact on overall performance, and it is
> > difficult for users to dynamically manage the enabling of EMC.
> >
> > This patch series improves memory access in the userspace datapath as
> > follows:
> > 1. Reduce the number of metadata cache lines accessed by non-tunnel traffic.
> > 2. Reduce unnecessary memory loads/stores for batch/flow.
> > 3. Modify the layout of the EMC data struct to centralize the storage of
> > hash values.
> >
> > In NIC2NIC traffic tests, an overall performance improvement is observed,
> > especially in multi-flow cases.
> > Flows           delta
> > 1-1K flows      5-10%
> > 10K flows       20%
> > 100K flows      40%
> > EMC disable     10%


Thanks for submitting the patch series. I applied the series and do see the
performance improvement you describe.
Btw, are your numbers from an ARM server or x86?
Below are my numbers using a single flow and a drop action on an Intel(R)
Xeon(R) CPU @ 2.00GHz.
In summary, I see around 10% improvement with 1 flow.

=== master ===
root@instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
  packets received: 96269888
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 87513839
  smc hits: 0
  megaflow hits: 8755584
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 432
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 20083008856 (100.00%)
  avg cycles per packet: 208.61 (20083008856/96269888)
  avg processing cycles per packet: 208.61 (20083008856/96269888)

=== master without EMC ===
pmd thread numa_id 0 core_id 1:
  packets received: 90775936
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 0
  smc hits: 0
  megaflow hits: 90775424
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 479
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 21239087946 (100.00%)
  avg cycles per packet: 233.97 (21239087946/90775936)
  avg processing cycles per packet: 233.97 (21239087946/90775936)

=== yanqin v1: ===
pmd thread numa_id 0 core_id 1:
  packets received: 156582112
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 142344109
  smc hits: 0
  megaflow hits: 14237554
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 448
  avg. packets per output batch: 0.00
  idle cycles: 4320112 (0.01%)
  processing cycles: 30503055968 (99.99%)
  avg cycles per packet: 194.83 (30507376080/156582112)
  avg processing cycles per packet: 194.81 (30503055968/156582112)

=== yanqin v1 without EMC: ===
pmd thread numa_id 0 core_id 0:
  packets received: 48441664
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 0
  smc hits: 0
  megaflow hits: 48441182
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 449
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 10513468302 (100.00%)
  avg cycles per packet: 217.03 (10513468302/48441664)
  avg processing cycles per packet: 217.03 (10513468302/48441664)
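
For anyone who wants to compare the runs directly, the per-packet cost and the
relative change can be recomputed from the counters above. The following is a
minimal sketch in Python, not part of OVS; the names RUNS, cycles_per_packet
and reduction_percent are made up for illustration, and the only inputs are
the "processing cycles" and "packets received" values copied from the
pmd-stats-show output in this mail.

```python
# Processing cycles and packets received, copied from the
# pmd-stats-show output above: (processing_cycles, packets_received).
RUNS = {
    "master": (20083008856, 96269888),
    "master without EMC": (21239087946, 90775936),
    "yanqin v1": (30503055968, 156582112),
    "yanqin v1 without EMC": (10513468302, 48441664),
}


def cycles_per_packet(run):
    """Average processing cycles per packet for one named run."""
    cycles, packets = RUNS[run]
    return cycles / packets


def reduction_percent(baseline, patched):
    """Relative drop in processing cycles per packet, in percent."""
    base = cycles_per_packet(baseline)
    new = cycles_per_packet(patched)
    return 100.0 * (base - new) / base


if __name__ == "__main__":
    pairs = [
        ("master", "yanqin v1"),
        ("master without EMC", "yanqin v1 without EMC"),
    ]
    for baseline, patched in pairs:
        print("%s -> %s: %.2f -> %.2f cycles/packet (%.1f%% fewer)" % (
            baseline, patched,
            cycles_per_packet(baseline), cycles_per_packet(patched),
            reduction_percent(baseline, patched)))
```

This only looks at processing cycles per packet from a single run of each
configuration, so it says nothing about run-to-run variance or about
throughput measured at the traffic generator.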
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
