Hi Bhanu,

Regards
_Sugesh
> -----Original Message-----
> From: Bodireddy, Bhanuprakash
> Sent: Monday, November 27, 2017 4:35 PM
> To: 'Aaron Conole' <acon...@redhat.com>
> Cc: 'd...@openvswitch.org' <d...@openvswitch.org>; Ben Pfaff <b...@ovn.org>;
> Chandran, Sugesh <sugesh.chand...@intel.com>
> Subject: RE: [ovs-dev] [PATCH] packets: Prefetch the packet metadata in
> cacheline1.
>
> >>Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com> writes:
> >>
> >>> pkt_metadata_prefetch_init() is used to prefetch the packet metadata
> >>> before initializing the metadata in pkt_metadata_init(). This is
> >>> done for every packet in userspace datapath and is performance critical.
> >>>
> >>> Commit 99fc16c0 prefetches only cacheline0 and cacheline2 as the
> >>> metadata part of respective cachelines will be initialized by
> >>> pkt_metadata_init().
> >>>
> >>> However in VXLAN case when popping the vxlan header,
> >>> netdev_vxlan_pop_header() invokes pkt_metadata_init_tnl() which
> >>> zeroes out metadata part of
> >>> cacheline1 that wasn't prefetched earlier and causes performance
> >>> degradation.
> >>>
> >>> By prefetching cacheline1, 9% performance improvement is observed.
> >>
> >>Do we see a degradation in the non-vxlan case? If not, then I don't
> >>see any reason not to apply this patch.
> >
> >This patch doesn't impact the performance of non-vxlan cases and only
> >has a positive impact in the vxlan case.
>
> The commit message claims that the performance improvement was 9% with
> this patch, but when Sugesh was checking he wasn't getting that performance
> improvement on his Haswell.
>
> I was chatting to Sugesh this afternoon about this patch and we found some
> interesting details. Much of this boils down to how OvS is built (apart
> from HW and BIOS settings; TB disabled).
>
> The test case here measures VXLAN decapsulation performance alone, for
> a packet size of 118 bytes.
> The OvS CFLAGS and throughput numbers are as below.
>
> CFLAGS="-O2"
> Master      4.667 Mpps
> With Patch  5.045 Mpps
>
> CFLAGS="-O2 -msse4.2"
> Master      4.710 Mpps
> With Patch  5.097 Mpps
>
> CFLAGS="-O2 -march=native"
> Master      5.072 Mpps
> With Patch  5.193 Mpps
>
> CFLAGS="-Ofast -march=native"
> Master      5.349 Mpps
> With Patch  5.378 Mpps
>
> This means the performance measurements/claims are difficult to assess. As
> one can see above, with "-Ofast -march=native" the improvement is
> insignificant, but this is very platform dependent due to the
> "-march=native" flag. Also, the optimization flags seem to make a
> significant difference.

[Sugesh] I also tested on my board with the same set of configurations and I am
getting the same results as yours. So the improvement this patch offers depends
on the compiler options. I am not sure what the most preferred/used compiler
options are out there. I always build OVS with CFLAGS="-Ofast -march=native",
and the patch doesn't give a great improvement with that.

I don't mind Acking the patch if you could re-send it with these results and
options in the commit message. At least it will offer a performance improvement
for other build options.

>
> - Bhanuprakash.
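
For reference, the relative gain for each build configuration can be worked out from the throughput figures quoted above. This is only a minimal sketch of that arithmetic: the Mpps values are the ones from this thread, while the names `results`, `relative_gain` and `gains` are made up for illustration and are not part of OVS or the patch.

```python
# Throughput figures (Mpps) quoted in this thread, as (master, with patch).
# The dictionary layout and all names here are illustrative assumptions.
results = {
    "-O2": (4.667, 5.045),
    "-O2 -msse4.2": (4.710, 5.097),
    "-O2 -march=native": (5.072, 5.193),
    "-Ofast -march=native": (5.349, 5.378),
}


def relative_gain(master, patched):
    """Return the throughput gain of the patched build over master, in percent."""
    return (patched - master) / master * 100.0


# Gain per CFLAGS setting, keyed by the flags string.
gains = {flags: relative_gain(m, p) for flags, (m, p) in results.items()}

for flags, gain in gains.items():
    print(f'CFLAGS="{flags}": {gain:.2f}%')
```

On these figures the gain is roughly 8% for the two plain -O2 builds, about 2.4% with "-O2 -march=native", and about 0.5% with "-Ofast -march=native".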