>-----Original Message----- >From: Chandran, Sugesh >Sent: Monday, November 27, 2017 5:58 PM >To: Bodireddy, Bhanuprakash <[email protected]>; 'Aaron >Conole' <[email protected]> >Cc: '[email protected]' <[email protected]>; Ben Pfaff ><[email protected]> >Subject: RE: [ovs-dev] [PATCH] packets: Prefetch the packet metadata in >cacheline1. > >Hi Bhanu, > >Regards >_Sugesh > >> -----Original Message----- >> From: Bodireddy, Bhanuprakash >> Sent: Monday, November 27, 2017 4:35 PM >> To: 'Aaron Conole' <[email protected]> >> Cc: '[email protected]' <[email protected]>; Ben Pfaff >> <[email protected]>; Chandran, Sugesh <[email protected]> >> Subject: RE: [ovs-dev] [PATCH] packets: Prefetch the packet metadata >> in cacheline1. >> >> >>Bhanuprakash Bodireddy <[email protected]> writes: >> >> >> >>> pkt_metadata_prefetch_init() is used to prefetch the packet >> >>> metadata before initializing the metadata in pkt_metadata_init(). >> >>> This is done for every packet in userspace datapath and is performance >critical. >> >>> >> >>> Commit 99fc16c0 prefetches only cachline0 and cacheline2 as the >> >>> metadata part of respective cachelines will be initialized by >> >>pkt_metadata_init(). >> >>> >> >>> However in VXLAN case when popping the vxlan header, >> >>> netdev_vxlan_pop_header() invokes pkt_metadata_init_tnl() which >> >>> zeroes out metadata part of >> >>> cacheline1 that wasn't prefetched earlier and causes performance >> >>> degradation. >> >>> >> >>> By prefetching cacheline1, 9% performance improvement is observed. >> >> >> >>Do we see a degredation in the non-vxlan case? If not, then I don't >> >>see any reason not to apply this patch. >> > >> >This patch doesn't impact the performance of non-vxlan cases and only >> >have a positive impact in vxlan case. >> >> The commit message claims that the performance improvement was 9% >with >> this patch but when Sugesh was checking he wasn't getting that >> performance improvement on his Haswell. >> >> I was chatting to Sugesh this afternoon on this patch and we found >> some interesting details and much of this boils down to how the OvS is >> built .( Apart from HW, BIOS settings - TB disabled). >> >> The test case here measure the VXLAN de capsulation performance alone >> for packet sizes of 118 bytes. >> The OvS CFLAGS and throughput numbers are as below. >> >> CFLAGS="-O2" >> Master 4.667 Mpps >> With Patch 5.045 Mpps >> >> CFLAGS="-O2 -msse4.2" >> Master 4.710 Mpps >> With Patch 5.097 Mpps >> >> CFLAGS="-O2 -march=native" >> Master 5.072 Mpps >> With Patch 5.193 Mpps >> >> CFLAGS="-Ofast -march=native" >> Master 5.349 Mpps >> With Patch 5.378 Mpps >> >> This means the performance measurements/claims are difficult to assess >> and as one can see above with "-Ofast, -march=native" >> the improvement is insignificant but this is very platform dependent >> due to "march=native" flag. Also the optimization flags seems to make >> significant difference. >[Sugesh] I also tested on my board with same set of configuration and getting >the same result as yours. >So this patch offers performance improvement based on the compiler option. >I am not sure whats the most preferred/used compiler option out there. >I always build OVS with CFLAGS="-Ofast -march=native" and the patch >doesn't have a great improvement in it. > >I don't mind Acking the patch, if you could re-send the patch with these >results and options in the commit message. >Atleast it will offer performance improvement for other build options.
Thanks Sugesh for testing this out. I will send out v2 of this with the information I mentioned in the earlier mail included in the commit message. Bhanuprakash. _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
