> -----Original Message-----
> From: Flavio Leitner <f...@sysclose.org>
> Sent: Thursday 15 July 2021 19:58
> To: Ferriter, Cian <cian.ferri...@intel.com>
> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.
> 
> On Thu, Jul 15, 2021 at 01:39:04PM +0000, Ferriter, Cian wrote:
> >
> >
> > > -----Original Message-----
> > > From: Flavio Leitner <f...@sysclose.org>
> > > Sent: Friday 9 July 2021 18:54
> > > To: Ferriter, Cian <cian.ferri...@intel.com>
> > > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > > Subject: Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.
> > >
> > >
> > >
> > > Hi,
> > >
> > > After rebasing, the performance of the master branch in my env rose
> > > from 12Mpps to 13Mpps. However, this specific patch brings it back
> > > down to 12Mpps. I am using dpif_scalar and generic lookup (no AVX512).
> > >
> >
> > Thanks for the investigation. Always great seeing perf numbers and details!
> >
> > I just want to check my understanding here with what you're seeing:
> >
> > Performance before DPIF patchset
> > 12Mpps
> >
> > Performance at this patch
> > 12Mpps
> >
> > Performance after DPIF patchset
> > 13Mpps
> >
> > So the performance recovers somewhere else in the patchset?
> 
> 
> Interesting, which flags are you passing to build OVS?
> 
> Thanks for following up!
> fbl
> 
> 

My flags:
./configure CFLAGS="-g -Ofast -march=native" --with-dpdk=static

This is how I build OVS to get the performance numbers below.
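
As a rough illustration of how relative numbers of the form "1.075x (7.5%)"
are derived from raw throughput, here is a minimal sketch. The function name
and the Mpps values in the usage line are hypothetical placeholders, not my
actual measurements.

```python
def relative_perf(measured_mpps: float, baseline_mpps: float) -> str:
    """Format throughput relative to a baseline, e.g. '1.075x (7.5%)'."""
    if baseline_mpps <= 0:
        raise ValueError("baseline throughput must be positive")
    # Ratio of the measured run to the baseline run.
    ratio = measured_mpps / baseline_mpps
    # Same ratio expressed as a percentage gain (negative means a regression).
    percent = (ratio - 1.0) * 100.0
    return f"{ratio:.3f}x ({percent:.1f}%)"


# Hypothetical values: a 10.0 Mpps baseline and a 10.75 Mpps run.
print(relative_perf(10.75, 10.0))
```

A value below 1.000x (a negative percentage) means the run was slower than
the baseline.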

> >
> > I've checked the performance behaviour in my case. I'm going to report
> > relative performance numbers. They are relative to the master branch
> > before AVX512 DPIF was applied (c36c8e3).
> > I tried to run a similar testcase; I can see from the memcmp in your
> > perf top output that you are using EMC. I am also using the scalar DPIF
> > in all the testcases below.
> >
> > Master before AVX512 DPIF (c36c8e3)
> > 1.000x (0.0%)
> > DPIF patch 3 - dpif-avx512: Add ISA implementation of dpif.
> > 1.010x (1.0%)
> > DPIF patch 4 - dpif-netdev: Add command to switch dpif implementation.
> > 1.042x (4.2%)
> > DPIF patch 5 - dpif-netdev: Add command to get dpif implementations.
> > 1.063x (6.3%)
> > DPIF patch 6 - dpif-netdev: Add a partial HWOL PMD statistic.
> > 1.069x (6.9%)
> > Latest master which has AVX512 DPIF patches (d2e9703)
> > 1.075x (7.5%)
> > Master before AVX512 DPIF (c36c8e3), with prefetch change
> > 0.983x (-1.7%)
> > Latest master which has AVX512 DPIF patches (d2e9703), with prefetch change
> > 1.080x (8.0%)
> >
> > > (I don't think this report should block the patch because the
> > > counters are interesting and the analysis below doesn't point
> > > directly to the proposed changes.)
> > >
> > > This is a diff using all patches applied versus this patch reverted:
> > >     21.44%     +6.08%  ovs-vswitchd        [.] miniflow_extract
> > >      8.94%     -1.92%  libc-2.28.so        [.] __memcmp_avx2_movbe
> > >     14.62%     +1.44%  ovs-vswitchd        [.] dp_netdev_input__
> > >      2.80%     -1.08%  ovs-vswitchd        [.] dp_netdev_pmd_flush_output_on_port
> > >      3.44%     -0.91%  ovs-vswitchd        [.] netdev_send
> > >
> > > This is the code side by side, patch applied on the right side:
> > > (sorry, long lines)
> > >
> >
> > My mail client has wrapped the below lines, sorry for mangling the output!
> >
> > <snip mangled perf diff output>
> > Please find it here:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385448.html
> >
> > >
> > >
> > > I don't see any relevant optimization difference in the code
> > > above, but the "mov %r15w,-0x2(%r13)" on the right side accounts
> > > for almost all the difference, though on the left side it seems
> > > a bit more spread.
> > >
> > > I applied the patch below and it helped to get to 12.7Mpps, so
> > > almost at the same levels. I wonder if you see the same result.
> > >
> >
> > Since I don't see the drop that you see with this patch, when I apply the
> > below patch to the latest master, I see a smaller benefit.
> > The relative performance after adding the below prefetch compared to
> > before (latest master):
> > 1.005x (0.5%)
> >
> > When I compare before/after performance (including the prefetch code, on
> > latest master), the overall performance difference is 0.5% here.
> >
> > > diff --git a/lib/flow.c b/lib/flow.c
> > > index 729d59b1b..4572e356b 100644
> > > --- a/lib/flow.c
> > > +++ b/lib/flow.c
> > > @@ -746,6 +746,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> > >      uint8_t *ct_nw_proto_p = NULL;
> > >      ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
> > >
> > > +    /* dltype will be updated later. */
> > > +    OVS_PREFETCH_WRITE(miniflow_pointer(mf, dl_type));
> > > +
> > >      /* Metadata. */
> > >      if (flow_tnl_dst_is_set(&md->tunnel)) {
> > >          miniflow_push_words(mf, tunnel, &md->tunnel,
> > >
> > >
> > > fbl
> > >
> >
> > <snip actual patch away>
> >
> > Thanks,
> > Cian
> 
> --
> fbl
