> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Monday, 29 July 2024 21.27
> 
> On 2024-07-29 13:00, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com]
> >> Sent: Wednesday, 24 July 2024 09.54
> >
> > Which packet mix was used for your tests? Synthetic IMIX, or some live
> data?
> >
> 
> I used the same test as was being done when the performance regression
> was demonstrated (i.e., 2x testpmd with fixed packet size).
> 
> >> +/* The code generated by GCC (and to a lesser extent, clang) with
> just
> >> + * a straight memcpy() to copy packets is less than optimal on Intel
> >> + * P-cores, for small packets. Thus the need of this specialized
> >> + * memcpy() in builds where use_cc_memcpy is set to true.
> >> + */
> >> +#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64)
> >> +static __rte_always_inline void
> >> +pktcpy(void *restrict in_dst, const void *restrict in_src, size_t
> len)
> >> +{
> >> +  void *dst = __builtin_assume_aligned(in_dst, 16);
> >> +  const void *src = __builtin_assume_aligned(in_src, 16);
> >> +
> >> +  if (len <= 256) {
> >> +          size_t left;
> >> +
> >> +          for (left = len; left >= 32; left -= 32) {
> >> +                  memcpy(dst, src, 32);
> >> +                  dst = RTE_PTR_ADD(dst, 32);
> >> +                  src = RTE_PTR_ADD(src, 32);
> >> +          }
> >> +
> >> +          memcpy(dst, src, left);
> >> +  } else
> >
> > Although the packets within a burst often have similar size, I'm not
> sure you can rely on the dynamic branch predictor here.
> >
> 
> I agree that the pktcpy() routine will likely often suffer a
> size-related branch mispredict with real packet size mix. A benchmark
> with a real packet mix would be much better than the tests I've run.
> 
> This needs to be compared, of course, with the overhead imposed by
> conditionals included in other pktcpy() implementations.

If testing with fixed packet size, only one of the branches will be taken - 
always!
And thus the branch predictor will always predict it correctly - in the test.

So, if this code performs better than simple memcpy(), I can conclude that you 
are testing with packet size <= 256.

> 
> > Looking at the ethdev packet size counters at an ISP (at the core of
> their Layer 3 network), 71 % are 256 byte or larger [1].
> >
> > For static branch prediction, I would consider > 256 more likely and
> swap the two branches, i.e. compare (len > 256) instead of (len <= 256).
> >
> 
> OK, I'll add likely() instead, to make it more explicit.
> 
> > But again: I don't know how the dynamic branch predictor behaves here.
> Perhaps my suggested change makes no difference.
> >
> 
> I think it will, but it will be tiny. From what I understand, even when
> the branch prediction guessed correctly, one receive a slight benefit if
> the branch is not taken.
> 
> >> +          memcpy(dst, src, len);
> >> +}
> >
> > With or without suggested change,
> > Acked-by: Morten Brørup <m...@smartsharesystems.com>
> >
> >
> > [1]: Details (incl. one VLAN tag)
> > tx_size_64_packets            1,1 %
> > tx_size_65_to_127_packets    25,7 %
> > tx_size_128_to_255_packets    2,6 %
> > tx_size_256_to_511_packets    1,4 %
> > tx_size_512_to_1023_packets   1,7 %
> > tx_size_1024_to_1522_packets 67,6 %
> >

Reply via email to