> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Monday, 29 July 2024 21.27 > > On 2024-07-29 13:00, Morten Brørup wrote: > >> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] > >> Sent: Wednesday, 24 July 2024 09.54 > > > > Which packet mix was used for your tests? Synthetic IMIX, or some live > data? > > > > I used the same test as was being done when the performance regression > was demonstrated (i.e., 2x testpmd with fixed packet size). > > >> +/* The code generated by GCC (and to a lesser extent, clang) with > just > >> + * a straight memcpy() to copy packets is less than optimal on Intel > >> + * P-cores, for small packets. Thus the need of this specialized > >> + * memcpy() in builds where use_cc_memcpy is set to true. > >> + */ > >> +#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64) > >> +static __rte_always_inline void > >> +pktcpy(void *restrict in_dst, const void *restrict in_src, size_t > len) > >> +{ > >> + void *dst = __builtin_assume_aligned(in_dst, 16); > >> + const void *src = __builtin_assume_aligned(in_src, 16); > >> + > >> + if (len <= 256) { > >> + size_t left; > >> + > >> + for (left = len; left >= 32; left -= 32) { > >> + memcpy(dst, src, 32); > >> + dst = RTE_PTR_ADD(dst, 32); > >> + src = RTE_PTR_ADD(src, 32); > >> + } > >> + > >> + memcpy(dst, src, left); > >> + } else > > > > Although the packets within a burst often have similar size, I'm not > sure you can rely on the dynamic branch predictor here. > > > > I agree that the pktcpy() routine will likely often suffer a > size-related branch mispredict with real packet size mix. A benchmark > with a real packet mix would be much better than the tests I've run. > > This needs to be compared, of course, with the overhead imposed by > conditionals included in other pktcpy() implementations.
If testing with fixed packet size, only one of the branches will be taken - always! And thus the branch predictor will always predict it correctly - in the test. So, if this code performs better than simple memcpy(), I can conclude that you are testing with packet size <= 256. > > > Looking at the ethdev packet size counters at an ISP (at the core of > their Layer 3 network), 71 % are 256 byte or larger [1]. > > > > For static branch prediction, I would consider > 256 more likely and > swap the two branches, i.e. compare (len > 256) instead of (len <= 256). > > > > OK, I'll add likely() instead, to make it more explicit. > > > But again: I don't know how the dynamic branch predictor behaves here. > Perhaps my suggested change makes no difference. > > > > I think it will, but it will be tiny. From what I understand, even when > the branch prediction guessed correctly, one receive a slight benefit if > the branch is not taken. > > >> + memcpy(dst, src, len); > >> +} > > > > With or without suggested change, > > Acked-by: Morten Brørup <m...@smartsharesystems.com> > > > > > > [1]: Details (incl. one VLAN tag) > > tx_size_64_packets 1,1 % > > tx_size_65_to_127_packets 25,7 % > > tx_size_128_to_255_packets 2,6 % > > tx_size_256_to_511_packets 1,4 % > > tx_size_512_to_1023_packets 1,7 % > > tx_size_1024_to_1522_packets 67,6 % > >