> > From: Konstantin Ananyev [mailto:[email protected]] > > Sent: Thursday, 23 October 2025 16.05 > > > > > > From: Konstantin Ananyev [mailto:[email protected]] > > > > Sent: Thursday, 23 October 2025 10.51 > > > > > > > > > -#define RTE_MBUF_DIRECT(mb) \ > > > > > - (!((mb)->ol_flags & (RTE_MBUF_F_INDIRECT | > > RTE_MBUF_F_EXTERNAL))) > > > > > + * > > > > > + * Note: Macro optimized for code size. > > > > > + * > > > > > + * The plain macro would be: > > > > > + * #define RTE_MBUF_DIRECT(mb) \ > > > > > + * (!((mb)->ol_flags & (RTE_MBUF_F_INDIRECT | > > > > RTE_MBUF_F_EXTERNAL))) > > > > > + * > > > > > + * The flags RTE_MBUF_F_INDIRECT and RTE_MBUF_F_EXTERNAL are > > both in > > > > the > > > > > MSB (most significant > > > > > + * byte) of the 64-bit ol_flags field, so we only compare this > > one > > > > byte instead of all > > > > > 64 bits. > > > > > + * > > > > > + * E.g., GCC version 16.0.0 20251019 (experimental) generates > > the > > > > following code > > > > > for x86-64. > > > > > + * > > > > > + * With the plain macro, 17 bytes of instructions: > > > > > + * movabs rax,0x6000000000000000 // 10 bytes > > > > > + * and rax,QWORD PTR [rdi+0x18] // 4 bytes > > > > > + * sete al // 3 bytes > > > > > + * With this optimized macro, only 7 bytes of instructions: > > > > > + * test BYTE PTR [rdi+0x1f],0x60 // 4 bytes > > > > > + * sete al // 3 bytes > > > > > + */ > > > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN > > > > > +/* On little endian architecture, the MSB of a 64-bit integer is > > at > > > > byte offset 7. */ > > > > > +#define RTE_MBUF_DIRECT(mb) !(((const char *)(&(mb)- > > > > >ol_flags))[7] & 0x60) > > > > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN > > > > > +/* On big endian architecture, the MSB of a 64-bit integer is at > > > > byte offset 0. */ > > > > > +#define RTE_MBUF_DIRECT(mb) !(((const char *)(&(mb)- > > > > >ol_flags))[0] & 0x60) > > > > > > > > A stupid q: why then not simply do: > > > > (mb->ol_flags >> 56) & 0x60 > > > > then? > > > > Should help to all these LE/BE casts, etc. > > > > > > GCC is too stupid for that too. > > > > > > Playing around with Godbolt shows that > > > return !((char)(p[3] >> 56) & 0x60); > > > becomes > > > movzx eax,BYTE PTR [rdi+0x1f] // 4 bytes > > > test al,0x60 // 2 bytes > > > Instead of simply > > > test BYTE PTR [rdi+0x1f],0x60 // 4 bytes > > > > And these 2 extra bytes in instructions, are that really that critical? > > My guess, we wouldn't notice any real diff here. > > The optimized macro made the common code path of the refactored > rte_pktmbuf_prefree_seg() fit into one cache line. > IIRC, all 10 bytes saving were required for this.
I understand that. but is that change will provide a measurable impact, in terms of cycles/op or pps or so? > > But if it really is, can I ask you to create a new define for 0x60, > > to avoid hardcoded constants in the code? > > Might be something like > > #define RTE_MBUF_F_INDIRECT_EXTERNAL_1B ... > > or so. > > I started out using the field names, but Bruce suggested using 0x60 for > readability, making the macros shorter, which IMO looks good. > > I don't like adding special names just for this, so either we stick with 0x60 > or go for > "(char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 * > CHAR_BIT))", something like this: My vote would be to use the construction above. Might be put it in a new macro for readability. Konstantin > #ifdef __DOXYGEN__ > #define RTE_MBUF_DIRECT(mb) \ > !(((const char *)(&(mb)->ol_flags))[MSB_OFFSET /* 7 or 0, depending on > endianness */] & \ > (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 * > CHAR_BIT))) > #else /* !__DOXYGEN__ */ > #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN > /* On little endian architecture, the MSB of a 64-bit integer is at byte > offset 7. */ > #define RTE_MBUF_DIRECT(mb) \ > !(((const char *)(&(mb)->ol_flags))[7] & \ > (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 * > CHAR_BIT))) > #elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN > /* On big endian architecture, the MSB of a 64-bit integer is at byte offset > 0. */ > #define RTE_MBUF_DIRECT(mb) \ > !(((const char *)(&(mb)->ol_flags))[0] & \ > (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 * > CHAR_BIT))) > #endif /* RTE_BYTE_ORDER */ > #endif /* !__DOXYGEN__ */ > /* Verify the optimization above. */ > static_assert(((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) & > (UINT64_C(0xFF) << (7 * CHAR_BIT))) == > (RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL), > "(RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) is not at MSB"); > > > Konstantin > > > > > Good suggestion, though. > > > > > > > > > > > > +#endif > > > > > +/* Verify the optimization above. */ > > > > > +static_assert((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) == > > > > > UINT64_C(0x60) << (7 * CHAR_BIT), > > > > > + "(RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) is not 0x60 > at > > > > MSB"); > > > > > > > > > > /** Uninitialized or unspecified port. */ > > > > > #define RTE_MBUF_PORT_INVALID UINT16_MAX > > > > > -- > > > > > 2.43.0

