On 10/26/22 17:44, Morten Brørup wrote:
Add __rte_cache_aligned to the objs array.It makes no difference in the general case, but if get/put operations are always 32 objects, it will reduce the number of memory (or last level cache) accesses from five to four 64 B cache lines for every get/put operation. For readability reasons, an example using 16 objects follows: Currently, with 16 objects (128B), we access to 3 cache lines: ┌────────┐ │len │ cache │********│--- line0 │********│ ^ │********│ | ├────────┤ | 16 objects │********│ | 128B cache │********│ | line1 │********│ | │********│ | ├────────┤ | │********│_v_ cache │ │ line2 │ │ │ │ └────────┘ With the alignment, it is also 3 cache lines: ┌────────┐ │len │ cache │ │ line0 │ │ │ │ ├────────┤--- │********│ ^ cache │********│ | line1 │********│ | │********│ | ├────────┤ | 16 objects │********│ | 128B cache │********│ | line2 │********│ | │********│ v └────────┘--- However, accessing the objects at the bottom of the mempool cache is a special case, where cache line0 is also used for objects. Consider the next burst (and any following bursts): Current: ┌────────┐ │len │ cache │ │ line0 │ │ │ │ ├────────┤ │ │ cache │ │ line1 │ │ │ │ ├────────┤ │ │ cache │********│--- line2 │********│ ^ │********│ | ├────────┤ | 16 objects │********│ | 128B cache │********│ | line3 │********│ | │********│ | ├────────┤ | │********│_v_ cache │ │ line4 │ │ │ │ └────────┘ 4 cache lines touched, incl. line0 for len. With the proposed alignment: ┌────────┐ │len │ cache │ │ line0 │ │ │ │ ├────────┤ │ │ cache │ │ line1 │ │ │ │ ├────────┤ │ │ cache │ │ line2 │ │ │ │ ├────────┤ │********│--- cache │********│ ^ line3 │********│ | │********│ | 16 objects ├────────┤ | 128B │********│ | cache │********│ | line4 │********│ | │********│_v_ └────────┘ Only 3 cache lines touched, incl. line0 for len. Credits go to Olivier Matz for the nice ASCII graphics. Signed-off-by: Morten Brørup <[email protected]>
Reviewed-by: Andrew Rybchenko <[email protected]>

