09/10/2025 19:35, Morten Brørup:
> > From: Bruce Richardson [mailto:[email protected]]
> > > +	m->pkt_len = 0;
> > > +	m->tx_offload = 0;
> > > +	m->vlan_tci = 0;
> > > +	m->vlan_tci_outer = 0;
> > > +	m->port = RTE_MBUF_PORT_INVALID;
> >
> > Have you considered doing all initialization using 64-bit stores? It's
> > generally cheaper to do a single 64-bit store than e.g. a set of 16-bit
> > ones.
>
> The code is basically copy-paste from rte_pktmbuf_reset().
> I kept it the same way for readability.
>
> > This also means that we could remove the restriction on having refcnt
> > and nb_segs already set. As in PMDs, a single store can init data_off,
> > ref_cnt, nb_segs and port.
>
> Yes, I have given the concept a lot of thought already.
> If we didn't require mbufs residing in the mempool to have any fields
> initialized, specifically "next" and "nb_segs", it would improve performance
> for drivers freeing mbufs back to the mempool, because writing to the mbufs
> would no longer be required at that point; the mbufs could simply be freed
> back to the mempool. Instead, we would require the driver to initialize these
> fields - which it probably does on RX anyway, if it supports segmented
> packets.
> But I consider this concept a major API change, also affecting applications
> assuming that these fields are initialized when allocating raw mbufs from the
> mempool. So I haven't pursued it.
>
> > Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss
> > fields etc. For max performance, the whole of the mbuf cleared here is
> > 40 bytes, i.e. 5 64-bit stores. If we do the stores in order, possibly
> > the compiler can even opportunistically coalesce more stores, so we
> > could even end up getting 128-bit or larger stores depending on the
> > ISA compiled for.
> > [Maybe the compiler will do this even if they are not in order, but I'd
> > like to maximize my chances here! :-)]
Morten, you didn't reply to this last point. Can we optimize this further with wider (64-bit or larger) stores?
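
For reference, a minimal sketch of the kind of coalescing Bruce describes. The struct and constants below are only illustrative stand-ins, not the real struct rte_mbuf layout or the real macros, and it assumes a little-endian target:

    /*
     * Illustrative only: ex_mbuf_hdr mimics the 40 bytes of fields being
     * reset, but is NOT the real struct rte_mbuf layout. EX_HEADROOM and
     * EX_PORT_INVALID are placeholders for RTE_PKTMBUF_HEADROOM and
     * RTE_MBUF_PORT_INVALID. Little-endian byte order is assumed.
     */
    #include <stdint.h>
    #include <string.h>

    #define EX_HEADROOM     128
    #define EX_PORT_INVALID UINT16_MAX

    struct ex_mbuf_hdr {
            /* 8 bytes that PMDs typically rearm with one 64-bit store */
            uint16_t data_off;
            uint16_t refcnt;
            uint16_t nb_segs;
            uint16_t port;
            /* 32 more bytes of metadata to clear to zero */
            uint64_t ol_flags;
            uint32_t packet_type;
            uint32_t pkt_len;
            uint16_t data_len;
            uint16_t vlan_tci;
            uint32_t rss;
            uint16_t vlan_tci_outer;
            uint8_t  pad[6];        /* filler to round out 40 bytes */
    };

    static inline void
    ex_mbuf_reset_wide(struct ex_mbuf_hdr *m)
    {
            /*
             * Store #1: data_off, refcnt = 1, nb_segs = 1 and port packed
             * into a single 64-bit value (little-endian field order).
             */
            const uint64_t rearm = (uint64_t)EX_HEADROOM |
                                   ((uint64_t)1 << 16) |
                                   ((uint64_t)1 << 32) |
                                   ((uint64_t)EX_PORT_INVALID << 48);
            memcpy(&m->data_off, &rearm, sizeof(rearm));

            /*
             * Stores #2..#5: the remaining 32 bytes are all zero, so one
             * memset lets the compiler emit four 64-bit stores, or wider
             * vector stores where the ISA allows.
             */
            memset(&m->ol_flags, 0, 32);
    }

With the stores issued back to back over a contiguous region like this, compilers commonly merge the memcpy and memset into wider (128-bit or larger) writes, which is the extra coalescing mentioned in the quoted mail.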

