> -----Original Message-----
> From: dev <[email protected]> On Behalf Of Thomas Monjalon
> Sent: Wednesday, October 28, 2020 10:09 AM
> To: Nithin Dabilpuram <[email protected]>
> Cc: Pavan Nikhilesh <[email protected]>; Jerin Jacob
> <[email protected]>; Ruifeng Wang <[email protected]>; Richardson, Bruce
> <[email protected]>; Ananyev, Konstantin
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic mbuf
> field
>
> 28/10/2020 10:30, Nithin Dabilpuram:
> > From: Thomas Monjalon <[email protected]>
> >
> > The node_mbuf_priv1 was stored in the deprecated mbuf field udata64.
> > It is moved to a dynamic field in order to allow removal of udata64.
> >
> > Signed-off-by: Thomas Monjalon <[email protected]>
> > Signed-off-by: Nithin Dabilpuram <[email protected]>
> [...]
> > + IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) =
> node_mbuf_priv1_dynfield_offset;
>
> That's interesting.
> You copy the offset in the node context for better performance.
> How much is it better than with global offset variable?
> How much it decreases compared to a static mbuf field?

I'm also interested in this topic, so I'll offer the logical/theory point of view:

With a static field, the offset into the mbuf can be encoded in the instruction stream, meaning no d-cache loads are needed to locate the field.

With a static/global variable, the cache line where the offset resides is presumably not hot in cache at the start of each burst (assuming an application that does significant work, so the line has been evicted since the last burst). Hence the overhead estimate could be 1x cache line load per burst.

With the offset copied into the node, it is presumably on a hot cache line, since the node is already using other data members of its context. As a result, a cold static cache line load is perhaps converted into reuse of a hot node-context line.

Real-world overhead likely depends on
A) whether the application thrashes the cache enough to make the static/global line fall out of cache, causing perf degradation due to the reload, and
B) whether node->ctx still fits in the same number of cache lines as before once the value is copied there.
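
To make the three cases concrete, here is a minimal sketch in plain C. It does not use the DPDK headers: struct fake_mbuf, struct fake_node, priv1_dynfield_offset and the helper functions are hypothetical stand-ins I made up for illustration, not the real rte_mbuf/rte_node layouts or the patch's code. It only shows where the offset comes from in each variant; it does not measure or model cache behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_mbuf; NOT the real DPDK layout. */
struct fake_mbuf {
    uint64_t static_priv;  /* models a static field: offset known at compile time */
    uint8_t  dyn_area[64]; /* models the space dynamic fields are carved from */
};

/* Hypothetical stand-in for a graph node with a small per-node context. */
struct fake_node {
    uint8_t ctx[16];
};

/* Global offset, written once at init time (assumed name, for illustration). */
static int priv1_dynfield_offset = -1;

/* Read a 64-bit value located `off` bytes from the start of the mbuf. */
static inline uint64_t read_dyn(const struct fake_mbuf *m, int off)
{
    uint64_t v;
    assert(off >= 0 && (size_t)off + sizeof(v) <= sizeof(*m));
    memcpy(&v, (const uint8_t *)m + off, sizeof(v));
    return v;
}

/* Write a 64-bit value `off` bytes from the start of the mbuf. */
static inline void write_dyn(struct fake_mbuf *m, int off, uint64_t v)
{
    assert(off >= 0 && (size_t)off + sizeof(v) <= sizeof(*m));
    memcpy((uint8_t *)m + off, &v, sizeof(v));
}

/* Case 1: static field. The offset is an immediate in the instruction stream. */
static uint64_t sum_static(const struct fake_mbuf *pkts, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += pkts[i].static_priv;
    return sum;
}

/* Case 2: global offset. One load of the global per burst, from its own line. */
static uint64_t sum_global(const struct fake_mbuf *pkts, size_t n)
{
    const int off = priv1_dynfield_offset; /* possibly a cold cache line */
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += read_dyn(&pkts[i], off);
    return sum;
}

/* Copy the global offset into the node context once, at node init. */
static void node_init(struct fake_node *node)
{
    memcpy(node->ctx, &priv1_dynfield_offset, sizeof(priv1_dynfield_offset));
}

/* Case 3: offset read from node->ctx, which the node touches anyway. */
static uint64_t sum_node_ctx(const struct fake_node *node,
                             const struct fake_mbuf *pkts, size_t n)
{
    int off;
    memcpy(&off, node->ctx, sizeof(off)); /* presumably a hot cache line */
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += read_dyn(&pkts[i], off);
    return sum;
}

/* Build a 4-packet burst and sum the private field using one of the variants. */
uint64_t demo_sum(int variant)
{
    struct fake_mbuf pkts[4];
    struct fake_node node;

    memset(pkts, 0, sizeof(pkts));
    priv1_dynfield_offset = (int)offsetof(struct fake_mbuf, dyn_area) + 8;
    node_init(&node);

    for (size_t i = 0; i < 4; i++) {
        pkts[i].static_priv = i + 1;
        write_dyn(&pkts[i], priv1_dynfield_offset, i + 1);
    }

    switch (variant) {
    case 0:  return sum_static(pkts, 4);
    case 1:  return sum_global(pkts, 4);
    default: return sum_node_ctx(&node, pkts, 4);
    }
}
```

All three variants return the same sum for the same burst; the only difference is which memory has to be touched to learn the offset. Answering Thomas's "how much" questions would still need a benchmark in which the application does enough work between bursts to evict the global's cache line.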

