On Wed, Oct 28, 2020 at 10:24:01AM +0000, Van Haaren, Harry wrote:
> > -----Original Message-----
> > From: dev <[email protected]> On Behalf Of Thomas Monjalon
> > Sent: Wednesday, October 28, 2020 10:09 AM
> > To: Nithin Dabilpuram <[email protected]>
> > Cc: Pavan Nikhilesh <[email protected]>; Jerin Jacob
> > <[email protected]>; Ruifeng Wang <[email protected]>; Richardson, Bruce
> > <[email protected]>; Ananyev, Konstantin
> > <[email protected]>; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic
> > mbuf field
> >
> > 28/10/2020 10:30, Nithin Dabilpuram:
> > > From: Thomas Monjalon <[email protected]>
> > >
> > > The node_mbuf_priv1 was stored in the deprecated mbuf field udata64.
> > > It is moved to a dynamic field in order to allow removal of udata64.
> > >
> > > Signed-off-by: Thomas Monjalon <[email protected]>
> > > Signed-off-by: Nithin Dabilpuram <[email protected]>
> > [...]
> > > +	IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) = node_mbuf_priv1_dynfield_offset;
> >
> > That's interesting.
> > You copy the offset in the node context for better performance.
> > How much is it better than with global offset variable?
> > How much it decreases compared to a static mbuf field?
>
> Also interested in this topic, I'll offer the logical/theory point of view;
>
> With a static field, the offset into the mbuf can be encoded in the
> instruction stream, meaning there are no d-cache loads to identify
> particular dynamic field.
>
> With a static/global variable, the cache line where the value resides is
> presumably not hot in cache per burst (assuming an application that does
> significant work, so not in cache since last burst). Hence overhead
> estimate could be 1x cache line load per burst.
>
> With the data copied into the node, the offset is presumably on a hot
> cache line as the node is using other data-members of its context. As a
> result, perhaps a cold static cache line load is converted to a hot
> node-context line re-use.
>
> Real world overhead likely depends on A) does the application cache-trash
> enough to make the static/global line fall out of cache - causing perf
> degradation due to reload, and B) does the node->ctx still fit in the same
> number of lines as before if the value is copied there.
Agreed, node->ctx is already referenced to get other data (the lpm pointer).
So referencing another 4 bytes might even turn that into a load pair, which
comes at no extra cost.

Numbers-wise, performance decreases by ~1.4% going from a static mbuf field
to a global offset variable, and by ~1% going from a static mbuf field to a
node context field cached per process call.
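To make the three variants being compared concrete, here is a rough
standalone sketch. It is not the patch code: struct toy_mbuf,
struct toy_node_ctx, TOY_DYNFIELD and all toy_* functions are hypothetical
stand-ins for rte_mbuf, the node context and RTE_MBUF_DYNFIELD, and the
offset is simply picked by hand rather than obtained through dynamic field
registration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf; NOT the real DPDK layout. */
struct toy_mbuf {
	uint64_t static_priv; /* models a static field such as udata64 */
	uint64_t dyn_area[8]; /* models the room dynamic fields are placed in */
};

/* Hypothetical stand-in for the node context, which already holds lpm. */
struct toy_node_ctx {
	void *lpm;
	int priv1_off; /* per-node copy of the dynamic field offset */
};

/* Models the global offset variable; -1 means "not set up yet". */
int toy_priv1_offset = -1;

/* Same idea as RTE_MBUF_DYNFIELD: base address plus byte offset. */
#define TOY_DYNFIELD(m, off, type) ((type)((uintptr_t)(m) + (off)))

/* Assumption: offset chosen by hand; DPDK would assign it at registration. */
void toy_register_priv1(void)
{
	toy_priv1_offset = (int)offsetof(struct toy_mbuf, dyn_area);
}

/* Node init: copy the global offset next to data the node already uses. */
void toy_node_init(struct toy_node_ctx *ctx)
{
	assert(toy_priv1_offset >= 0);
	ctx->priv1_off = toy_priv1_offset;
}

/* Variant 1: static field, the offset is a compile-time constant. */
uint64_t *toy_priv1_static(struct toy_mbuf *m)
{
	return &m->static_priv;
}

/* Variant 2: the offset is loaded from the global variable. */
uint64_t *toy_priv1_global(struct toy_mbuf *m)
{
	return TOY_DYNFIELD(m, toy_priv1_offset, uint64_t *);
}

/* Variant 3: the offset is loaded from the node context. */
uint64_t *toy_priv1_ctx(const struct toy_node_ctx *ctx, struct toy_mbuf *m)
{
	return TOY_DYNFIELD(m, ctx->priv1_off, uint64_t *);
}

/* Process one burst, reading the offset from the context once per call. */
void toy_process(const struct toy_node_ctx *ctx, struct toy_mbuf **pkts,
		 unsigned int n, uint64_t val)
{
	const int off = ctx->priv1_off;
	unsigned int i;

	for (i = 0; i < n; i++)
		*TOY_DYNFIELD(pkts[i], off, uint64_t *) = val + i;
}
```

Variants 2 and 3 resolve to the same address; they differ only in where
the offset itself is loaded from. Whether the context copy really is the
cheaper of the two will depend on the cache behaviour discussed above, so
treat this as an illustration of the access patterns and not as a benchmark.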

