> -----Original Message-----
> From: Thomas Monjalon <[email protected]>
> Sent: Wednesday, October 28, 2020 6:08 PM
> To: Nithin Dabilpuram <[email protected]>; Van Haaren, Harry
> <[email protected]>
> Cc: [email protected]; Pavan Nikhilesh <[email protected]>; Jerin Jacob
> <[email protected]>; Ruifeng Wang <[email protected]>; Richardson, Bruce
> <[email protected]>; Ananyev, Konstantin
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [dpdk-dev] [PATCH v4] node: switch IPv4 metadata to dynamic mbuf
> field
>
> 28/10/2020 11:24, Van Haaren, Harry:
> > From: Thomas Monjalon
> > > > + IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx) = node_mbuf_priv1_dynfield_offset;
> > >
> > > That's interesting.
> > > You copy the offset in the node context for better performance.
> > > How much is it better than with global offset variable?
> > > How much it decreases compared to a static mbuf field?
> >
> > Also interested in this topic, I'll offer the logical/theory point of view;
> >
> > With a static field, the offset into the mbuf can be encoded in the
> > instruction stream, meaning there are no d-cache loads to identify
> > particular dynamic field.
> >
> > With a static/global variable, the cache line where the value resides is
> > presumably not hot in cache per burst (assuming an application that does
> > significant work, so not in cache since last burst). Hence overhead
> > estimate could be 1x cache line load per burst.
>
> Would it help to group all dynfields and dynflags offsets
> in the same cache line?

It could, but whether and how much it would help depends on the workload,
I think. Using each cache line fully is always good, so if grouping the
offsets together is reasonable to do, it seems a good idea.

My assumption is that registration of dynamic fields/flags is expected at
init time, and that the values remain constant at runtime. That would make
this a cache line in "shared" state on each core that uses the mbuf
dynfields.

Overall, it is unlikely to have much impact on a real-world application,
but DPDK puts performance first! And packing a single cache line full of
hot data is best practice :)
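
To make the grouping idea concrete, here is a rough sketch in plain C. It is
not code from the patch or from DPDK: the struct, its members, and the helper
names (dyn_offsets, dyn_offsets_init, dyn_field_set_u32, dyn_field_get_u32)
are hypothetical, and a 64-byte cache line is assumed. In a real application
the values would presumably come from rte_mbuf_dynfield_register() and
rte_mbuf_dynflag_register() at init time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas (C11) */
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Hypothetical table holding every dynfield offset and dynflag mask the
 * datapath reads. Written once at init, read-only afterwards, so the line
 * can stay in "shared" state on every core that uses it. */
struct dyn_offsets {
	alignas(CACHE_LINE_SIZE) int32_t node_priv1_off; /* byte offset in mbuf */
	int32_t timestamp_off;                           /* byte offset in mbuf */
	uint64_t rx_flag_mask;                           /* dynflag bit mask */
};

/* Alignment pads the struct to exactly one cache line. */
static_assert(sizeof(struct dyn_offsets) == CACHE_LINE_SIZE,
	      "offset table must occupy exactly one cache line");

static struct dyn_offsets dyn_off;

/* Init-time only: record the offsets and flag bit handed back by
 * whatever registration mechanism the application uses. */
static inline void
dyn_offsets_init(int32_t priv1_off, int32_t ts_off, unsigned int rx_flag_bit)
{
	dyn_off.node_priv1_off = priv1_off;
	dyn_off.timestamp_off = ts_off;
	dyn_off.rx_flag_mask = UINT64_C(1) << rx_flag_bit;
}

/* Datapath accessors: the offset is a runtime value, so reading it costs
 * one load from the shared table before touching the buffer itself. */
static inline void
dyn_field_set_u32(void *mbuf, int32_t off, uint32_t val)
{
	memcpy((uint8_t *)mbuf + off, &val, sizeof(val));
}

static inline uint32_t
dyn_field_get_u32(const void *mbuf, int32_t off)
{
	uint32_t val;

	memcpy(&val, (const uint8_t *)mbuf + off, sizeof(val));
	return val;
}
```

With this layout a burst that touches several dynamic fields pays for at most
one cache line load for all the offsets, instead of one per separately placed
global variable.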

