Hi Bruce,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Thursday, April 14, 2016 3:00 PM
> To: Richardson, Bruce <bruce.richardson at intel.com>; dev at dpdk.org
> Cc: Zhang, Helin <helin.zhang at intel.com>; Wu, Jingjing
> <jingjing.wu at intel.com>
> Subject: Re: [dpdk-dev] [PATCH] i40e: improve performance of vector PMD
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, April 14, 2016 2:50 PM
> > To: dev at dpdk.org
> > Cc: Zhang, Helin; Wu, Jingjing
> > Subject: Re: [dpdk-dev] [PATCH] i40e: improve performance of vector
> > PMD
> >
> > On Thu, Apr 14, 2016 at 11:15:21AM +0100, Bruce Richardson wrote:
> > > An analysis of the i40e code using Intel® VTune™ Amplifier 2016
> > > showed that the code was unexpectedly causing stalls due to "Loads
> > > blocked by Store Forwards". This can occur when a load from memory
> > > has to wait due to the prior store being to the same address, but
> > > being of a smaller size i.e. the stored value cannot be directly
> > > returned to the loader.
> > > [See ref: https://software.intel.com/en-us/node/544454]
> > >
> > > These stalls are due to the way in which the data_len values are
> > > handled in the driver. The lengths are extracted using vector
> > > operations, but those 16-bit lengths are then assigned using scalar
> > > operations i.e. 16-bit stores.
> > >
> > > These regular 16-bit stores actually have two effects in the code:
> > > * they cause the "Loads blocked by Store Forwards" issues reported
> > > * they also cause the previous loads in the RX function to actually
> > >   be a load followed by a store to an address on the stack, because
> > >   the 16-bit assignment can't be done to an xmm register.
> > >
> > > By converting the 16-bit stores operations into a sequence of SSE
> > > blend operations, we can ensure that the descriptor loads only occur
> > > once, and avoid both the additional store and loads from the stack,
> > > as well as the stalls due to the second loads being blocked.
> > >
> > > Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
> > >
> > Self-NAK on this version. The blend instruction used is SSE4.1 so
> > breaks the "default" build.
> >
> > Two obvious options to fix this:
> > 1. Keep the old code with SSE4.1 #ifdefs separating old and new
> > 2. Update the vpmd requirement to SSE4.1, and factor that in during
> >    runtime select of the RX code path.
> >
> > Personally, I prefer the second option. Any objections?
>
> +1 for second one.
>
> >
> > /Bruce

I am using the "default" build when building in VMs. Will both options work for me?

Regards,

Bernard.
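
For illustration only, here is a rough sketch of what option 2 in the quoted mail (require SSE4.1 for the blend path and pick the code path at runtime) might look like. It is not code from the patch. The names set_len_scalar, set_len_blend, select_set_len and the LEN_LANE position are hypothetical, and the CPU check uses the GCC/Clang builtin __builtin_cpu_supports() rather than whatever the driver itself would use. It assumes an x86-64 build with GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2: baseline on x86-64 */
#include <smmintrin.h>  /* SSE4.1: _mm_blend_epi16 */

#define LEN_LANE 5  /* hypothetical 16-bit lane that holds data_len */

typedef void (*set_len_fn)(uint8_t out[16], __m128i fields, uint16_t len);

/* Baseline path: 16-byte vector store, then a 16-bit scalar store on top.
 * A later 16-byte load of out[] would overlap the smaller store, which is
 * the pattern that can show up as "Loads blocked by Store Forwards". */
static void set_len_scalar(uint8_t out[16], __m128i fields, uint16_t len)
{
    _mm_storeu_si128((__m128i *)out, fields);
    memcpy(out + 2 * LEN_LANE, &len, sizeof(len));
}

/* SSE4.1 path: merge the length in-register, then do one 16-byte store.
 * The target attribute lets this compile without -msse4.1 on the whole file. */
static __attribute__((target("sse4.1")))
void set_len_blend(uint8_t out[16], __m128i fields, uint16_t len)
{
    __m128i len_vec = _mm_set1_epi16((short)len);
    /* Bit LEN_LANE of the mask takes that lane from len_vec, the rest from fields. */
    __m128i merged = _mm_blend_epi16(fields, len_vec, 1 << LEN_LANE);
    _mm_storeu_si128((__m128i *)out, merged);
}

/* Runtime select: use the blend path only if the CPU reports SSE4.1. */
static set_len_fn select_set_len(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") ? set_len_blend : set_len_scalar;
}

/* Self-check: the selected path must produce the same bytes as the baseline. */
static void self_check(void)
{
    uint8_t a[16], b[16];
    __m128i fields = _mm_set1_epi8(0x11);

    set_len_scalar(a, fields, 1500);
    select_set_len()(b, fields, 1500);
    assert(memcmp(a, b, sizeof(a)) == 0);
}
```

In this sketch, a CPU (or a VM) that does not expose SSE4.1 simply gets the scalar path from select_set_len(), so the binary still runs there. Whether the real driver falls back in the same way depends on how the runtime select is implemented in the next version of the patch.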