> -----Original Message-----
> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> Sent: Wednesday, September 21, 2016 8:54 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>
> Cc: Maxime Coquelin <maxime.coquelin at redhat.com>; dev at dpdk.org;
> yuanhan.liu at linux.intel.com
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>
> On 21 September 2016 at 17:27, Wang, Zhihong <zhihong.wang at intel.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> >> Sent: Wednesday, September 21, 2016 4:50 PM
> >> To: Maxime Coquelin <maxime.coquelin at redhat.com>
> >> Cc: Wang, Zhihong <zhihong.wang at intel.com>; dev at dpdk.org;
> >> yuanhan.liu at linux.intel.com
> >> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> >>
> >> Hi Maxime,
> >>
> >> On 22 August 2016 at 16:11, Maxime Coquelin
> >> <maxime.coquelin at redhat.com> wrote:
> >> > Hi Zhihong,
> >> >
> >> > On 08/19/2016 07:43 AM, Zhihong Wang wrote:
> >> >>
> >> >> This patch set optimizes the vhost enqueue function.
> >> >>
> >> ...
> >>
> >> >
> >> > My setup consists of one host running a guest.
> >> > The guest generates as much 64bytes packets as possible using
> >>
> >> Have you tested with other different packet size?
> >> My testing shows that performance is dropping when packet size is more
> >> than 256.
> >
> >
> > Hi Jianbo,
> >
> > Thanks for reporting this.
> >
> > 1. Are you running the vector frontend with mrg_rxbuf=off?
> >
> > 2. Could you please specify what CPU you're running? Is it Haswell
> > or Ivy Bridge?
> >
> > 3. How many percentage of drop are you seeing?
> >
> > This is expected by me because I've already found the root cause and
> > the way to optimize it, but since it missed the v0 deadline and
> > requires changes in eal/memcpy, I postpone it to the next release.
> >
> > After the upcoming optimization the performance for packets larger
> > than 256 will be improved, and the new code will be much faster than
> > the current code.
> >
>
> Sorry, I tested on an ARM server, but I wonder if there is the same
> issue for x86 platform.
For the mrg_rxbuf=off path there might be a slight drop for packets larger
than 256B (around 3% for 512B and 1% for 1024B), and no drop for the other
cases. This is not a bug or an issue; we only need to enhance memcpy to
complete the whole optimization (a rough sketch of the idea is at the end
of this mail). That should be done in a separate patch, and unfortunately
it misses this release window.

>
> >> > pktgen-dpdk. The hosts forwards received packets back to the guest
> >> > using testpmd on vhost pmd interface. Guest's vCPUs are pinned to
> >> > physical CPUs.
> >> >
> >> > I tested it with and without your v1 patch, with and without
> >> > rx-mergeable feature turned ON.
> >> > Results are the average of 8 runs of 60 seconds:
> >> >
> >> > Rx-Mergeable ON : 7.72Mpps
> >> > Rx-Mergeable ON + "vhost: optimize enqueue" v1: 9.19Mpps
> >> > Rx-Mergeable OFF: 10.52Mpps
> >> > Rx-Mergeable OFF + "vhost: optimize enqueue" v1: 10.60Mpps
> >> >
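To make "enhance memcpy" a bit more concrete, below is only a rough sketch
of the direction, not the actual eal/memcpy patch: the idea is to dispatch
on copy size, keeping a small inlined path for packets up to the knee we
see around 256B and routing larger copies through an improved bulk copy
(rte_memcpy in a real DPDK build; plain libc memcpy here so the sketch
compiles standalone). The 256B threshold and the pkt_copy() helper name
are made up for illustration.

/*
 * Illustration only -- not the planned eal/memcpy change.
 * Dispatch on copy size: small packets take an inlined fixed-stride
 * path, larger packets fall back to the bulk copy that the real patch
 * would optimize.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SMALL_COPY_MAX 256  /* assumed threshold, matches the ~256B knee */

static inline void
pkt_copy(void *dst, const void *src, size_t len)
{
	if (len <= SMALL_COPY_MAX) {
		/* Small packets: copy in fixed 64B strides so the
		 * compiler can fully unroll and inline each step. */
		uint8_t *d = dst;
		const uint8_t *s = src;

		while (len >= 64) {
			memcpy(d, s, 64);
			d += 64;
			s += 64;
			len -= 64;
		}
		memcpy(d, s, len);
	} else {
		/* Large packets: this branch is where an enhanced
		 * rte_memcpy would be used in a real DPDK build. */
		memcpy(dst, src, len);
	}
}

int
main(void)
{
	static uint8_t src[2048], dst[2048];
	const size_t sizes[] = { 64, 256, 512, 1024 };
	size_t i;

	/* Exercise the sizes discussed in this thread. */
	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		pkt_copy(dst, src, sizes[i]);
		printf("copied %zu bytes\n", sizes[i]);
	}
	return 0;
}

The actual threshold and the large-copy implementation will be chosen
based on profiling, so please take the numbers above as placeholders only.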