> -----Original Message-----
> From: Maxime Coquelin <maxime.coque...@redhat.com>
> Sent: Tuesday, April 28, 2020 10:50 PM
> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong <xiaolong...@intel.com>;
> Wang, Zhihong <zhihong.w...@intel.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; jer...@marvell.com
> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> On 4/28/20 4:43 PM, Liu, Yong wrote:
> >
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coque...@redhat.com>
> >> Sent: Tuesday, April 28, 2020 9:46 PM
> >> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong
> <xiaolong...@intel.com>;
> >> Wang, Zhihong <zhihong.w...@intel.com>
> >> Cc: dev@dpdk.org; Honnappa Nagarahalli
> >> <honnappa.nagaraha...@arm.com>; jer...@marvell.com
> >> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> path
> >>
> >>
> >>
> >> On 4/28/20 3:01 PM, Liu, Yong wrote:
> >>>>> Maxime,
> >>>>> Thanks for pointing it out, it will add an extra cache miss in the datapath.
> >>>>> And its impact on performance is around 1% in the loopback case.
> >>>> Ok, thanks for doing the test. I'll try to run some PVP benchmarks
> >>>> on my side because when doing IO loopback, the cache pressure is
> >>>> much less important.
> >>>>
> >>>>> While benefit of vectorized path will be more than that number.
> >>>> Ok, but I disagree for two reasons:
> >>>>  1. You have to keep in mind that non-vectorized is the default and
> >>>> encouraged mode to use. Indeed, it takes a lot of shortcuts like not
> >>>> checking header length (so no error stats), etc...
> >>>>
> >>> Ok, I will keep non-vectorized same as before.
> >>>
> >>>>  2. It's like saying it's OK it degrades by 5% on $CPU_VENDOR_A
> because
> >>>> the gain is 20% on $CPU_VENDOR_B.
> >>>>
> >>>> In case we see more degradation in a real-world scenario, you might
> >>>> want to consider using ifdefs to avoid adding padding in the non-
> >>>> vectorized case, like you did to differentiate Virtio PMD to Virtio-user
> >>>> PMD in patch 7.
> >>>>
> >>> Maxime,
> >>> The performance difference is so slight that I ignored it; it looked like a
> >> sampling error.
> >>
> >> Agree for IO loopback, but it adds one more cache line access per burst,
> >> which might be seen in some real-life use cases.
> >>
> >>> It may not be suitable to add a new configuration for such a setting which
> >> is only used inside the driver.
> >>
> >> Wait, the Virtio-user #ifdef is based on the defconfig options? How can
> >> it work since both Virtio PMD and Virtio-user PMD can be selected at the
> >> same time?
> >>
> >> I thought it was a define set before the headers inclusion and unset
> >> afterwards, but I didn't check carefully.
> >>
> >
> > Maxime,
> > The difference between virtio PMD and Virtio-user PMD addresses is
> handled by vq->offset.
> >
> > When virtio PMD is running, offset will be set to buf_iova.
> > vq->offset = offsetof(struct rte_mbuf, buf_iova);
> >
> > When virtio_user PMD is running, offset will be set to buf_addr.
> > vq->offset = offsetof(struct rte_mbuf, buf_addr);
> 
> Ok, but below is a build time check:
> 
> +#ifdef RTE_VIRTIO_USER
> +     __m128i flag_offset = _mm_set_epi64x(flags_temp, (uint64_t)vq->offset);
> +#else
> +     __m128i flag_offset = _mm_set_epi64x(flags_temp, 0);
> +#endif
> 
> So how can it work for a single build for both Virtio and Virtio-user?
> 

Sorry, there is an implementation error here. vq->offset should be used in descs_base
for getting the IOVA address.
It will work the same as the VIRTIO_MBUF_ADDR macro.

> >>> The virtio driver can check whether the virtqueue is using the vectorized path at
> >> initialization, and will use the padded structure if it is.
> >>> I have added some test code and now performance has come back. Since the
> >> code has changed in the initialization process, it needs some time for
> regression
> >> checking.
> >>
> >> Ok, works for me.
> >>
> >> I am investigating a linkage issue with your series, which does not
> >> happen systematically (see below, it happens also with clang). David
> >> pointed me to some Intel patches removing the usage of __rte_weak,
> >> could it be related?
> >>
> >
> > I checked David's patch; it only changed the i40e driver. Meanwhile, the attribute
> __rte_weak should still be in virtio_rxtx.c.
> > I will follow David's patch and eliminate the usage of the weak attribute.
> 
> Yeah, I meant the below issue could be linked to __rte_weak, not that the i40e
> patch was the cause of this problem.
> 

Maxime,
I haven't seen any build issue related to __rte_weak with either gcc or clang.

Thanks,
Marvin
