On 4/28/20 5:35 PM, Liu, Yong wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coque...@redhat.com>
>> Sent: Tuesday, April 28, 2020 10:50 PM
>> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong <xiaolong...@intel.com>;
>> Wang, Zhihong <zhihong.w...@intel.com>
>> Cc: dev@dpdk.org; Honnappa Nagarahalli
>> <honnappa.nagaraha...@arm.com>; jer...@marvell.com
>> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
>>
>>
>>
>> On 4/28/20 4:43 PM, Liu, Yong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coque...@redhat.com>
>>>> Sent: Tuesday, April 28, 2020 9:46 PM
>>>> To: Liu, Yong <yong....@intel.com>; Ye, Xiaolong <xiaolong...@intel.com>;
>>>> Wang, Zhihong <zhihong.w...@intel.com>
>>>> Cc: dev@dpdk.org; Honnappa Nagarahalli
>>>> <honnappa.nagaraha...@arm.com>; jer...@marvell.com
>>>> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
>>>>
>>>>
>>>>
>>>> On 4/28/20 3:01 PM, Liu, Yong wrote:
>>>>>>> Maxime,
>>>>>>> Thanks for pointing it out; it will add an extra cache miss in the datapath,
>>>>>>> and its impact on performance is around 1% in the loopback case.
>>>>>> Ok, thanks for doing the test. I'll try to run some PVP benchmarks
>>>>>> on my side, because the cache pressure is much lower when doing
>>>>>> IO loopback.
>>>>>>
>>>>>>> While the benefit of the vectorized path will be more than that number.
>>>>>> Ok, but I disagree for two reasons:
>>>>>>  1. You have to keep in mind that non-vectorized is the default and
>>>>>> encouraged mode to use. Indeed, the vectorized path takes a lot of
>>>>>> shortcuts, like not checking the header length (so no error stats), etc...
>>>>>>
>>>>> Ok, I will keep the non-vectorized path the same as before.
>>>>>
>>>>>>  2. It's like saying it's OK that it degrades by 5% on $CPU_VENDOR_A
>>>>>> because the gain is 20% on $CPU_VENDOR_B.
>>>>>>
>>>>>> In case we see more degradation in a real-world scenario, you might
>>>>>> want to consider using ifdefs to avoid adding padding in the
>>>>>> non-vectorized case, like you did to differentiate the Virtio PMD from
>>>>>> the Virtio-user PMD in patch 7.
>>>>>>
>>>>> Maxime,
>>>>> The performance difference is so slight that I ignored it; it looked like
>>>>> a sampling error.
>>>>
>>>> Agree for IO loopback, but it adds one more cache line access per burst,
>>>> which might be visible in some real-life use cases.
>>>>
>>>>> It may not be suitable to add a new configuration option for a setting
>>>>> which is only used inside the driver.
>>>>
>>>> Wait, the Virtio-user #ifdef is based on the defconfig options? How can
>>>> it work since both Virtio PMD and Virtio-user PMD can be selected at the
>>>> same time?
>>>>
>>>> I thought it was a define set before the header inclusion and unset
>>>> afterwards, but I didn't check carefully.
>>>>
>>>
>>> Maxime,
>>> The difference between the virtio PMD and Virtio-user PMD addresses is
>>> handled by vq->offset.
>>>
>>> When the virtio PMD is running, offset is set to buf_iova:
>>> vq->offset = offsetof(struct rte_mbuf, buf_iova);
>>>
>>> When the virtio_user PMD is running, offset is set to buf_addr:
>>> vq->offset = offsetof(struct rte_mbuf, buf_addr);
>>
>> Ok, but below is a build-time check:
>>
>> +#ifdef RTE_VIRTIO_USER
>> +    __m128i flag_offset = _mm_set_epi64x(flags_temp,
>> +            (uint64_t)vq->offset);
>> +#else
>> +    __m128i flag_offset = _mm_set_epi64x(flags_temp, 0);
>> +#endif
>>
>> So how can it work in a single build for both Virtio and Virtio-user?
>>
> 
> Sorry, this is an implementation error. vq->offset should be used in
> descs_base for getting the iova address.
> It will work the same way as the VIRTIO_MBUF_ADDR macro.
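
For reference, a minimal sketch of that runtime approach (the helper name
below is illustrative, not taken from the patch): the mbuf address is read
through the vq->offset value chosen at init time, so a single build serves
both the Virtio PMD (buf_iova) and the Virtio-user PMD (buf_addr), along the
same lines as the VIRTIO_MBUF_ADDR macro.

#include <stdint.h>
#include <rte_mbuf.h>

/*
 * Illustrative helper: fetch the buffer address of an mbuf through the
 * per-virtqueue offset selected at init time (offsetof(struct rte_mbuf,
 * buf_iova) for the Virtio PMD, offsetof(struct rte_mbuf, buf_addr) for
 * the Virtio-user PMD), so no build-time #ifdef is required.
 */
static inline uint64_t
vq_mbuf_addr(const struct rte_mbuf *mb, uintptr_t vq_offset)
{
        /* Reads either buf_iova or buf_addr, depending on vq_offset. */
        return (uint64_t)(*(const uintptr_t *)((uintptr_t)mb + vq_offset));
}
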
> 
>>>>> The virtio driver can check whether the virtqueue is using the vectorized
>>>>> path at initialization time, and will use the padded structure if it is.
>>>>> I have added some test code and now the performance came back. Since the
>>>>> code has changed in the initialization process, it needs some time for the
>>>>> regression check.
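
As a hedged illustration of that init-time selection (all names and field
layouts below are hypothetical, not taken from the patch): the driver could
choose a padded per-descriptor bookkeeping element only when the vectorized
path is enabled, so the non-vectorized layout stays exactly as it is today.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-slot bookkeeping element (24 bytes on a 64-bit target). */
struct vq_extra_sketch {
        void     *cookie;
        uint64_t  addr;
        uint32_t  len;
        uint16_t  ndescs;
};

/* Padded variant rounded up to 32 bytes so vectorized code can index it
 * with shifts instead of multiplies. */
struct vq_extra_padded_sketch {
        struct vq_extra_sketch elem;
        uint8_t pad[8];
};

/* Hypothetical init-time helper: pick the element size depending on whether
 * the vectorized datapath was requested for this virtqueue. */
static inline size_t
vq_extra_elem_size(bool vectorized)
{
        return vectorized ? sizeof(struct vq_extra_padded_sketch)
                          : sizeof(struct vq_extra_sketch);
}
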
>>>>
>>>> Ok, works for me.
>>>>
>>>> I am investigating a linkage issue with your series, which does not
>>>> happen systematically (see below; it also happens with clang). David
>>>> pointed me to some Intel patches removing the usage of __rte_weak,
>>>> could it be related?
>>>>
>>>
>>> I checked David's patch; it only changed the i40e driver. Meanwhile, the
>>> __rte_weak attribute should still be in virtio_rxtx.c.
>>> I will follow David's patch and eliminate the usage of the weak attribute.
>>
>> Yeah, I meant the issue below could be linked to __rte_weak, not that the
>> i40e patch was the cause of this problem.
>>
> 
> Maxime,
> I haven't seen any build issues related to __rte_weak with either gcc or clang.
>   

Note that this build (which does not fail systematically) is done with
binutils 2.30, which causes AVX512 support to be disabled.
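
For context, a minimal sketch of the weak-symbol pattern being discussed (the
function name and bodies are illustrative, not the patch's actual code): the
generic file provides an __rte_weak stub, while a file compiled only when the
toolchain supports AVX512 provides the strong definition. If the AVX512 object
is not built (as with binutils 2.30), the linker silently falls back to the
weak stub, which is one way non-systematic link differences can show up.

/* virtio_rxtx.c (generic build): weak fallback, shown only as a sketch. */
#include <rte_common.h>
#include <rte_mbuf.h>

uint16_t __rte_weak
virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
                uint16_t nb_pkts)
{
        RTE_SET_USED(rx_queue);
        RTE_SET_USED(rx_pkts);
        RTE_SET_USED(nb_pkts);
        return 0;
}

/*
 * virtio_rxtx_packed_avx.c (built only when AVX512 is usable) defines the
 * same symbol without __rte_weak; at link time that strong definition
 * overrides the stub above.
 */
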

> Thanks,
> Marvin
> 
