On 1/20/2016 2:33 AM, Polehn, Mike A wrote:
> SMP operations can be very expensive, sometimes costing hundreds to
> thousands of clock cycles depending on the circumstances of the
> synchronization. How you arrange the SMP operations within the tasks at
> hand, across the SMP cores, is what gives top performance. Using
> traditional general-purpose SMP methods will result in traditional
> general-purpose performance. Migrating from expert techniques (understood
> by a much smaller group of expert programmers focused on performance) to
> general libraries (understood by most general-purpose programmers) will
> greatly reduce the value of DPDK, since the end result will be lower
> performance and/or less predictable operation, while rate performance,
> predictability, and low latency are the primary goals.
>
> The best method to date for multiple outputs to a single port is to use a
> DPDK queue with multiple producers and a single consumer: one SMP
> operation lets the multiple sources feed a single non-SMP task, which
> alone outputs to the port (that is why the ports are not SMP protected).
> Also, when considerable contention from multiple sources occurs often
> (data feeding at the same time), having the DPDK queue's input and output
> variables in separate cache lines can give a notable throughput
> improvement.
>
> Mike 

Mike:
Thanks for the detailed explanation. Do you have any comments on this patch?
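
For reference, here is a minimal sketch of the arrangement Mike describes:
several lcores enqueue to one multi-producer/single-consumer rte_ring, and
a single lcore drains it to the port. The names, ring size, and drop
policy below are illustrative only, not from any existing application.

#include <rte_ring.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_lcore.h>

#define OUT_PORT 0
#define TX_BURST 32

static struct rte_ring *tx_ring;

/* Init: multi-producer enqueue (the default), single-consumer dequeue
 * (RING_F_SC_DEQ). The ring size must be a power of two. */
static int
tx_ring_init(void)
{
	tx_ring = rte_ring_create("port0_tx", 1024, rte_socket_id(),
				  RING_F_SC_DEQ);
	return tx_ring == NULL ? -1 : 0;
}

/* Any producer lcore: the ring enqueue is the only SMP operation. */
static void
producer_send(struct rte_mbuf **pkts, unsigned n)
{
	unsigned done = rte_ring_mp_enqueue_burst(tx_ring, (void **)pkts, n);

	while (done < n)	/* drop whatever the ring cannot absorb */
		rte_pktmbuf_free(pkts[done++]);
}

/* The single consumer lcore: the only caller of rte_eth_tx_burst() on
 * this queue, so the port itself needs no SMP protection. */
static void
consumer_drain(void)
{
	struct rte_mbuf *pkts[TX_BURST];
	unsigned n, txd;

	n = rte_ring_sc_dequeue_burst(tx_ring, (void **)pkts, TX_BURST);
	txd = rte_eth_tx_burst(OUT_PORT, 0, pkts, (uint16_t)n);
	while (txd < n)
		rte_pktmbuf_free(pkts[txd++]);
}

Note that rte_ring already keeps its producer and consumer indexes on
separate cache lines, which matches the layout point Mike makes above.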

>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> Sent: Tuesday, January 19, 2016 8:44 AM
> To: Tan, Jianfeng; dev at dpdk.org
> Cc: ann.zhuangyanying at huawei.com
> Subject: Re: [dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio ring
>
> On 1/20/2016 12:25 AM, Tan, Jianfeng wrote:
>> Hi Huawei,
>>
>> On 1/4/2016 10:46 PM, Huawei Xie wrote:
>>> This patch removes the internal lockless enqueue implementation.
>>> DPDK doesn't support receiving/transmitting packets from/to the same
>>> queue concurrently. The vhost PMD wraps the vhost device as a normal
>>> DPDK port, and DPDK applications normally have their own lock
>>> implementation when enqueueing packets to the same queue of a port.
>>>
>>> The atomic cmpset is a costly operation. This patch should help 
>>> performance a bit.
>>>
>>> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 86 +++++++++++++------------------------------
>>>  1 file changed, 25 insertions(+), 61 deletions(-)
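>>>
>>> For context, the lockless path being removed reserved used-ring slots
>>> with a compare-and-set loop roughly like the sketch below (types are
>>> reduced to the field involved; names follow vhost_rxtx.c):
>>>
>>> #include <rte_atomic.h>
>>> #include <rte_branch_prediction.h>
>>>
>>> /* reduced sketch of the queue state involved */
>>> struct vq_res_sketch {
>>> 	volatile uint16_t last_used_idx_res;
>>> };
>>>
>>> static uint16_t
>>> reserve_slots(struct vq_res_sketch *vq, uint16_t count)
>>> {
>>> 	uint16_t res_base_idx, res_end_idx;
>>> 	int success;
>>>
>>> 	do {
>>> 		res_base_idx = vq->last_used_idx_res;
>>> 		res_end_idx = res_base_idx + count;
>>> 		/* a LOCK CMPXCHG on x86: the costly SMP operation
>>> 		 * this patch drops */
>>> 		success = rte_atomic16_cmpset(&vq->last_used_idx_res,
>>> 				res_base_idx, res_end_idx);
>>> 	} while (unlikely(success == 0));
>>>
>>> 	return res_base_idx;
>>> }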
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>> index bbf3fac..26a1b9c 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>> I think the vhost example will not work well with this patch when
>> vm2vm=software.
>>
>> Test case:
>> Two virtio ports handled by two pmd threads. Thread 0 polls pkts from
>> the physical NIC and sends them to virtio0, while thread 1 receives
>> pkts from virtio1 and routes them to virtio0.
> The vhost port will be wrapped as a normal port by the vhost PMD. A DPDK
> app treats all physical and virtual ports equally. When two DPDK threads
> try to enqueue to the same port, the app needs to handle the contention.
> None of the physical PMDs support concurrent enqueuing/dequeuing on the
> same queue, and the vhost PMD should expose the same behavior unless it
> is absolutely necessary to expose a difference between PMDs.
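> To make that concrete, a minimal sketch of what handling the contention
> in the app could look like (the lock and the names here are illustrative,
> not part of any PMD API):
>
> #include <rte_spinlock.h>
> #include <rte_ethdev.h>
> #include <rte_mbuf.h>
>
> /* one lock per (port, queue) pair shared by the contending lcores */
> static rte_spinlock_t vport_tx_lock = RTE_SPINLOCK_INITIALIZER;
>
> static uint16_t
> locked_tx_burst(uint8_t port, uint16_t queue,
> 		struct rte_mbuf **pkts, uint16_t n)
> {
> 	uint16_t sent;
>
> 	rte_spinlock_lock(&vport_tx_lock);
> 	sent = rte_eth_tx_burst(port, queue, pkts, n);
> 	rte_spinlock_unlock(&vport_tx_lock);
> 	return sent;
> }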
>
>>> -
>>>           *(volatile uint16_t *)&vq->used->idx += entry_success;
>> Another unrelated question: should we try to move this assignment out
>> of the loop to save cost, since it is a point of data contention?
> This operation itself is not that costly, but it has a side effect on
> the cache transfer.
> It is outside of the loop for the non-mergeable case; for the mergeable
> case, it is inside the loop.
> There are pros and cons to doing this once per burst versus in smaller
> steps. I prefer to move it outside of the loop. Let us address this
> later.
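> For illustration, the shape of that change in the mergeable path would
> be roughly the sketch below (types reduced to the fields involved; the
> copy step is elided):
>
> #include <stdint.h>
>
> struct vring_used_sketch { uint16_t idx; };
> struct vq_min { struct vring_used_sketch *used; };
>
> static uint32_t
> enqueue_burst(struct vq_min *vq, uint32_t count)
> {
> 	uint32_t pkt_idx, entry_success = 0;
>
> 	for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
> 		/* ... copy one packet into the used ring ... */
> 		entry_success++;	/* accumulate locally */
> 	}
> 	/* one volatile write-back for the whole burst: fewer bounces of
> 	 * the used->idx cache line, at the cost of the guest seeing
> 	 * completions in coarser steps */
> 	*(volatile uint16_t *)&vq->used->idx += (uint16_t)entry_success;
>
> 	return entry_success;
> }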
>
>> Thanks,
>> Jianfeng
>>
>>
>
