SMP operations can be very expensive, sometimes costing hundreds to
thousands of clock cycles depending on the circumstances of the
synchronization. Top performance comes from how you arrange the SMP
operations across the cores for the tasks at hand. Using traditional
general-purpose SMP methods will give you traditional general-purpose
performance. Migrating from expert techniques (understood by the much
smaller group of expert programmers focused on performance) to general
libraries (understood by most general-purpose programmers) would greatly
reduce the value of DPDK, since the end result would be lower performance
and/or less predictable operation, and rate performance, predictability,
and low latency are the primary goals.

The best method to date for feeding a single port from multiple outputs
is a DPDK ring with multiple producers and a single consumer: the SMP
operation happens once, where the multiple sources feed the queue, and a
single non-SMP task drains it and transmits to the port (which is why the
ports are not SMP protected). Also, when considerable contention from
multiple sources occurs often (data feeding at the same time), keeping
the queue's input and output index variables in separate cache lines can
give a notable throughput improvement.
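
For what it's worth, the rte_ring API supports exactly this arrangement
through its creation flags. A minimal sketch (names and sizes are mine;
burst-call signatures vary a bit across DPDK releases, so treat this as
illustrative rather than authoritative):

/* Sketch only: an MP/SC ring feeding one TX core. A real app must
 * free or retry mbufs that the enqueue does not accept. */
#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>

static struct rte_ring *tx_ring;

/* Multi-producer enqueue is the default; RING_F_SC_DEQ makes the
 * dequeue side single-consumer, so only the producers pay the
 * atomic cmpset on the producer index. */
static int tx_ring_init(void)
{
	tx_ring = rte_ring_create("tx_ring", 1024, rte_socket_id(),
				  RING_F_SC_DEQ);
	return tx_ring == NULL ? -1 : 0;
}

/* Any number of cores may call this concurrently. */
static void tx_feed(struct rte_mbuf **pkts, unsigned n)
{
	rte_ring_enqueue_burst(tx_ring, (void **)pkts, n);
}

/* Exactly one core calls this, so it owns the TX queue and
 * rte_eth_tx_burst() needs no SMP protection. */
static void tx_drain(uint8_t port_id)
{
	struct rte_mbuf *pkts[32];
	unsigned n;

	n = rte_ring_dequeue_burst(tx_ring, (void **)pkts, 32);
	if (n > 0)
		rte_eth_tx_burst(port_id, 0, pkts, (uint16_t)n);
}

Note that struct rte_ring itself already keeps the producer and consumer
bookkeeping on separate cache lines, which is exactly the layout
described above.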

Mike 

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Xie, Huawei
Sent: Tuesday, January 19, 2016 8:44 AM
To: Tan, Jianfeng; dev at dpdk.org
Cc: ann.zhuangyanying at huawei.com
Subject: Re: [dpdk-dev] [PATCH] vhost: remove lockless enqueue to the virtio 
ring

On 1/20/2016 12:25 AM, Tan, Jianfeng wrote:
> Hi Huawei,
>
> On 1/4/2016 10:46 PM, Huawei Xie wrote:
>> This patch removes the internal lockless enqueue implementation.
>> DPDK doesn't support receiving/transmitting packets from/to the same 
>> queue. Vhost PMD wraps vhost device as normal DPDK port. DPDK 
>> applications normally have their own lock implementation when enqueuing 
>> packets to the same queue of a port.
>>
>> The atomic cmpset is a costly operation. This patch should help 
>> performance a bit.
>>
>> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
>> ---
>>   lib/librte_vhost/vhost_rxtx.c | 86
>> +++++++++++++------------------------------
>>   1 file changed, 25 insertions(+), 61 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost_rxtx.c 
>> b/lib/librte_vhost/vhost_rxtx.c index bbf3fac..26a1b9c 100644
>> --- a/lib/librte_vhost/vhost_rxtx.c
>> +++ b/lib/librte_vhost/vhost_rxtx.c
>
> I think the vhost example will not work well with this patch when
> vm2vm=software.
>
> Test case:
> Two virtio ports handled by two pmd threads. Thread 0 polls pkts from
> the physical NIC and sends them to virtio0, while thread 1 receives pkts
> from virtio1 and routes them to virtio0.

A vhost port will be wrapped as a regular port by the vhost PMD. A DPDK
app treats all physical and virtual ports equally. When two DPDK threads
try to enqueue to the same port, the app needs to handle the contention
itself. None of the physical PMDs support concurrent enqueuing/dequeuing
on the same queue, and the vhost PMD should expose the same behavior
unless it is absolutely necessary to expose a difference between PMDs.
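
For completeness, the usual app-side pattern when two threads really
must share one TX queue is a lock around the burst call. A minimal
sketch with made-up names (the better fix is one TX queue per lcore):

#include <rte_spinlock.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static rte_spinlock_t txq_lock = RTE_SPINLOCK_INITIALIZER;

/* PMD TX routines are not thread-safe, so the app serializes
 * access when several lcores target the same port/queue pair. */
static uint16_t locked_tx_burst(uint8_t port, uint16_t queue,
				struct rte_mbuf **pkts, uint16_t n)
{
	uint16_t sent;

	rte_spinlock_lock(&txq_lock);
	sent = rte_eth_tx_burst(port, queue, pkts, n);
	rte_spinlock_unlock(&txq_lock);
	return sent;
}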

>
>> -
>>           *(volatile uint16_t *)&vq->used->idx += entry_success;
>
> Another unrelated question: We ever try to move this assignment out of
> loop to save cost as it's a data contention?

This operation itself is not that costly, but it has a side effect on
cache transfer: each store can bounce the used->idx cache line between
the host writer and the guest reader.
It is outside of the loop for the non-mergeable case; for the mergeable
case, it is inside the loop.
There are pros and cons to doing this per burst versus in smaller steps.
I prefer to move it outside of the loop. Let us address this later.
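
To make the trade-off concrete, a schematic of the batched form (not
the actual vhost_rxtx.c code; copy_pkt_to_desc() is a hypothetical
stand-in for the per-packet descriptor copy):

#include <stdint.h>
#include <rte_atomic.h>

struct used_ring {
	volatile uint16_t idx;	/* the guest reader polls this */
	/* used ring entries follow */
};

/* hypothetical stand-in for the per-packet descriptor copy */
extern void copy_pkt_to_desc(struct used_ring *used, uint16_t slot);

static void update_used_idx_batched(struct used_ring *used,
				    uint16_t base, uint16_t count)
{
	uint16_t i;

	for (i = 0; i < count; i++)
		copy_pkt_to_desc(used, (uint16_t)(base + i));

	rte_smp_wmb();	/* entries visible before the index bump */

	/* One store per burst instead of one per packet: fewer
	 * transfers of this cache line to the guest, at the cost
	 * of the guest seeing completions slightly later. */
	*(volatile uint16_t *)&used->idx =
		(uint16_t)(used->idx + count);
}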

>
> Thanks,
> Jianfeng
>
>
