Re: [dpdk-dev] [PATCH v5 3/4] vhost: support async dequeue for split ring

Maxime Coquelin Fri, 16 Jul 2021 00:46:08 -0700

Hi,

On 7/16/21 3:10 AM, Hu, Jiayu wrote:
> Hi, Maxime,
> 
>> -----Original Message-----
>> From: Maxime Coquelin <[email protected]>
>> Sent: Thursday, July 15, 2021 9:18 PM
>> To: Hu, Jiayu <[email protected]>; Ma, WenwuX <[email protected]>;
>> [email protected]
>> Cc: Xia, Chenbo <[email protected]>; Jiang, Cheng1
>> <[email protected]>; Wang, YuanX <[email protected]>
>> Subject: Re: [PATCH v5 3/4] vhost: support async dequeue for split ring
>>
>>
>>
>> On 7/14/21 8:50 AM, Hu, Jiayu wrote:
>>> Hi Maxime,
>>>
>>> Thanks for your comments. Replies are inline.
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <[email protected]>
>>>> Sent: Tuesday, July 13, 2021 10:30 PM
>>>> To: Ma, WenwuX <[email protected]>; [email protected]
>>>> Cc: Xia, Chenbo <[email protected]>; Jiang, Cheng1
>>>> <[email protected]>; Hu, Jiayu <[email protected]>; Wang, YuanX
>>>> <[email protected]>
>>>> Subject: Re: [PATCH v5 3/4] vhost: support async dequeue for split
>>>> ring
>>>>>  struct async_inflight_info {
>>>>>   struct rte_mbuf *mbuf;
>>>>> - uint16_t descs; /* num of descs inflight */
>>>>> + union {
>>>>> +         uint16_t descs; /* num of descs in-flight */
>>>>> +         struct async_nethdr nethdr;
>>>>> + };
>>>>>   uint16_t nr_buffers; /* num of buffers inflight for packed ring */
>>>>> -};
>>>>> +} __rte_cache_aligned;
>>>>
>>>> Does it really need to be cache aligned?
>>>
>>> How about changing it to 32-byte alignment? Then a cache line can hold
>>> 2 objects.
>>
>> Or not forcing any alignment at all? Would there really be a performance
>> regression?
>>
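
To put rough numbers on the alignment question above, here is a minimal
standalone sketch. It is not the vhost code: struct toy_nethdr is a
hypothetical stand-in for struct async_nethdr (its real layout is in the
patch), the 64-byte cache line is an assumption, and the toy_* names are
made up for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64	/* assumed cache line size */

struct rte_mbuf;		/* opaque here; only a pointer is stored */

/* Hypothetical stand-in for struct async_nethdr, NOT the real layout. */
struct toy_nethdr {
	uint8_t hdr[10];
	uint8_t valid;
};

/* Same members as the quoted async_inflight_info, repeated per variant. */
#define TOY_INFO_FIELDS \
	struct rte_mbuf *mbuf; \
	union { \
		uint16_t descs; \
		struct toy_nethdr nethdr; \
	}; \
	uint16_t nr_buffers;

/* No forced alignment: the compiler only pads to pointer alignment. */
struct toy_info_natural { TOY_INFO_FIELDS };

/* 32-byte alignment: two objects share one 64-byte line. */
struct toy_info_align32 { alignas(32) TOY_INFO_FIELDS };

/* Cache-line alignment: every object occupies a full line. */
struct toy_info_align64 { alignas(TOY_CACHE_LINE) TOY_INFO_FIELDS };

static_assert(sizeof(struct toy_info_align64) == TOY_CACHE_LINE,
	      "cache-aligned variant fills one line");
static_assert(sizeof(struct toy_info_align32) == 32,
	      "32-byte variant fills half a line");
static_assert(sizeof(struct toy_info_natural) <= 32,
	      "natural layout is no larger than the 32-byte variant");

/* Bytes needed for an array of n objects of the given size. */
static size_t
toy_array_bytes(size_t obj_size, size_t n)
{
	return obj_size * n;
}
```

With this stand-in, the cache-aligned array is at least twice as large as
the other two for the same number of entries. Whether the smaller footprint
or the guarantee that no entry straddles two lines matters more can only be
settled by measurement.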
>>>>
>>>>>
>>>>>  /**
>>>>>   *  dma channel feature bit definition
>>>>> @@ -193,4 +201,34 @@ __rte_experimental
>>>>>  uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
>>>>>           struct rte_mbuf **pkts, uint16_t count);
>>>>>
>>>>> +/**
>>>>> + * This function tries to receive packets from the guest with offloading
>>>>> + * large copies to the DMA engine. Successfully dequeued packets are
>>>>> + * transfer completed, either by the CPU or the DMA engine, and they are
>>>>> + * returned in "pkts". There may be other packets that are sent from
>>>>> + * the guest but being transferred by the DMA engine, called in-flight
>>>>> + * packets. The amount of in-flight packets by now is returned in
>>>>> + * "nr_inflight". This function will return in-flight packets only after
>>>>> + * the DMA engine finishes transferring.
>>>>
>>>> I am not sure I understand that comment. Is it still "in-flight" if
>>>> the DMA transfer is completed?
>>>
>>> "in-flight" means packet copies are submitted to the DMA, but the DMA
>>> hasn't completed copies.
>>>
>>>>
>>>> Are we ensuring packets are not reordered with this way of working?
>>>
>>> There is a threshold that can be set by users. If it is set to 0, which
>>> means all packet copies are assigned to the DMA, the packets sent from
>>> the guest will not be reordered.
>>
>> Reordering packets is bad in my opinion. We cannot expect the user to know
>> that he should set the threshold to zero to have packets ordered.
>>
>> Maybe we should consider not having a threshold, and so have every
>> descriptor handled either by the CPU (sync datapath) or by the DMA (async
>> datapath). Doing so would simplify the code a lot, and would make
>> performance/latency more predictable.
>>
>> I understand that we might not get the best performance for every packet
>> size doing that, but that may be a tradeoff we would make to have the
>> feature maintainable and easily useable by the user.
> 
> I understand and agree in some way. But before changing the existing design
> in async enqueue and dequeue, we need more careful tests, as the current
> design is well validated and performance looks good. So I suggest doing it
> in 21.11.


My understanding was that packets were not reordered in the enqueue path,
as I thought the used ring was written in order, but it seems I was wrong.

What kind of validation and performance testing has been done? I can
imagine reordering having a bad impact on L4+ benchmarks.
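
To make the concern concrete, here is a toy model of how a copy threshold
can reorder a burst. It is only a sketch, not the vhost implementation: it
assumes that packets shorter than the threshold are copied by the CPU and
complete immediately, while the others are handed to the DMA and complete
later. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PKTS 32	/* arbitrary burst limit for this sketch */

/*
 * Write to "out" the indices of the packets in the order their copies
 * complete. Packets shorter than "threshold" are copied by the CPU and
 * complete at once; the others are deferred to the DMA and complete
 * after the CPU copies. Returns the number of indices written.
 */
static size_t
toy_completion_order(const uint32_t *len, size_t n, uint32_t threshold,
		     size_t *out)
{
	size_t deferred[TOY_MAX_PKTS];
	size_t nr_deferred = 0, nr_out = 0, i;

	assert(n <= TOY_MAX_PKTS);

	for (i = 0; i < n; i++) {
		if (len[i] < threshold)
			out[nr_out++] = i;		/* CPU copy: done now */
		else
			deferred[nr_deferred++] = i;	/* DMA copy: in flight */
	}

	/* DMA copies complete later, in submission order. */
	for (i = 0; i < nr_deferred; i++)
		out[nr_out++] = deferred[i];

	return nr_out;
}

/* Return 1 if the completion order equals the submission order. */
static int
toy_is_in_order(const size_t *order, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (order[i] != i)
			return 0;
	return 1;
}
```

In this model, packet lengths {64, 1500, 64, 1500} with a threshold of 256
complete as 0, 2, 1, 3. With a threshold of 0 every copy goes to the DMA
and the order is preserved, which matches the behaviour described for the
zero threshold. The same holds when every copy stays on the CPU.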

Let's first fix this for the enqueue path, then submit a new revision for
the dequeue path without packet reordering.

Regards,
Maxime

> Thanks,
> Jiayu
>
