On 2021/7/1 23:01, Jerin Jacob wrote:
>>   [key point]:
>>       -----------    -----------
>>       | channel |    | channel |
>>       -----------    -----------
>>              \           /
>>               \         /
>>                \       /
>>              ------------
>>              | HW-queue |
>>              ------------
>>                    |
>>                 --------
>>                 |rawdev|
>>                 --------
>>       1) A user could create one channel by an init context
>>          (dpi_dma_queue_ctx_s); this interface is not standardized and
>>          needs to be implemented by users.
>>       2) Different channels can support different transmissions, e.g. one
>>          for inner m2m, and the other for inbound copy.
>>
>>       Overall, I think the 'channel' is similar to the 'virt-queue' of
>>       dpaa2_qdma. The difference is that dpaa2_qdma supports multiple
>>       hardware queues. The 'channel' has the following
> 
> If dpaa2_qdma supports more than one HW queue, I think it is good to
> have the queue notion in DPDK just like other DPDK device classes. It
> will be good to have confirmation from the dpaa2 folks, @Hemant Agrawal,
> whether there really is more than 1 HW queue in the dpaa2 device.
> 
> 
> IMO, Channel is a better name than virtual queue. The reason is that
> virtual queue is a more implementation-specific notion. No need to have
> this in the API specification.
> 

In the DPDK framework, many data-plane APIs are named around queues, e.g.
eventdev/cryptodev, so the virt-queue concept keeps continuity with them.
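
To make the channel/virt-queue mapping above concrete, here is a minimal
sketch based on the rte_dmadev_queue_setup() prototype from point 8 below
(the config struct name and its fields are placeholders, not proposed API):

    /* Placeholder config: a HW-queue index plus an opaque channel context
     * (e.g. something like dpi_dma_queue_ctx_s). */
    struct dmadev_queue_conf conf_m2m = { .hw_queue_index = 0, .opaque = m2m_ctx };
    struct dmadev_queue_conf conf_inb = { .hw_queue_index = 0, .opaque = inb_ctx };

    /* Two virt-queues ("channels") sharing the same HW-queue: one for inner
     * m2m copy, the other for inbound copy. */
    int vq_m2m = rte_dmadev_queue_setup(dev, &conf_m2m);
    int vq_inb = rte_dmadev_queue_setup(dev, &conf_inb);

    /* The vq_id then selects the scenario on each data-plane call, e.g.
     * rte_dmadev_memcpy(dev, vq_m2m, src, dst, len, flags). */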

>>       [dma_copy/memset/sg]: all have a vq_id input parameter.
>>          Note: I notice dpaa can't support single and sg in one
>>                virt-queue, and I think that is probably a software
>>                implementation policy rather than a HW restriction,
>>                because virt-queues could share the same HW-queue.
>>       Here we use vq_id to handle different scenarios, like
>>       local-to-local/local-to-host, etc.
> 
> IMO, the index representation has an additional overhead, as one needs
> to translate it to a memory pointer. I prefer to avoid that by having an
> object handle and using a _lookup() API to make it work in multi-process
> cases, avoiding the additional indirection. Like the mempool object.
> 

This solution was considered first, similar to rte_hash returning a handle.
It is not intuitive and has no obvious performance advantage: the number of
indirect jumps taken by the index-driven data-plane callback is not reduced.
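
Just for illustration, the index approach lets a driver resolve the cookie
directly (the cookie is monotonically increasing per HW-queue, see point 6
below); the struct and field names here are hypothetical driver internals:

    /* Sketch: the cookie is a ring index, so reaching the per-op state is a
     * mask operation, with no handle-to-pointer lookup. */
    static inline struct op_state *
    cookie_to_op(struct drv_hw_queue *hwq, dma_cookie_t cookie)
    {
            return &hwq->ring[cookie & hwq->ring_mask];
    }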

>>   5) And the dmadev public data-plane API (just prototype):
>>      dma_cookie_t rte_dmadev_memset(dev, vq_id, pattern, dst, len, flags)
>>        -- flags: used as an extended parameter; it could be uint32_t
>>      dma_cookie_t rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags)
>>      dma_cookie_t rte_dmadev_memcpy_sg(dev, vq_id, sg, sg_len, flags)
>>        -- sg: struct dma_scatterlist array
>>      uint16_t rte_dmadev_completed(dev, vq_id, dma_cookie_t *cookie,
>>                                    uint16_t nb_cpls, bool *has_error)
>>        -- nb_cpls: indicates the maximum number of operations to process
>>        -- has_error: indicates whether an error occurred
>>        -- return value: the number of successfully completed operations.
>>        -- example:
>>           1) If there are already 32 completed ops and the 4th is an error,
>>              and nb_cpls is 32, then ret will be 3 (because the 1st/2nd/3rd
>>              are OK) and has_error will be true.
>>           2) If there are already 32 completed ops, and all completed
>>              successfully, then ret will be min(32, nb_cpls) and has_error
>>              will be false.
>>           3) If there are already 32 completed ops, and all completed with
>>              failure, then ret will be 0 and has_error will be true.
> 
> +1. IMO, it is better to call it ring_idx instead of a cookie, to
> enforce that it is the ring index.
> 
>>      uint16_t rte_dmadev_completed_status(dev_id, vq_id, dma_cookie_t *cookie,
>>                                           uint16_t nb_status, uint32_t *status)
>>        -- return value: the number of failed completed operations.
> 
> See above. Here we are assuming it is an index; otherwise we would need
> to pass an array of cookies.
> 
>>      And here I agree with Morten: we should design an API which adapts to
>>      DPDK service scenarios. So we don't support some sound-card DMA, or 2D
>>      memory copy, which is mainly used in video scenarios.
>>   6) The dma_cookie_t is a signed int type; when <0 it means error. It is
>>      monotonically increasing based on the HW-queue (rather than the
>>      virt-queue). The driver needs to ensure this because the dmadev
>>      framework doesn't manage the dma_cookie's creation.
> 
> +1 and see above.
> 
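
Putting the two completion APIs together, a hedged usage sketch (only the
prototypes from point 5 above are assumed; the batch size of 32 and the
error handling are arbitrary, for illustration):

    dma_cookie_t cookie;
    bool has_error = false;
    uint32_t status[32];
    uint16_t ret, fails;

    /* Drain up to 32 completed ops; ret counts the leading successes. */
    ret = rte_dmadev_completed(dev, vq_id, &cookie, 32, &has_error);
    /* ... release the 'ret' successfully completed requests ... */

    if (has_error) {
            /* Some ops completed with failure: fetch their status words. */
            fails = rte_dmadev_completed_status(dev, vq_id, &cookie, 32, status);
            /* ... handle 'fails' failed requests using status[] ... */
    }
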
>>   7) Because data-plane APIs are not thread-safe, and the user can determine
>>      the virt-queue to HW-queue mapping (at the queue-setup stage), it is the
>>      user's duty to ensure thread safety.
> 
> +1. But I am not sure how easy it is for the fast-path application to
> have this logic. Instead, I think it is better for the driver to expose
> the capa for the queue, and in the channel configuration the application
> can request its requirement (whether multiple producers enqueue to the
> same HW queue or not). Based on the request, the implementation can pick
> the correct function pointer for enqueue (lock vs lockless version if the
> HW does not support lockless).
> 

Already redesigned; please check the latest patch.
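
For reference, the capability-driven selection Jerin describes could look
roughly like the sketch below; the capa flag, the conf field and the two
enqueue helpers are all hypothetical, not part of the redesigned patch:

    /* Pick the enqueue function pointer at queue-configure time. */
    if (dev_info.capa & DMADEV_CAPA_MT_VQ)          /* HW is MT-safe */
            dev->enqueue = enqueue_lockless;
    else if (queue_conf.mt_multi_producer)          /* app asked for MP */
            dev->enqueue = enqueue_locked;          /* SW lock wrapper */
    else
            dev->enqueue = enqueue_lockless;        /* single producer */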

> 
>>   8) One example:
>>      vq_id = rte_dmadev_queue_setup(dev, config.{HW-queue-index=x, opaque});
>>      if (vq_id < 0) {
>>         // create virt-queue failed
>>         return;
>>      }
>>      // submit a memcpy task
>>      cookie = rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags);
>>      if (cookie < 0) {
>>         // submit failed
>>         return;
>>      }
>>      // get the completed task
>>      ret = rte_dmadev_completed(dev, vq_id, &cookie, 1, &has_error);
>>      if (!has_error && ret == 1) {
>>         // the memcpy completed successfully
>>      }
> 
> +1
> 
>>   9) As octeontx2_dma supports an sg-list which has many valid buffers in
>>      dpi_dma_buf_ptr_s, it could call the rte_dmadev_memcpy_sg API.
> 
> +1
> 
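
For reference, feeding such a multi-buffer request through the sg prototype
from point 5 could look like this (the dma_scatterlist field names are
guesses; they are not defined anywhere in this thread):

    /* Field names are assumptions for illustration only. */
    struct dma_scatterlist sg[2] = {
            { .src = src0, .dst = dst0, .length = len0 },
            { .src = src1, .dst = dst1, .length = len1 },
    };
    cookie = rte_dmadev_memcpy_sg(dev, vq_id, sg, 2, flags);
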
>>   10) As for ioat, it could declare support for one HW-queue at the
>>       dev_configure stage, and only support creating one virt-queue.
>>   11) As for dpaa2_qdma, I think it could migrate to the new framework, but
>>       we are still waiting for feedback from the dpaa2_qdma guys.
>>   12) About the prototype src/dst parameters of the rte_dmadev_memcpy API,
>>       we have two candidates, iova and void *; how about introducing a
>>       dma_addr_t type which could be a va or an iova?
> 
> I think the conversion looks ugly; better to have void * and share the
> constraints of void * as a limitation/capability flag, so that the driver
> can update it.
> 

Already changed to void *.
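
A sketch of how an application might deal with the void * constraint being
reported as a capability (the DMADEV_CAPA_VA flag is made up here;
rte_mem_virt2iova() is existing DPDK):

    /* If the device cannot take plain virtual addresses, translate first. */
    if (info.dev_capa & DMADEV_CAPA_VA) {
            cookie = rte_dmadev_memcpy(dev, vq_id, src_va, dst_va, len, flags);
    } else {
            cookie = rte_dmadev_memcpy(dev, vq_id,
                                       (void *)rte_mem_virt2iova(src_va),
                                       (void *)rte_mem_virt2iova(dst_va),
                                       len, flags);
    }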

