On Fri, Jun 18, 2021 at 3:11 PM fengchengwen <fengcheng...@huawei.com> wrote:
>
> On 2021/6/18 13:52, Jerin Jacob wrote:
> > On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson
> > <bruce.richard...@intel.com> wrote:
> >>
> >> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> >>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen <fengcheng...@huawei.com> wrote:
> >>>>
> >>>> On 2021/6/16 15:09, Morten Brørup wrote:
> >>>>>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
> >>>>>> Sent: Tuesday, 15 June 2021 18.39
> >>>>>>
> >>>>>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> >>>>>>> This patch introduces 'dmadevice', which is a generic type of DMA
> >>>>>>> device.
> >>>>>>>
> >>>>>>> The APIs of the dmadev library expose some generic operations which
> >>>>>>> can enable configuration and I/O with the DMA devices.
> >>>>>>>
> >>>>>>> Signed-off-by: Chengwen Feng <fengcheng...@huawei.com>
> >>>>>>> ---
> >>>>>> Thanks for sending this.
> >>>>>>
> >>>>>> Of most interest to me right now are the key data-plane APIs. While we
> >>>>>> are still in the prototyping phase, below is a draft of what we are
> >>>>>> thinking for the key enqueue/perform_ops/completed_ops APIs.
> >>>>>>
> >>>>>> Some key differences I note in below vs your original RFC:
> >>>>>> * Use of void pointers rather than iova addresses. While using iovas
> >>>>>>   makes sense in the general case when using hardware, in that it can
> >>>>>>   work with both physical addresses and virtual addresses, if we change
> >>>>>>   the APIs to use void pointers instead it will still work for DPDK in
> >>>>>>   VA mode, while at the same time allowing use of software fallbacks in
> >>>>>>   error cases, and also a stub driver that uses memcpy in the
> >>>>>>   background. Finally, using iovas makes the APIs a lot more awkward to
> >>>>>>   use with anything but mbufs or similar buffers where we already have
> >>>>>>   a pre-computed physical address.
> >>>>>> * Use of id values rather than user-provided handles. Allowing the
> >>>>>>   user/app to manage the amount of data stored per operation is a
> >>>>>>   better solution, I feel, than prescribing a certain amount of
> >>>>>>   in-driver tracking. Some apps may not care about anything other than
> >>>>>>   a job being completed, while other apps may have significant metadata
> >>>>>>   to be tracked. Taking the user-context handles out of the API also
> >>>>>>   makes the driver code simpler.
> >>>>>> * I've kept a single combined API for completions, which differs from
> >>>>>>   the separate error-handling completion API you propose. I need to
> >>>>>>   give the two-function approach a bit of thought, but likely both
> >>>>>>   could work. If we (likely) never expect failed ops, then the
> >>>>>>   specifics of error handling should not matter that much.
> >>>>>>
> >>>>>> For the rest, the control/setup APIs are likely to be rather
> >>>>>> uncontroversial, I suspect. However, I think that rather than xstats
> >>>>>> APIs, the library should first provide a set of standardized stats like
> >>>>>> ethdev does. If driver-specific stats are needed, we can add xstats
> >>>>>> later to the API.
> >>>>>>
> >>>>>> Appreciate your further thoughts on this, thanks.
> >>>>>>
> >>>>>> Regards,
> >>>>>> /Bruce
> >>>>>
> >>>>> I generally agree with Bruce's points above.
> >>>>>
> >>>>> I would like to share a couple of ideas for further discussion:
> >>>
> >>>
> >>> I believe some of the other requirements and comments for generic DMA
> >>> will be:
> >>>
> >>> 1) Support for the _channel_. Each channel may have different
> >>> capabilities and functionalities.
> >>> Typical cases are: each channel has separate source and destination
> >>> devices, like DMA between PCIe EP and host memory, host memory and
> >>> host memory, PCIe EP and PCIe EP.
> >>> So we need some notion of the channel in the specification.
> >>>
> >>
> >> Can you share a bit more detail on what constitutes a channel in this case?
> >> Is it equivalent to a device queue (which we are flattening to individual
> >> devices in this API), or to a specific configuration on a queue?
> >
> > It is not a queue. It is one of the attributes of a transfer,
> > i.e. in the same queue, a given transfer can specify different
> > "source" and "destination" devices,
> > like CPU to sound card, CPU to network card, etc.
> >
> >
> >>
> >>> 2) I assume the current data-plane APIs are not thread-safe. Is that right?
> >>>
> >> Yes.
> >>
> >>>
> >>> 3) The cookie scheme outlined earlier looks good to me, instead of having
> >>> a generic dequeue() API.
> >>>
> >>> 4) Can we split rte_dmadev_enqueue_copy(uint16_t dev_id, void *src,
> >>> void *dst, unsigned int length);
> >>> into a two-stage API, where one stage will be used in the fast path and
> >>> the other in the slow path?
> >>>
> >>> - The slow-path API will take the channel and the other fixed attributes
> >>> of the transfer.
> >>>
> >>> Example syntax would be:
> >>>
> >>> struct rte_dmadev_desc {
> >>>         channel id;
> >>>         ops;  // copy, xor, fill etc
> >>>         other arguments specific to dma transfer  // can be set
> >>>         based on capability
> >>> };
> >>>
> >>> rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id, struct
> >>> rte_dmadev_desc *desc);
> >>>
> >>> - The fast-path API takes the arguments that need to change per transfer,
> >>> along with the slow-path handle:
> >>>
> >>> rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst, unsigned
> >>> int length, rte_dmadev_desc_t desc);
> >>>
> >>> This will help the driver to:
> >>> - form the device-specific descriptors in the slow path for a given
> >>> channel and the fixed per-transfer attributes (former API)
> >>> - blend the "variable" arguments such as src and dest addresses with the
> >>> descriptors created in the slow path (latter API)
> >>>
> >>
> >> This seems like an API for a context-aware device, where the channel is the
> >> config data/context that is preserved across operations - is that correct?
> >> At least from the Intel DMA accelerators side, we have no concept of this
> >> context, and each operation is completely self-described. The location or
> >> type of memory for copies is irrelevant; you just pass the src/dst
> >> addresses to reference.
> >
> > It is not a context-aware device. Each HW job is self-described.
> > You can view it as different attributes of the transfer.
> >
> >
> >>
> >>> The above will give better performance and is the best trade-off
> >>> between performance and per-transfer variables.
> >>
> >> We may need to have different APIs for context-aware and context-unaware
> >> processing, with which to use determined by capability discovery.
> >> Given that for these DMA devices the offload cost is critical, more so than
> >> for any other dev class I've looked at before, I'd like to avoid having
> >> APIs with extra parameters that need to be passed about, since that just
> >> adds extra CPU cycles to the offload.
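
To make the exchange above concrete, here is a minimal C sketch of the
two-stage split discussed, combined with the void-pointer arguments and
id-based completions from Bruce's earlier mail. All names, types and
signatures below are hypothetical illustrations, not existing DPDK APIs:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slow-path attributes: fixed per channel, not per transfer. */
struct rte_dmadev_desc {
        uint16_t channel;   /* source/destination device pair */
        uint8_t  op;        /* copy, xor, fill, ... */
        uint64_t flags;     /* other capability-dependent attributes */
};

/* Opaque handle returned by the slow-path call. */
typedef void *rte_dmadev_desc_t;

/* Slow path: validate the attributes once and build a reusable handle
 * (e.g. a partly pre-formed HW descriptor). */
rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id,
                                     const struct rte_dmadev_desc *desc);

/* Fast path: only the per-transfer fields; returns a job id (negative on
 * error) that the application can map to its own metadata if needed. */
int rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst,
                       unsigned int length, rte_dmadev_desc_t desc);

/* Single combined completion call: returns the number of ops finished up
 * to and including *last_id; *error reports whether any of them failed. */
uint16_t rte_dmadev_completed_ops(uint16_t dev_id, uint16_t max_ops,
                                  int *last_id, bool *error);

With this shape, a driver with no channel concept could accept a NULL desc
and treat every operation as self-described, while a channel-based driver
pre-forms its HW descriptor in the prepare() step.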
> >
> > If the driver does not support additional attributes and/or the
> > application does not need them, rte_dmadev_desc_t can be NULL.
> > So it won't have any cost in the datapath. I think we can go with
> > different APIs for different cases if we cannot abstract the problem
> > without a performance impact. Otherwise, it will be too much
> > pain for applications.
>
> Yes, currently we plan to use different APIs for different cases, e.g.
>     rte_dmadev_memcpy()   -- deal with local-to-local memcopy
>     rte_dmadev_memset()   -- deal with filling local memory with a pattern
> maybe:
>     rte_dmadev_imm_data() -- deal with copying very little data
>     rte_dmadev_p2pcopy()  -- deal with peer-to-peer copy between different
>                              PCIe addresses
>
> These API capabilities will be reflected in the device capability set, so
> that the application can discover them through a standard API.
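
For illustration, such a capability set could be exposed as a bitmask
queried through an info call. The flag names, the info struct and the helper
below are invented for this example and do not exist in DPDK:

#include <stdbool.h>
#include <stdint.h>

#define RTE_DMADEV_CAPA_MEMCPY   (1ULL << 0) /* local-to-local copy */
#define RTE_DMADEV_CAPA_MEMSET   (1ULL << 1) /* fill with pattern */
#define RTE_DMADEV_CAPA_IMM_DATA (1ULL << 2) /* small immediate-data copy */
#define RTE_DMADEV_CAPA_P2P      (1ULL << 3) /* PCIe peer-to-peer copy */

struct rte_dmadev_info {
        uint64_t capa; /* bitmask of RTE_DMADEV_CAPA_* flags */
};

int rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *info);

/* Application-side probe before using a specialised call: */
static inline bool dma_can_p2p(uint16_t dev_id)
{
        struct rte_dmadev_info info;

        if (rte_dmadev_info_get(dev_id, &info) != 0)
                return false;
        return (info.capa & RTE_DMADEV_CAPA_P2P) != 0;
}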
There will be a lot of combinations of those; it will be like an M x N cross
of base cases, and it won't scale.

>
> >
> > Just to understand: I think we need to look at the HW capabilities and
> > how to have a common API.
> > I assume the HW will have some HW job descriptors which will be filled in
> > SW and submitted to the HW.
> > In our HW, the job descriptor has the following main elements:
> >
> > - Channel // We don't expect the application to change this per transfer
> > - Source address - it can be scatter-gather too - will be changed per
> >   transfer
> > - Destination address - it can be scatter-gather too - will be changed
> >   per transfer
> > - Transfer length - it can be scatter-gather too - will be changed per
> >   transfer
> > - IOVA address where HW posts the job completion status per job
> >   descriptor - will be changed per transfer
> > - Other sideband information related to the channel // We don't expect
> >   the application to change this per transfer
> > - As an option, job completion can be posted as an event to an
> >   rte_event_queue too // We don't expect the application to change this
> >   per transfer
>
> The 'option' field looks like a software interface field, not a HW
> descriptor field.

It is in the HW descriptor.

> >
> > @Richardson, Bruce @fengchengwen @Hemant Agrawal
> >
> > Could you share the options for your HW descriptors which you are
> > planning to expose through an API like the above, so that we can easily
> > converge on the fastpath API?
> >
>
> The Kunpeng HW descriptor is self-describing and doesn't need to refer to
> context info.
>
> Maybe the fields which are fixed for a given transfer type could be set up
> by the driver, and not exposed to the application.

Yes, I agree. I think that is the reason why I thought of having an
rte_dmadev_prep() call to convert the DPDK DMA transfer attributes to
HW-specific descriptors, and of having a single enq() operation with the
variable arguments (through the enq() parameters) and the fixed arguments
through the rte_dmadev_prep() call object.

>
> So that we could use a more generic way to define the API.
>
>
> >>
> >> /Bruce
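
As a rough illustration of that prep()/enq() split, the job-descriptor
elements listed above might map onto a template like the following. Every
name here is hypothetical and the layout is purely illustrative, not any
vendor's actual descriptor format:

#include <stdint.h>

/* One self-described HW job, split into the fields a hypothetical
 * rte_dmadev_prep() could fill once per channel and the fields a
 * hypothetical enq() would patch on every transfer. */
struct dma_job_desc {
        /* Fixed per channel -- pre-filled by prep(): */
        uint16_t channel;    /* source/destination device pair */
        uint64_t sideband;   /* channel-related sideband information */
        uint8_t  post_event; /* optionally post completion to an
                              * rte_event_queue */
        /* Variable per transfer -- patched by enq(): */
        uint64_t src;        /* source, may be a scatter-gather list */
        uint64_t dst;        /* destination, may be a scatter-gather list */
        uint32_t length;     /* transfer length */
        uint64_t compl_iova; /* IOVA where HW posts the completion status */
};

Under this split, enq() would only copy the pre-built template, patch the
four variable fields and ring the doorbell, keeping the per-transfer offload
cost minimal.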