>> Hi Bruce,
>>
>> >On Sat, May 24, 2025 at 02:43:10PM +0530, <[email protected]> wrote:
>> >> From: Pavan Nikhilesh <[email protected]>
>> >>
>> >> Introduce DMA enqueue/dequeue operations to the DMA device library.
>> >>
>> >> Add configuration flags to rte_dma_config instead of boolean for
>> >> individual features.
>> >>
>> >> The enqueue/dequeue operations allow applications to communicate with the
>> >> DMA device using the rte_dma_op structure, providing a more flexible and
>> >> efficient way to manage DMA operations.
>> >>
>> >
>> >While I have no really strong objections to this addition to the dmadev
>> >API, I'd appreciate if you could explain WHY or how this method of working
>> >is more efficient in your usecase? When designing the dmadev APIs
>> >originally, we looked at using both an enqueue-type API as well as the
>> >implemented individual-op-based APIs. IIRC at that time testing showed that
>> >using the single ops directly was faster than using the enqueue APIs, so
>> >I'm wondering what exactly has changed, or is different about your usecase?
>> >
>>
>> Here is an example where we see enqueue/dequeue ops to be useful especially
>> when integrating with Graph library.
>>
>> We had to write an entire wrapper[1] for tracking sges with the current
>> implementation making our nodes[2] very complex.
>>
>
>Can you explain a bit more here. Why do you need the wrapper rather than
>just tracking in a circular ring all the copies offloaded? How does having
>an enqueue API make this better?

This is what we already do in our wrapper. We found it to be unnecessary
overhead, since the driver already does this tracking internally and we can
leverage that existing functionality. This also reduces the memory
footprint, as in the case below we use a lot of VCHANs.

Instead of checking for completions and maintaining the circular ring, we
can spend those cycles doing other things in the application.

>Can you perhaps give a trivial example
>showing the difference it makes here? The examples you give below are
>rather long to understand quickly.
>

The example below is a graph-based application that currently uses the
wrapper implementation, which we want to swap for enq/deq ops to reduce
overhead.

Also, the ops descriptor already exists for the eventdev subsystem; we are
just importing it into the DMA device library and reusing it.

>Thanks,
>/Bruce
>
>> [1]<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MarvellEmbeddedProcessors_dao_blob_dao-2Ddevel_lib_common_dao-5Fdma.h&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=E3SgYMjtKCMVsB-fmvgGV3o-g_fjLhk5Pupi9ijohpc&m=dXtUywAGV8Rir_dtqGP5J-tvRAxN9zQjmM96PeDo6Ke6QybID8eLdPbVwWzlgZFy&s=QryV2vh2_mWEz5yS37615Xb1F6B-gQZHM1uZ3badxoU&e=>
>> [2]<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_MarvellEmbeddedProcessors_dao_blob_3f364261de91e355699bd9af20d60ea6459f7d67_lib_virtio-5Fnet_virtio-5Fnet-5Fdeq-5Fext.c-23L51&d=DwIBAg&c=nKjWec2b6R0mOyPaz7xtfQ&r=E3SgYMjtKCMVsB-fmvgGV3o-g_fjLhk5Pupi9ijohpc&m=dXtUywAGV8Rir_dtqGP5J-tvRAxN9zQjmM96PeDo6Ke6QybID8eLdPbVwWzlgZFy&s=Bl2X7g7xXg_XrWvVIjPhMuIZuy3PG7tOM-Eje9i2ITA&e=>
>>
>> >/Bruce
>>
>> Thanks,
>> Pavan.
>>

