> From: Bruce Richardson [mailto:bruce.richard...@intel.com]
> Sent: Tuesday, 29 March 2022 19.03
>
> On Tue, Mar 29, 2022 at 06:45:19PM +0200, Morten Brørup wrote:
> > > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> > > Sent: Tuesday, 29 March 2022 18.24
> > >
> > > Hi Morten,
> > >
> > > On 3/29/22 16:44, Morten Brørup wrote:
> > > >> From: Van Haaren, Harry [mailto:harry.van.haa...@intel.com]
> > > >> Sent: Tuesday, 29 March 2022 15.02
> > > >>
> > > >>> From: Morten Brørup <m...@smartsharesystems.com>
> > > >>> Sent: Tuesday, March 29, 2022 1:51 PM
> > > >>>
> > > >>> Having thought more about it, I think that a completely
> > > >>> different architectural approach is required:
> > > >>>
> > > >>> Many of the DPDK Ethernet PMDs implement a variety of RX and
> > > >>> TX packet burst functions, each optimized for different CPU
> > > >>> vector instruction sets. The availability of a DMA engine
> > > >>> should be treated the same way. So I suggest that PMDs copying
> > > >>> packet contents, e.g. memif, pcap, vmxnet3, should implement
> > > >>> DMA optimized RX and TX packet burst functions.
> > > >>>
> > > >>> Similarly for the DPDK vhost library.
> > > >>>
> > > >>> In such an architecture, it would be the application's job to
> > > >>> allocate DMA channels and assign them to the specific PMDs
> > > >>> that should use them. But the actual use of the DMA channels
> > > >>> would move down below the application and into the DPDK PMDs
> > > >>> and libraries.
> > > >>>
> > > >>>
> > > >>> Med venlig hilsen / Kind regards,
> > > >>> -Morten Brørup
> > > >>
> > > >> Hi Morten,
> > > >>
> > > >> That's *exactly* how this architecture is designed &
> > > >> implemented.
> > > >> 1. The DMA configuration and initialization is up to the
> > > >> application (OVS).
> > > >> 2. The VHost library is passed the DMA-dev ID, and its new
> > > >> async rx/tx APIs, and uses the DMA device to accelerate the
> > > >> copy.
> > > >>
> > > >> Looking forward to talking on the call that just started.
> > > >> Regards, -Harry
> > > >>
> > > >
> > > > OK, thanks - as I said on the call, I haven't looked at the
> > > > patches.
> > > >
> > > > Then, I suppose that the TX completions can be handled in the TX
> > > > function, and the RX completions can be handled in the RX
> > > > function, just like the Ethdev PMDs handle packet descriptors:
> > > >
> > > > TX_Burst(tx_packet_array):
> > > > 1. Clean up descriptors processed by the NIC chip. --> Process
> > > >    TX DMA channel completions. (Effectively, the 2nd pipeline
> > > >    stage.)
> > > > 2. Pass on the tx_packet_array to the NIC chip descriptors. -->
> > > >    Pass on the tx_packet_array to the TX DMA channel.
> > > >    (Effectively, the 1st pipeline stage.)
> > >
> > > The problem is Tx function might not be called again, so enqueued
> > > packets in 2. may never be completed from a Virtio point of view.
> > > IOW, the packets will be copied to the Virtio descriptors buffers,
> > > but the descriptors will not be made available to the Virtio
> > > driver.
> >
> > In that case, the application needs to call TX_Burst() periodically
> > with an empty array, for completion purposes.
> >
> > Or some sort of TX_Keepalive() function can be added to the DPDK
> > library, to handle DMA completion. It might even handle multiple DMA
> > channels, if convenient - and if possible without locking or other
> > weird complexity.
> >
> > Here is another idea, inspired by a presentation at one of the DPDK
> > Userspace conferences. It may be wishful thinking, though:
> >
> > Add an additional transaction to each DMA burst; a special
> > transaction containing the memory write operation that makes the
> > descriptors available to the Virtio driver.
> >
> That is something that can work, so long as the receiver is operating
> in polling mode. For cases where virtio interrupts are enabled, you
> still need to do a write to the eventfd in the kernel in vhost to
> signal the virtio side. That's not something that can be offloaded to
> a DMA engine, sadly, so we still need some form of completion call.
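To make the two-stage TX_Burst() pattern above concrete, here is a
minimal sketch against the async vhost API from the patch set under
discussion (rte_vhost_poll_enqueue_completed() and
rte_vhost_submit_enqueue_burst()). The function name, the dma_id/vchan
plumbing and the mbuf freeing policy are my assumptions for
illustration, not the actual OVS integration:

/*
 * Sketch only: a vhost TX burst that handles DMA completions first,
 * the way ethdev PMDs clean TX descriptors before refilling the ring.
 */
#include <rte_mbuf.h>
#include <rte_vhost_async.h>

#define COMP_BURST 32

static uint16_t
vhost_tx_burst(int vid, uint16_t queue_id, int16_t dma_id,
		uint16_t vchan, struct rte_mbuf **pkts, uint16_t count)
{
	struct rte_mbuf *done[COMP_BURST];
	uint16_t n_done;

	/* Pipeline stage 2: poll the DMA channel for finished copies;
	 * this is what finally makes the descriptors available to the
	 * Virtio driver, and lets us free the completed mbufs. */
	n_done = rte_vhost_poll_enqueue_completed(vid, queue_id, done,
			COMP_BURST, dma_id, vchan);
	if (n_done > 0)
		rte_pktmbuf_free_bulk(done, n_done);

	/* Pipeline stage 1: hand the new packets to the DMA engine. */
	if (count == 0)
		return 0;
	return rte_vhost_submit_enqueue_burst(vid, queue_id, pkts,
			count, dma_id, vchan);
}

Calling it with count == 0 gives the periodic "empty array" variant;
a TX_Keepalive() would simply wrap the completion poll.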
I guess that virtio interrupts are the most widely deployed scenario, so let's ignore the DMA TX completion transaction for now - and call it a possible future optimization for specific use cases. So it seems that some form of completion call is unavoidable.
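For the record, the postponed "extra transaction" could look roughly
like this with the generic dmadev copy API. Everything here is a
sketch: the shadow index buffer and the IOVA parameters are
placeholders, and it assumes the vchan completes transactions in
submission order:

/*
 * Sketch of the postponed idea: append one extra DMA transaction that
 * publishes the updated used-ring index, so a polling-mode guest sees
 * the buffers without any CPU completion call.
 */
#include <rte_dmadev.h>

static int
dma_copy_and_publish(int16_t dma_id, uint16_t vchan,
		rte_iova_t src, rte_iova_t dst, uint32_t len,
		rte_iova_t shadow_idx_iova, rte_iova_t used_idx_iova)
{
	/* The payload copy (a real burst would enqueue many of these). */
	if (rte_dma_copy(dma_id, vchan, src, dst, len, 0) < 0)
		return -1;

	/* The extra transaction: DMA-write the new used-ring index from
	 * a shadow copy in host memory. Queued last, it lands after the
	 * payload copies, making the descriptors visible to the guest. */
	if (rte_dma_copy(dma_id, vchan, shadow_idx_iova, used_idx_iova,
			sizeof(uint16_t), RTE_DMA_OP_FLAG_SUBMIT) < 0)
		return -1;

	return 0;
}

Even then it only helps polling-mode guests; the eventfd kick for the
interrupt case still needs the CPU, which is why the completion call
stays.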