Hi David,

I've made a little Wayland app that uses both SHM and DMA, and I tested it on Weston, Sway, and my own compositor. I also tried it on three different machines: two with Intel i7 CPUs and one with a smaller ARM CPU. These machines had Intel Iris Pro, Nvidia GT525M, and Mali-400 GPUs, respectively.

Here's the code and results for one of the machines:
https://github.com/ehopperdietzel/QPainter-SHM-DMA-Benchmark

The results show that there's no significant difference in the time it takes for read and write operations using QPainter in SHM and DMA maps. It seems like DMA I/O operations are handled asynchronously by the kernel.

The most noticeable improvement is on the compositor side. When using DMA, the experience feels much smoother, especially when moving other windows while the benchmark is running on single-threaded compositors like Weston. There's also a slight increase in the number of frame callbacks returned by the compositors when using DMA, though it doesn't significantly boost the overall FPS.

However, there are challenges with implementing DMA:

1. There does not seem to be a standard method to create DMA buffers in userspace. I tried creating a GBM bo, obtaining a PRIME fd, and mapping it, but this isn't supported by all GPUs/drivers. For instance, it didn't work with the Mali GPU using the Lima driver. I also experimented with DMA-BUF heaps, but driver support does not seem to be consistent across all distributions, and accessing /dev/dma_heap/* often requires superuser privileges.

2. When using DMA, triple buffering is necessary; otherwise, compositors only display partial buffer updates. This could potentially be avoided by using DMA fencing mechanisms (like EGL does under the hood) and protocols like this one:
   https://wayland.app/protocols/linux-explicit-synchronization-unstable-v1
   But it seems that not many compositors have implemented it.

To sum it up, while DMA does offer a performance boost, it's not without its issues:

- DMA's effectiveness varies depending on hardware.
- Implementing DMA can be complex.
- The performance gains might not justify the effort.

So, as you mentioned earlier, it's probably best to stick with SHM and let the compositor handle uploads using DMA, preferably asynchronously.

Cheers,
Eduardo Hopperdietzel
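
P.S. For anyone who wants to picture the triple buffering from point 2, here is a rough sketch of the client-side bookkeeping, not the code from the benchmark repo. BufferPool and its methods are hypothetical names made up for illustration, and the Wayland wiring is left out: the assumption is that release() gets called from the wl_buffer.release handler of the matching buffer, and that the client only paints into an index returned by acquire().

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Qt or Wayland): tracks which of three
// client buffers are still in use, so the client never paints into a
// buffer the compositor may still be reading from.
class BufferPool {
public:
    static constexpr std::size_t kCount = 3; // triple buffering

    // Returns the index of a free buffer and marks it busy, or -1 if all
    // three are still in use (the client should skip this frame).
    int acquire() {
        for (std::size_t i = 0; i < kCount; ++i) {
            if (!busy_[i]) {
                busy_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Assumed to be called from the wl_buffer.release handler of the
    // buffer at `index`, once the compositor no longer needs it.
    void release(std::size_t index) {
        assert(index < kCount);
        busy_[index] = false;
    }

private:
    std::array<bool, kCount> busy_{}; // value-initialized: all buffers free
};
```

With only one or two buffers, acquire() would return -1 more often while the compositor is still holding on to earlier frames, so a third buffer mostly buys the client room to keep painting.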
--
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development