XDC allocator workshop and Wayland dmabuf hints
Hi,

There were certainly some interesting changes discussed at the allocator workshop during XDC this year, and I'd like to summarise my thoughts on it and make sure everybody is on the same page.

For those who don't know who I am or what my stake in this is, I'm the maintainer of the DRM and graphics code for the wlroots Wayland compositor library. I'm ascent12 on GitHub and Freenode.

My understanding is that the issue Nvidia was trying to solve was the in-place transition between different format modifiers. E.g. if a client is to be scanned out, the buffer would need to be transitioned to a non-compressed format that the display controller can work with, but if the client is to be composited, a compressed format would be used, saving on memory bandwidth. Hardware may have more efficient ways to transition between different formats, so it would be good if we could use these and not have to perform a blit when we don't need to. The problem is more general than this, but that was just the example given.

The original solution proposed in James' talk was to add functions to EGL/OpenGL/Vulkan and have the display server perform transitions where required. Early discussions during the workshop tended towards having libliftoff handle all of this, but that would require libliftoff to have its own rendering context, which I think bloats the purpose of the library. Also discussed was having libliftoff ask the compositor to perform the transition if it thinks one is possible.

Another suggestion I made was to make use of Simon's dmabuf hints patch to the wp_linux_dmabuf protocol [1] and leave it up to the client's GPU driver to handle any transitions. This wasn't adequately represented in the lightning talk summarising the workshop, so I'll go over it here, to make sure everyone understands what it is and why I think it is the way we should go forward.
Right now, a Wayland compositor will advertise all of the format+modifier pairs that it supports, but does not provide any context for clients as to which one they should actually choose. It's basically up to chance whether a client can be scanned out, which is likely to lead to several suboptimal situations.

The dmabuf hints patch adds a way to suggest a better format to use, based on the current context. This is dynamic, and can be sent multiple times over the lifetime of a surface. The patch also adds a way for the compositor to tell the client which GPU it's using, which is useful for clients to know in multi-GPU situations.

These hints are in various "tranches", which are just groups of format+modifier pairs of the same preference. The tranches are ordered from most optimal to least optimal. The most optimal tranche would imply direct scanout, while a less optimal tranche would imply compositing, but this is not actually defined like that in the protocol.

If a client becomes fullscreen, we would send the format+modifier pairs for the primary plane as the most optimal tranche. If a client is eligible to be scanned out on an overlay plane, we would send the format+modifier pairs for that plane. If a client is partially occluded or otherwise cannot be scanned out, we'd just send the normal format+modifier pairs that we can use as a texture. Note that the compositor won't send format+modifier pairs which we cannot texture from, even if the plane advertises them as supported. We always need to be able to fall back to compositing.

The hard part of figuring out which clients are "eligible" for being scanned out on an overlay plane could be handled by libliftoff (or something similar) and given back to the compositor to forward to clients. For libliftoff to make a properly informed decision, I think the atomic KMS API needs to be changed.
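To make the tranche ordering above a bit more concrete, here is a minimal sketch in C of how a client might pick a format+modifier pair from an ordered list of tranches. Everything in it is an assumption for illustration: the struct and function names are made up, EXAMPLE_MOD_COMPRESSED is a placeholder value rather than a real vendor modifier, and none of this is the protocol's wire format or any library's API. It only shows the idea of "walk the tranches in order of preference and take the first pair you can also allocate".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants. The XRGB8888 fourcc and the linear modifier match
 * the usual DRM values; the "compressed" modifier is a made-up placeholder. */
#define EXAMPLE_FORMAT_XRGB8888 UINT32_C(0x34325258) /* 'X','R','2','4' */
#define EXAMPLE_MOD_LINEAR      UINT64_C(0)
#define EXAMPLE_MOD_COMPRESSED  UINT64_C(1)          /* hypothetical */

/* Hypothetical in-memory representation of one format+modifier pair. */
struct format_pair {
	uint32_t format;   /* DRM fourcc code */
	uint64_t modifier; /* DRM format modifier */
};

/* A tranche: a group of pairs that share the same preference. */
struct tranche {
	const struct format_pair *pairs;
	size_t n_pairs;
};

static int pair_equal(const struct format_pair *a, const struct format_pair *b)
{
	return a->format == b->format && a->modifier == b->modifier;
}

/*
 * Walk the tranches from most to least preferred and return the first pair
 * that the client can also allocate, or NULL if there is no overlap at all.
 * If tranche_index is non-NULL, it receives the index of the tranche the
 * returned pair came from (0 being the most preferred).
 */
static const struct format_pair *pick_pair(const struct tranche *tranches,
		size_t n_tranches, const struct format_pair *supported,
		size_t n_supported, size_t *tranche_index)
{
	assert(tranches != NULL || n_tranches == 0);

	for (size_t t = 0; t < n_tranches; t++) {
		for (size_t i = 0; i < tranches[t].n_pairs; i++) {
			for (size_t j = 0; j < n_supported; j++) {
				if (pair_equal(&tranches[t].pairs[i], &supported[j])) {
					if (tranche_index != NULL)
						*tranche_index = t;
					return &tranches[t].pairs[i];
				}
			}
		}
	}
	return NULL;
}
```

In this sketch, a compositor that wants a fullscreen client on the primary plane would put that plane's pairs in tranche 0 and its general texturing pairs in tranche 1. A client that can only allocate the compressed layout would then land in tranche 1 and simply be composited, which matches the "always able to fall back to compositing" rule.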
We can only do a TEST_ONLY commit with valid buffers, which tests the immediate configuration but doesn't allow us to test a configuration we WANT to go to. We need some sort of fake framebuffer that is not backed by any real memory, but which we can still TEST_ONLY. Without this, we may tell the client format+modifier pairs that we think will work for scanout but that don't, due to hardware limitations or transient issues like memory bandwidth, and we could actually make things worse by having the client transition formats. As an aside, I would really like these fake framebuffers so that my modesetting setup could be a lot cleaner too.

I'm sure this has been discussed before, and I'm not really sure what the implications are from a driver perspective. I'd have to leave it up to people more familiar with KMS and driver internals to comment on this. Even if the solution isn't 100%, something that works most of the time would be hugely helpful (especially with RGB formats). Perhaps this is not possible and it would need to live purely in driver-specific code inside of libraries like libliftoff, but it would be nice not to come to that. It seems useful enough for a generic KMS userspace.

As to how dmabuf hints would look client-s