Re: [Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland
On Fri, Nov 22, 2013 at 8:52 AM, Axel Davy wrote:
> On 11/22/2013 01:16 AM, Kristian Høgsberg wrote:
>> I'm not sold on the nested compositor for this use case. It
>> introduces another context switch between the game and the main
>> compositor and more input and output lag. Instead I think we need to
>> consider whether we want a new __DRIimage entry point like:
>>
>> (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)
>>
>> and then do the copy in platform_wayland.c when we have non-matching
>> tile-formats.
>>
>> Kristian
>>
>
> Thanks for the comments.
>
> There are advantages to both possibilities:
> using a nested compositor or doing the copy inside Mesa.
>
> I imagine doing a blit could be the default,
> and rendering directly to the linear buffer could be an option
> set by an env var, or driconf.
>
> I'm deeply convinced we should allow rendering to the linear
> buffer without a copy, and the nested compositor use-case makes sense.
>
> For the blit, a function
>
> (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)
>
> makes sense, but we would need another function
> (not specifically related to Prime):
>
> (*throttle)(__DRIcontext *ctx)
>
> because rendering something heavy on a non-Intel card (Intel cards do
> throttling automatically) causes input lag, which is solved by forcing
> throttling at every swap.
>
> And ideally, we could have more control over tiling;
> for example, if the computer has two AMD cards of the same
> generation with the same tiling modes, we could always use tiling.
>

We can actually use 1D tiling across all families since R600 IIRC, since
the tile size and alignment requirements are not ASIC dependent. Only 2D
tiling has family dependencies.

Alex

>
>> I would like the compositor to still send the classic drm device in
>> the wl_drm.device event. The client can then use stat(2) to stat it
>> and infer the corresponding render node from that by adding 128 to the
>> minor.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
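Axel's wish for finer control over tiling, together with Alex's recollection (hedged with "IIRC") that 1D tiling has been ASIC independent since R600 while 2D tiling is family dependent, suggests a small negotiation step when the client and the compositor sit on different cards. The sketch below only illustrates that rule under those assumptions: `struct gpu_info`, `enum tiling_mode` and `choose_shared_tiling()` are hypothetical names, not Mesa code, and `family` is a placeholder identifier rather than a real radeon family enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tiling modes a cross-device shared buffer could use. */
enum tiling_mode {
    TILING_LINEAR, /* always readable by any device */
    TILING_1D,     /* per Alex (IIRC): same layout on all AMD families since R600 */
    TILING_2D      /* layout depends on the ASIC family */
};

/* Hypothetical description of a GPU; not a Mesa structure. */
struct gpu_info {
    bool is_amd_r600_or_later; /* AMD card from the R600 generation onward */
    int family;                /* placeholder family identifier */
};

/* Pick the most efficient tiling both the rendering device and the
 * compositor's device can interpret, falling back to linear. */
static enum tiling_mode
choose_shared_tiling(const struct gpu_info *render, const struct gpu_info *display)
{
    assert(render != NULL && display != NULL);

    if (render->is_amd_r600_or_later && display->is_amd_r600_or_later) {
        /* 2D tiling is family dependent: only share it within one family. */
        if (render->family == display->family)
            return TILING_2D;
        /* 1D tile size and alignment are assumed ASIC independent. */
        return TILING_1D;
    }

    /* Any other pairing: keep the untiled buffer used by the patch set. */
    return TILING_LINEAR;
}
```

Falling back to `TILING_LINEAR` preserves the patch set's behaviour (untiled shared buffers) for any device pair the rule does not cover.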
Re: [Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland
On 11/22/2013 01:16 AM, Kristian Høgsberg wrote:
> I'm not sold on the nested compositor for this use case. It
> introduces another context switch between the game and the main
> compositor and more input and output lag. Instead I think we need to
> consider whether we want a new __DRIimage entry point like:
>
> (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)
>
> and then do the copy in platform_wayland.c when we have non-matching
> tile-formats.
>
> Kristian
>

Thanks for the comments.

There are advantages to both possibilities:
using a nested compositor or doing the copy inside Mesa.

I imagine doing a blit could be the default,
and rendering directly to the linear buffer could be an option
set by an env var, or driconf.

I'm deeply convinced we should allow rendering to the linear
buffer without a copy, and the nested compositor use-case makes sense.

For the blit, a function

(*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)

makes sense, but we would need another function
(not specifically related to Prime):

(*throttle)(__DRIcontext *ctx)

because rendering something heavy on a non-Intel card (Intel cards do
throttling automatically) causes input lag, which is solved by forcing
throttling at every swap.

And ideally, we could have more control over tiling;
for example, if the computer has two AMD cards of the same
generation with the same tiling modes, we could always use tiling.

> I would like the compositor to still send the classic drm device in
> the wl_drm.device event. The client can then use stat(2) to stat it
> and infer the corresponding render node from that by adding 128 to the
> minor. This way we don't break older mesa versions by sending them a
> render node that they'll then fail to authenticate.

I do not agree on this: if the compositor runs on a render node,
there is a high chance it can't authenticate clients wanting to run
on the non-render-node device.
Moreover, I believe clients shouldn't use render nodes by default if
they can avoid it.

I don't get the point about older Mesa versions: why would you use an
older Mesa version inside a more recent Mesa version?


Some arguments in favor of allowing the nested compositor case to render
without a copy on another card:

. XWayland inside would run as if the main device were the device of the
nested compositor. (I can't say how X DRI3 will support Prime, so I
can't say yet whether this is a big advantage or not.) For example, if
XRender is slow on the main card, with this system you can use XRender
on the other card.

. If you run under system compositors (as KWin would), you would want
to be able to render your whole desktop on the card you want, without
an additional copy.

. We could imagine having outputs on different cards: the compositor
running under system compositors would connect to multiple system
compositors, each running on its own card (and giving access to
different outputs). The compositor would use card X: the system
compositor on card X would get tiled buffers without copies, whereas the
other system compositors would get untiled buffers without copies.

. The nested compositor could let the user choose whether or not to cap
compositing at 60 fps. When capping compositing at 60 fps, we would
avoid some useless copies (while adding a very small latency between
when the frame is sent and when it is displayed).

(. The nested compositor could have additional features, like recording
using the acceleration of the other card.)

All these arguments can be summed up as: "more flexibility".


For a heavy game running below 60 fps, I agree it makes much more sense
to do the copy inside Mesa than to use a nested compositor.


Axel Davy
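Kristian's suggestion quoted above (keep sending the classic DRM node in wl_drm.device, and let the client stat(2) it and add 128 to the minor) can be sketched on Linux as follows. This is an illustration under assumptions, not Mesa code: `render_dev_from_primary()` and `render_node_path()` are hypothetical helpers, and turning the derived minor into a `/dev/dri/renderD<minor>` path assumes the usual udev naming rather than looking the node up.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Offset between a primary node minor and its render node minor,
 * as stated in the quoted suggestion. */
#define RENDER_NODE_MINOR_OFFSET 128

/* Derive the render node device number from the primary node's dev_t:
 * same major, minor + 128. */
static dev_t render_dev_from_primary(dev_t primary)
{
    return makedev(major(primary), minor(primary) + RENDER_NODE_MINOR_OFFSET);
}

/* stat(2) the primary node named in wl_drm.device and build the path of
 * the matching render node. Returns 0 on success, or -1 if stat fails,
 * the path is not a character device, or buf is too small. */
static int render_node_path(const char *primary_path, char *buf, size_t len)
{
    struct stat st;

    if (stat(primary_path, &st) != 0 || !S_ISCHR(st.st_mode))
        return -1;

    dev_t render = render_dev_from_primary(st.st_rdev);
    int n = snprintf(buf, len, "/dev/dri/renderD%u", minor(render));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A more defensive client would try to open the derived path and fall back to the primary node if that fails, which keeps it working on systems without render nodes.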
Re: [Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland
On Thu, Nov 07, 2013 at 05:13:35PM +0100, Axel Davy wrote:
> These patches enable using DRI_PRIME to use a different card
> than the compositor card (with render-nodes).
>
> At the time of writing, the Mesa Wayland EGL backend doesn't
> support render-nodes, because it uses the DRI2 backend, which
> requires GEM names (render-nodes aren't allowed to use GEM
> names). But I'm confident that this week or next week, the __DRIimage
> replacement will be ready, thanks to Keith Packard, Kristian Høgsberg
> and Christopher James Halse Rogers.
> That's why I'm publishing these patches now, so they have time
> to be reviewed.
>
> Initially, I wanted to use driconf too, as a complement to DRI_PRIME,
> but driconf doesn't support string parameters yet, so it'll come later.
>
> To choose a specific device, the user has to specify the id_path_tag of
> the device he wants to use. We get the id_path_tag with udev. Systemd
> didn't fill this field for render-nodes, so it has to be set by an
> additional rule. David Herrmann has sent a patch for that to Systemd,
> but I don't know if it has been pushed yet.
>
> The choice of id_path_tag comes from the fact that the id_path is
> stable, and that it describes non-PCI graphics devices too (USB
> devices, etc.).
>
> An alternative way to choose the device is to set DRI_PRIME to "1",
> which means "choose any card other than the one used by the compositor".
>
> If Mesa doesn't find the device requested by the user, it will use the
> same card as the Wayland compositor.
>
> The Wayland Prime support implemented with these patches is different
> from X Prime support.
>
> A client using a different card than the compositor will allocate
> untiled buffers to render to, and share them with the compositor. On X,
> by contrast, the client would render to a tiled buffer that is not
> shared with the other card, and a copy mechanism would make the main
> card receive an untiled buffer.
>
> That means that these (Wayland) clients will perform slowly compared
> to when they don't use Prime.
> In fact, this is not how the user is supposed to run a game, for
> example, on the dedicated card.
>
> Using a shared, untiled buffer while avoiding any copy is better for
> applications that don't do much rendering.
>
> An example of such an application is an embedded Wayland compositor.
>
> To run a heavy application, the user is supposed to launch an
> embedded Wayland compositor on the dedicated card, and run the game
> inside it. The compositor will render into the shared, untiled buffer,
> and will copy the content of the game's buffers.
>
> Note that the game knows it is using the same card as its compositor;
> that's why it enables tiling.
>
> I'm planning to write a Weston shell, designed to run embedded
> fullscreen games, that would make Weston resize to the game's size, and
> close when the game closes.

I'm not sold on the nested compositor for this use case. It
introduces another context switch between the game and the main
compositor and more input and output lag. Instead I think we need to
consider whether we want a new __DRIimage entry point like:

(*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)

and then do the copy in platform_wayland.c when we have non-matching
tile-formats.

Kristian

> Pros:
> . If you launch a fullscreen Wayland compositor on the dedicated card,
> inside a compositor supporting composite bypass, you'll render the whole
> desktop on the dedicated card. The integrated card would only display
> the generated buffer, without doing any copy.
> . More flexibility
>
>
> Cons:
> . The user has to use a script to launch a game on the dedicated card.
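Kristian's alternative keeps the copy inside Mesa's swap path in platform_wayland.c instead of going through a nested compositor, and Axel's reply in this thread proposes making that blit the default, allowing direct linear rendering as an opt-in, and calling a `(*throttle)` hook at every swap. The sketch below models that flow with mock types. Everything in it is invented for illustration and is not Mesa's `__DRIimage` extension: `mock_context`, `mock_image`, `struct image_ext`, `swap_buffers()`, `linear_opt_in_from_env()` and the `HYPOTHETICAL_PRIME_RENDER_LINEAR` variable. The third `blitImage` argument is the destination, as in Axel's restatement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for __DRIcontext and __DRIimage. */
typedef struct { int throttles; } mock_context;
typedef struct { int content; } mock_image;

/* Hypothetical extension table with the two entry points from the thread:
 *   (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)
 *   (*throttle)(__DRIcontext *ctx)                                    */
struct image_ext {
    void (*blitImage)(mock_context *ctx, mock_image *src, mock_image *dst);
    void (*throttle)(mock_context *ctx);
};

/* Mock driver callbacks: "copy" the frame and count throttle calls. */
static void mock_blit(mock_context *ctx, mock_image *src, mock_image *dst)
{
    (void)ctx;
    dst->content = src->content;
}

static void mock_throttle(mock_context *ctx)
{
    ctx->throttles++;
}

/* Opt-in for rendering straight into the shared linear buffer. The
 * variable name is made up; the thread only says "an env var, or driconf". */
bool linear_opt_in_from_env(void)
{
    const char *v = getenv("HYPOTHETICAL_PRIME_RENDER_LINEAR");
    return v != NULL && strcmp(v, "1") == 0;
}

/* Swap path: returns the image that would be attached to the surface. */
static mock_image *
swap_buffers(const struct image_ext *ext, mock_context *ctx,
             mock_image *back, mock_image *shared_linear,
             bool tiling_mismatch, bool render_linear_directly)
{
    assert(ext != NULL && ctx != NULL);
    mock_image *to_send = back;

    if (tiling_mismatch && !render_linear_directly) {
        /* Default: render tiled, then blit into the linear buffer the
         * compositor's card can read. */
        ext->blitImage(ctx, back, shared_linear);
        to_send = shared_linear;
    }
    /* Otherwise back is already readable by the compositor (same card,
     * or allocated linear because the user opted in). */

    /* Force throttling at every swap to bound queued frames (input lag). */
    if (ext->throttle != NULL)
        ext->throttle(ctx);

    return to_send;
}
```

A caller would pass `linear_opt_in_from_env()` as `render_linear_directly`; keeping the policy as a parameter lets the swap logic be exercised without touching the environment.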
[Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland
These patches enable using DRI_PRIME to use a different card than the
compositor card (with render-nodes).

At the time of writing, the Mesa Wayland EGL backend doesn't support
render-nodes, because it uses the DRI2 backend, which requires GEM names
(render-nodes aren't allowed to use GEM names). But I'm confident that
this week or next week, the __DRIimage replacement will be ready, thanks
to Keith Packard, Kristian Høgsberg and Christopher James Halse Rogers.
That's why I'm publishing these patches now, so they have time to be
reviewed.

Initially, I wanted to use driconf too, as a complement to DRI_PRIME,
but driconf doesn't support string parameters yet, so it'll come later.

To choose a specific device, the user has to specify the id_path_tag of
the device he wants to use. We get the id_path_tag with udev. Systemd
didn't fill this field for render-nodes, so it has to be set by an
additional rule. David Herrmann has sent a patch for that to Systemd,
but I don't know if it has been pushed yet.

The choice of id_path_tag comes from the fact that the id_path is
stable, and that it describes non-PCI graphics devices too (USB devices,
etc.).

An alternative way to choose the device is to set DRI_PRIME to "1",
which means "choose any card other than the one used by the compositor".

If Mesa doesn't find the device requested by the user, it will use the
same card as the Wayland compositor.

The Wayland Prime support implemented with these patches is different
from X Prime support.

A client using a different card than the compositor will allocate
untiled buffers to render to, and share them with the compositor. On X,
by contrast, the client would render to a tiled buffer that is not
shared with the other card, and a copy mechanism would make the main
card receive an untiled buffer.

That means that these (Wayland) clients will perform slowly compared to
when they don't use Prime. In fact, this is not how the user is supposed
to run a game, for example, on the dedicated card.
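The device-selection rules described above (an exact id_path_tag match, DRI_PRIME set to "1" meaning any card other than the compositor's, and a fallback to the compositor's card when nothing matches) fit in a few lines. This is a minimal sketch, not the code from the patches: `struct gpu_device` and `choose_device()` are hypothetical. In a real client the tag would come from udev (for example libudev's `udev_device_get_property_value(dev, "ID_PATH_TAG")`) and the string from `getenv("DRI_PRIME")`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-device record; id_path_tag is assumed non-NULL. */
struct gpu_device {
    const char *id_path_tag;   /* e.g. "pci-0000_01_00_0" */
    bool used_by_compositor;   /* the card the Wayland compositor uses */
};

/* Return the index of the device to render on:
 *   - DRI_PRIME unset (NULL): the compositor's device;
 *   - DRI_PRIME == "1": the first device other than the compositor's;
 *   - otherwise: the device whose id_path_tag matches exactly;
 *   - no match: fall back to the compositor's device.
 * Returns -1 only if the compositor's device is not in the list. */
static int choose_device(const struct gpu_device *devs, size_t n,
                         const char *dri_prime)
{
    assert(devs != NULL || n == 0);

    int compositor = -1;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].used_by_compositor) {
            compositor = (int)i;
            break;
        }
    }

    if (dri_prime == NULL)
        return compositor;

    bool any_other = strcmp(dri_prime, "1") == 0;
    for (size_t i = 0; i < n; i++) {
        bool match = any_other ? !devs[i].used_by_compositor
                               : strcmp(dri_prime, devs[i].id_path_tag) == 0;
        if (match)
            return (int)i;
    }

    /* Requested device not found: use the compositor's card. */
    return compositor;
}
```

Matching on the udev path tag rather than a device index follows the cover letter's reasoning: the path is stable across boots and also covers non-PCI devices.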
Using a shared, untiled buffer while avoiding any copy is better for
applications that don't do much rendering.

An example of such an application is an embedded Wayland compositor.

To run a heavy application, the user is supposed to launch an embedded
Wayland compositor on the dedicated card, and run the game inside it.
The compositor will render into the shared, untiled buffer, and will
copy the content of the game's buffers.

Note that the game knows it is using the same card as its compositor;
that's why it enables tiling.

I'm planning to write a Weston shell, designed to run embedded
fullscreen games, that would make Weston resize to the game's size, and
close when the game closes.

Pros:
. If you launch a fullscreen Wayland compositor on the dedicated card,
inside a compositor supporting composite bypass, you'll render the whole
desktop on the dedicated card. The integrated card would only display
the generated buffer, without doing any copy.
. More flexibility

Cons:
. The user has to use a script to launch a game on the dedicated card.

Pros over X DRI2 Prime support:
. VSync works, whatever cards the client uses
. It's easy to understand how Prime support works

As a last note, this Prime support also suffers from the lack of
dma-buf fences (glitches appear when the client is still writing to the
buffer while the compositor's card reads it). Using an embedded
compositor suppresses all the glitches as long as it takes less than
(1/refresh_rate) seconds to render a frame, that is, when you don't have
input lag.

Axel Davy (3):
  Move the code to open the graphic device. Support for render-nodes.
  Create untiled buffers in get_back_bo when needed.
  Implement choosing the device to use with DRI_PRIME

 src/egl/drivers/dri2/egl_dri2.h         |   1 +
 src/egl/drivers/dri2/platform_wayland.c | 262 +++-
 2 files changed, 226 insertions(+), 37 deletions(-)

--
1.8.1.2
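The condition in the note on dma-buf fences is simple arithmetic: the embedded compositor avoids the glitches when rendering one frame takes less than one refresh period, 1/refresh_rate seconds (about 16.7 ms at 60 Hz). A small illustrative check follows; `fits_refresh_period()` is a hypothetical helper, not part of the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a frame rendered in render_time_us microseconds takes less than
 * one refresh period (1/refresh_hz seconds). Integer form avoids rounding:
 *   render_time_us / 1e6 < 1 / refresh_hz
 *   <=> render_time_us * refresh_hz < 1e6 */
static bool fits_refresh_period(uint64_t render_time_us, uint32_t refresh_hz)
{
    assert(refresh_hz > 0);
    return render_time_us * refresh_hz < 1000000u;
}
```

At 60 Hz the budget is roughly 16,667 microseconds, so a 16 ms frame fits while a 17 ms frame does not.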