Re: [Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland

2013-11-24 Thread Alex Deucher
On Fri, Nov 22, 2013 at 8:52 AM, Axel Davy  wrote:
> On 11/22/2013 01:16 AM, Kristian Høgsberg wrote:
>> I'm not sold on the nested compositor for this use case.  It
>> introduces another context switch between the game and the main
>> compositor and more input and output lag.  Instead I think we need to
>> consider whether we want a new __DRIimage entry point like:
>>
>>   (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)
>>
>> and then do the copy in platform_wayland.c when we have non-matching
>> tile-formats.
>>
>> Kristian
>>
>
> Thanks for the comments.
>
> There are advantages to both possibilities:
> using a nested compositor or doing the copy inside Mesa.
>
> I imagine doing a blit could be the default,
> and rendering directly to the linear buffer could be an option
> set by an env var, or driconf.
>
> I'm deeply convinced we should allow rendering to the linear
> buffer without a copy, and the nested compositor use-case makes sense.
>
> For the blit, a function
>
> (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)
>
> makes sense, but we would need another function
> (not related specifically to Prime):
>
> (*throttle) (__DRIcontext *ctx)
>
> Because rendering something heavy on a non-Intel card causes input
> lag (Intel cards throttle automatically); this is solved by forcing
> throttling at every swap.
>
> And ideally, we could have more control over tiling: for example, if
> the computer has two AMD cards of the same generation with the same
> tiling modes, we could always use tiling.
>

We can actually use 1D tiling across all families since R600 IIRC,
since the tile size and alignment requirements are not asic dependent.
Only 2D tiling has family dependencies.

Alex

>
>>I would like the compositor to still send the classic drm device in
>>the wl_drm.device event.  The client can then use stat(2) to stat it
>and derive the corresponding render node from that by adding 128 to the
>>minor.  This way we don't break older mesa versions by sending them a
>>render node that they'll then fail to authenticate.
>
> I do not agree with this: if the compositor runs on a render-node,
> chances are high that it can't authenticate clients wanting to run
> on the non-render-node device.
> Moreover, I believe clients shouldn't use render-nodes by default if
> they can avoid it.
>
> I don't get the point about older Mesa versions: why would you run a
> client using an older Mesa version inside a compositor using a more
> recent one?
>
>
> Some arguments in favor of allowing the nested compositor to render
> on another card without a copy:
>
> . XWayland running inside it would behave as if the main device were
> the device of the nested compositor. (I can't say how X dri3 will
> support Prime, so I can't say yet whether this is a big advantage.)
> For example, if Xrender is slow on the main card, this system lets
> you use Xrender on the other card.
>
> . When running under system compositors (as KWin would be), you would
> like to be able to render your whole desktop on the card of your
> choice, without an additional copy.
> . We could imagine having outputs on different cards: the compositor
> running under system compositors would connect to multiple system
> compositors, one per card (each giving access to different outputs).
> If the compositor uses card X, the system compositor on card X would
> get tiled buffers without copies, whereas the other system
> compositors would get untiled buffers without copies.
>
> . The nested compositor could let the user choose whether to cap
> compositing at 60 fps. Capping at 60 fps would avoid some useless
> copies (while adding a very small latency between when a frame is
> sent and when it is displayed).
>
> (. The nested compositor could have additional features, like
> recording using the other card's acceleration.)
>
> All these arguments can be summed up as: "more flexibility".
>
>
> For a heavy game running below 60 fps, I agree it makes much more
> sense to do the copy inside Mesa than to use a nested compositor.
>
>
> Axel Davy
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland

2013-11-22 Thread Axel Davy
On 11/22/2013 01:16 AM, Kristian Høgsberg wrote:
> I'm not sold on the nested compositor for this use case.  It
> introduces another context switch between the game and the main
> compositor and more input and output lag.  Instead I think we need to
> consider whether we want a new __DRIimage entry point like:
>
>   (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)
>
> and then do the copy in platform_wayland.c when we have non-matching
> tile-formats.
>
> Kristian
>

Thanks for the comments.

There are advantages to both possibilities:
using a nested compositor or doing the copy inside Mesa.

I imagine doing a blit could be the default,
and rendering directly to the linear buffer could be an option
set by an env var, or driconf.

I'm deeply convinced we should allow rendering to the linear
buffer without a copy, and the nested compositor use-case makes sense.

For the blit, a function

(*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)

makes sense, but we would need another function
(not related specifically to Prime):

(*throttle) (__DRIcontext *ctx)

Because rendering something heavy on a non-Intel card causes input
lag (Intel cards throttle automatically); this is solved by forcing
throttling at every swap.
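
(As an illustration, a minimal sketch of how these two entry points
could sit in a __DRIimage-style extension; the struct name, version
and layout below are assumptions for discussion, not an existing Mesa
interface:)

  #include "dri_interface.h"  /* __DRIextension, __DRIcontext, __DRIimage */

  /* Hypothetical extension grouping the two proposed entry points. */
  typedef struct __DRIprimeBlitExtensionRec {
     __DRIextension base;

     /* Copy src into dst, e.g. from the tiled buffer the client
      * renders to into the linear buffer shared with the compositor. */
     void (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst);

     /* Force the driver to throttle, so a heavy client doesn't queue
      * too many frames ahead and add input lag. */
     void (*throttle)(__DRIcontext *ctx);
  } __DRIprimeBlitExtension;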

And ideally, we could have more control over tiling: for example, if
the computer has two AMD cards of the same generation with the same
tiling modes, we could always use tiling.


>I would like the compositor to still send the classic drm device in
>the wl_drm.device event.  The client can then use stat(2) to stat it
>and derive the corresponding render node from that by adding 128 to the
>minor.  This way we don't break older mesa versions by sending them a
>render node that they'll then fail to authenticate.
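
(For reference, the derivation described above would look roughly like
this; a minimal sketch, assuming the usual card-minor + 128 render-node
layout, e.g. /dev/dri/card0 -> /dev/dri/renderD128:)

  #include <stdio.h>
  #include <sys/stat.h>
  #include <sys/sysmacros.h>  /* major(), minor() */

  /* Derive the render node path from the card node the compositor
   * advertises in wl_drm.device. */
  static int
  card_to_render_node(const char *card_path, char *out, size_t len)
  {
     struct stat st;

     if (stat(card_path, &st) < 0 || !S_ISCHR(st.st_mode))
        return -1;

     snprintf(out, len, "/dev/dri/renderD%d",
              (int) minor(st.st_rdev) + 128);
     return 0;
  }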

I do not agree with this: if the compositor runs on a render-node,
chances are high that it can't authenticate clients wanting to run
on the non-render-node device.
Moreover, I believe clients shouldn't use render-nodes by default if
they can avoid it.

I don't get the point about older Mesa versions: why would you run a
client using an older Mesa version inside a compositor using a more
recent one?


Some arguments in favor of allowing the nested compositor to render
on another card without a copy:

. XWayland running inside it would behave as if the main device were
the device of the nested compositor. (I can't say how X dri3 will
support Prime, so I can't say yet whether this is a big advantage.)
For example, if Xrender is slow on the main card, this system lets
you use Xrender on the other card.

. When running under system compositors (as KWin would be), you would
like to be able to render your whole desktop on the card of your
choice, without an additional copy.
. We could imagine having outputs on different cards: the compositor
running under system compositors would connect to multiple system
compositors, one per card (each giving access to different outputs).
If the compositor uses card X, the system compositor on card X would
get tiled buffers without copies, whereas the other system
compositors would get untiled buffers without copies.

. The nested compositor could let the user choose whether to cap
compositing at 60 fps. Capping at 60 fps would avoid some useless
copies (while adding a very small latency between when a frame is
sent and when it is displayed).

(. The nested compositor could have additional features, like
recording using the other card's acceleration.)

All these arguments can be summed up as: "more flexibility".


For a heavy game running below 60 fps, I agree it makes much more
sense to do the copy inside Mesa than to use a nested compositor.


Axel Davy
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland

2013-11-21 Thread Kristian Høgsberg
On Thu, Nov 07, 2013 at 05:13:35PM +0100, Axel Davy wrote:
> These patches enable using DRI_PRIME to use a different card
> than the compositor card (with render-nodes).
> 
> At the time of writing, the Mesa Wayland EGL backend doesn't
> support render-nodes, because it uses the dri2 backend, which
> requires using GEM names (render-nodes aren't allowed to use GEM
> names). But I'm confident the __DRIimage replacement will be ready
> this week or next week, thanks to Keith Packard, Kristian Høgsberg
> and Christopher James Halse Rogers.
> That's why I'm publishing these patches now, so there is time for
> them to be reviewed.
> 
> Initially, I wanted to use driconf too, as a complement of DRI_PRIME,
> but driconf doesn't support string parameters yet, so it'll come later.
> 
> To choose a specific device, the user has to specify the id_path_tag
> of the device he wants to use. We get the id_path_tag with udev.
> Systemd didn't fill this field for render-nodes, so it has to be set
> by an additional rule. David Herrmann has sent a patch for that to
> Systemd, but I don't know if it has been pushed yet.
> 
> The choice of id_path_tag comes from the fact that the id_path is
> stable, and that it describes non-PCI graphic devices too (USB
> devices, etc.).
> 
> An alternative way to choose the device is to set DRI_PRIME to "1",
> which means "choose any card other than the one used by the
> compositor".
> 
> If Mesa doesn't find the device asked for by the user, it will use
> the same card as the Wayland compositor.
> 
> The Wayland Prime support implemented with these patches is different
> from X Prime support.
> 
> A client using a different card than the compositor's will allocate
> untiled buffers to render to, and share them with the compositor.
> This is unlike X, where it would render to a tiled buffer not shared
> with the other card, and a copy mechanism would give the main card
> an untiled buffer.
> 
> That means these (Wayland) clients will perform slowly compared to
> not using Prime.
> In fact, this is not how the user is supposed to run, for example, a
> game on the dedicated card.
> 
> Using a shared, untiled buffer while avoiding any copy is better for
> applications that don't do much rendering.
> 
> An example of such an application is an embedded Wayland compositor.
> 
> To run a heavy application, the user is supposed to launch an
> embedded Wayland compositor on the dedicated card and run the game
> inside it. The compositor will render into the shared, untiled
> buffer, copying the content of the game's buffers.
>
> Note that the game knows it is using the same card as its
> compositor, which is why it enables tiling.
> 
> I'm planning to write a Weston shell, designed to run embedded
> fullscreen games, that would make Weston resize to the game's size
> and close when the game closes.

I'm not sold on the nested compositor for this use case.  It
introduces another context switch between the game and the main
compositor and more input and output lag.  Instead I think we need to
consider whether we want a new __DRIimage entry point like:

  (*blitImage)(__DRIcontext *ctx, __DRIimage *src, __DRIimage *dst)

and then do the copy in platform_wayland.c when we have non-matching
tile-formats.

Kristian

> Pros:
> . If you launch a fullscreen Wayland compositor on the dedicated
> card inside a compositor supporting composite bypass, you'll render
> the whole desktop on the dedicated card. The integrated card would
> only display the generated buffer, without doing any copy.
> . More flexibility
> 
> 
> Cons:
> . The user has to use a script to launch a game on the dedicated card.
>
> Pros over X dri2 Prime support:
> . Vsync works regardless of which cards the client uses.
> . It is easy to understand how Prime support works.
> 
> 
> As a last note, this Prime support also suffers from the lack of
> dma-buf fences (glitches when the client is still writing to the
> buffer while the compositor's card reads it).
> Using an embedded compositor suppresses all these glitches as long
> as it takes less than (1/refresh_rate) seconds to render a frame,
> that is, as long as there is no input lag.
> 
> 
> 
> Axel Davy (3):
>   Move the code to open the graphic device. Support for
> render-nodes.
>   Create untiled buffers in get_back_bo when needed.
>   Implement choosing the device to use with DRI_PRIME
> 
>  src/egl/drivers/dri2/egl_dri2.h |   1 +
>  src/egl/drivers/dri2/platform_wayland.c | 262 +++-
>  2 files changed, 226 insertions(+), 37 deletions(-)
> 
> -- 
> 1.8.1.2
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/3] Implement DRI_PRIME support for Wayland

2013-11-07 Thread Axel Davy
These patches enable using DRI_PRIME to use a different card
than the compositor card (with render-nodes).

At the time of writing, the Mesa Wayland EGL backend doesn't
support render-nodes, because it uses the dri2 backend, which
requires using GEM names (render-nodes aren't allowed to use GEM
names). But I'm confident the __DRIimage replacement will be ready
this week or next week, thanks to Keith Packard, Kristian Høgsberg
and Christopher James Halse Rogers.
That's why I'm publishing these patches now, so there is time for
them to be reviewed.
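
(As an aside, the distinction is roughly the following: the dri2 path
shares buffers by global GEM name, which render-nodes refuse, whereas
the prime path exports a dma-buf file descriptor instead. A minimal
sketch using the libdrm call, error handling simplified:)

  #include <stdint.h>
  #include <xf86drm.h>  /* drmPrimeHandleToFD, DRM_CLOEXEC */

  /* Export a GEM buffer opened on a render node as a dma-buf fd.
   * Global GEM names (flink) are rejected on render nodes, so this
   * fd is what gets passed to the compositor instead. */
  static int
  export_buffer(int drm_fd, uint32_t gem_handle)
  {
     int prime_fd = -1;

     if (drmPrimeHandleToFD(drm_fd, gem_handle, DRM_CLOEXEC, &prime_fd) < 0)
        return -1;

     return prime_fd;
  }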

Initially, I wanted to use driconf too, as a complement of DRI_PRIME,
but driconf doesn't support string parameters yet, so it'll come later.

To choose a specific device, the user has to specify the id_path_tag
of the device he wants to use. We get the id_path_tag with udev.
Systemd didn't fill this field for render-nodes, so it has to be set
by an additional rule. David Herrmann has sent a patch for that to
Systemd, but I don't know if it has been pushed yet.

The choice of id_path_tag comes from the fact that the id_path is
stable, and that it describes non-PCI graphic devices too (USB
devices, etc.).

An alternative way to choose the device is to set DRI_PRIME to "1",
which means "choose any card other than the one used by the
compositor".
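
(In code terms, the device selection is roughly this; a minimal
libudev sketch, with the enumeration of the /dev/dri nodes and error
handling left out:)

  #include <stdlib.h>
  #include <string.h>
  #include <libudev.h>

  /* Does this drm device match what the user asked for in DRI_PRIME?
   * "1" means "any card other than the compositor's"; any other value
   * is compared against the device's ID_PATH_TAG udev property. */
  static int
  device_matches(struct udev_device *dev, int is_compositor_device)
  {
     const char *prime = getenv("DRI_PRIME");
     const char *tag;

     if (prime == NULL)
        return is_compositor_device;  /* default: the compositor's card */

     if (strcmp(prime, "1") == 0)
        return !is_compositor_device; /* any other card */

     tag = udev_device_get_property_value(dev, "ID_PATH_TAG");
     return tag != NULL && strcmp(tag, prime) == 0;
  }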

If Mesa doesn't find the device asked for by the user, it will use
the same card as the Wayland compositor.

The Wayland Prime support implemented with these patches is different
from X Prime support.

A client using a different card than the compositor's will allocate
untiled buffers to render to, and share them with the compositor.
This is unlike X, where it would render to a tiled buffer not shared
with the other card, and a copy mechanism would give the main card an
untiled buffer.

That means these (Wayland) clients will perform slowly compared to
not using Prime.
In fact, this is not how the user is supposed to run, for example, a
game on the dedicated card.

Using a shared, untiled buffer while avoiding any copy is better for
applications that don't do much rendering.

An example of such an application is an embedded Wayland compositor.

To run a heavy application, the user is supposed to launch an
embedded Wayland compositor on the dedicated card and run the game
inside it. The compositor will render into the shared, untiled
buffer, copying the content of the game's buffers.

Note that the game knows it is using the same card as its compositor,
which is why it enables tiling.

I'm planning to write a Weston shell, designed to run embedded
fullscreen games, that would make Weston resize to the game's size
and close when the game closes.

Pros:
. If you launch a fullscreen Wayland compositor on the dedicated card
inside a compositor supporting composite bypass, you'll render the
whole desktop on the dedicated card. The integrated card would only
display the generated buffer, without doing any copy.
. More flexibility


Cons:
. The user has to use a script to launch a game on the dedicated card.

Pros over X dri2 Prime support:
. Vsync works regardless of which cards the client uses.
. It is easy to understand how Prime support works.


As a last note, this Prime support also suffers from the lack of
dma-buf fences (glitches when the client is still writing to the
buffer while the compositor's card reads it).
Using an embedded compositor suppresses all these glitches as long as
it takes less than (1/refresh_rate) seconds to render a frame, that
is, as long as there is no input lag.



Axel Davy (3):
  Move the code to open the graphic device. Support for
render-nodes.
  Create untiled buffers in get_back_bo when needed.
  Implement choosing the device to use with DRI_PRIME

 src/egl/drivers/dri2/egl_dri2.h |   1 +
 src/egl/drivers/dri2/platform_wayland.c | 262 +++-
 2 files changed, 226 insertions(+), 37 deletions(-)

-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev