Re: Does gbm_bo_map() implicitly synchronise?

2024-06-25 Thread Lucas Stach
On Tuesday, 25.06.2024 at 09:56 +0200, Michel Dänzer wrote:
> On 2024-06-24 21:08, James Jones wrote:
> > FWIW, the NVIDIA binary driver's implementation of gbm_bo_map/unmap()
> > 
> > 1) Doesn't do any synchronization against in-flight work. The assumption is 
> > that if the content is going to be read, the API writing the data has 
> > established that coherence. Likewise, if it's going to be written, the API 
> > reading it afterwards does any invalidates or whatever are needed for 
> > coherence.
> > 
> > 2) We don't blit anything or format convert, because our GBM implementation 
> > has no DMA engine access, and I'd like to keep it that way. Setting up a 
> > DMA-capable driver instance is much more expensive as far as runtime 
> > resources than setting up a simple allocator+mmap driver, at least in our 
> > driver architecture. Our GBM map just does an mmap(), and if it's not 
> > linear, you're not going to be able to interpret the data unless you've 
> > read up on our tiling formats. I'm aware this is different from Mesa, and 
> > no one has complained thus far.
> 
> I've seen at least one webkitgtk issue report about gbm_bo_map not working as 
> intended with nvidia.
> 
> gbm_bo_map definitely has to handle tiling, that's one of its main purposes.
> 
Unfortunately gbm_bo_map is severely underspecified in that regard.
Gallium drivers have always handled tiling, since the map is
implemented as a transfer, but i965 didn't handle tiling either and
just returned a mapping of the raw tiled storage.

> It also really has to handle implicit synchronization, since there's no GBM 
> API for explicit synchronization.
> 
One could demand that the caller does something like eglClientWaitSync
on a sync object fencing the hardware operations. Implicit sync on
gbm_bo_map already is kind of a gray area, as GBM uses a different
context to implement the transfer than the rendering API. It will only
synchronize with commands flushed from the rendering context. Anything
still buffered in the rendering context is invisible to gbm.

Again, none of this is really specified anywhere. But I guess most
users at this point assume the Mesa behavior and will break if another
implementation doesn't do the same.

Regards,
Lucas



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-25 Thread Michel Dänzer
On 2024-06-24 21:08, James Jones wrote:
> FWIW, the NVIDIA binary driver's implementation of gbm_bo_map/unmap()
> 
> 1) Doesn't do any synchronization against in-flight work. The assumption is 
> that if the content is going to be read, the API writing the data has 
> established that coherence. Likewise, if it's going to be written, the API 
> reading it afterwards does any invalidates or whatever are needed for 
> coherence.
> 
> 2) We don't blit anything or format convert, because our GBM implementation 
> has no DMA engine access, and I'd like to keep it that way. Setting up a 
> DMA-capable driver instance is much more expensive as far as runtime 
> resources than setting up a simple allocator+mmap driver, at least in our 
> driver architecture. Our GBM map just does an mmap(), and if it's not linear, 
> you're not going to be able to interpret the data unless you've read up on 
> our tiling formats. I'm aware this is different from Mesa, and no one has 
> complained thus far.

I've seen at least one webkitgtk issue report about gbm_bo_map not working as 
intended with nvidia.

gbm_bo_map definitely has to handle tiling, that's one of its main purposes.

It also really has to handle implicit synchronization, since there's no GBM API 
for explicit synchronization.


Just doing a direct mmap for gbm_bo_map can be bad for other reasons as well. 
E.g. if the BO storage is in VRAM and the application does CPU reads, it'll 
fall off a performance cliff.


-- 
Earthling Michel Dänzer    |  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-25 Thread Christian König

On 24.06.24 at 21:08, James Jones wrote:

FWIW, the NVIDIA binary driver's implementation of gbm_bo_map/unmap()

1) Doesn't do any synchronization against in-flight work. The assumption 
is that if the content is going to be read, the API writing the data 
has established that coherence. Likewise, if it's going to be written, 
the API reading it afterwards does any invalidates or whatever are 
needed for coherence.


That matches my assumption of what this function does, but it's just the 
opposite of what Michel explained it does.


Is it documented somewhere whether gbm_bo_map() should wait for in-flight 
work or not?


Regards,
Christian.



2) We don't blit anything or format convert, because our GBM 
implementation has no DMA engine access, and I'd like to keep it that 
way. Setting up a DMA-capable driver instance is much more expensive 
as far as runtime resources than setting up a simple allocator+mmap 
driver, at least in our driver architecture. Our GBM map just does an 
mmap(), and if it's not linear, you're not going to be able to 
interpret the data unless you've read up on our tiling formats. I'm 
aware this is different from Mesa, and no one has complained thus far. 
If we were forced to fix it, I imagine we'd do something like ask a 
shared engine in the kernel to do the blit on userspace's behalf, 
which would probably be slow but save resources.


Basically, don't use gbm_bo_map() for anything non-trivial on our 
implementation. It's not the right tool for e.g., reading back or 
populating OpenGL textures or X pixmaps. If you don't want to run on 
the NV implementation, feel free to ignore this advice, but I'd still 
suggest it's not the best tool for most jobs.


Thanks,
-James

On 6/17/24 03:29, Pierre Ossman wrote:

On 17/06/2024 10:13, Christian König wrote:


Let me try to clarify a couple of things:

The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so 
that the GPU can see values written by the CPU and the CPU can see 
values written by the GPU. But that IOCTL does *not* wait for any 
async GPU operation to finish.


If you want to wait for async GPU operations you either need to call 
the OpenGL functions to read pixels or do a select() (or poll, epoll 
etc...) call on the DMA-buf file descriptor.




Thanks for the clarification!

Just to avoid any uncertainty, are both of these things done 
implicitly by gbm_bo_map()/gbm_bo_unmap()?


I did test adding those steps just in case, but unfortunately did not 
see an improvement. My order was:


1. gbm_bo_import(GBM_BO_USE_RENDERING)
2. gbm_bo_get_fd()
3. Wait for client to request displaying the buffer
4. gbm_bo_map(GBM_BO_TRANSFER_READ)
5. select(fd+1, &readfds, NULL, NULL, NULL)
6. ioctl(fd, DMA_BUF_IOCTL_SYNC, &(struct dma_buf_sync){ .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ })

7. pixman_blt()
8. gbm_bo_unmap()

So if you want to do some rendering with OpenGL and then see the 
result in a buffer memory mapping the correct sequence would be the 
following:


1. Issue OpenGL rendering commands.
2. Call glFlush() to make sure the hw actually starts working on the 
rendering.
3. Call select() on the DMA-buf file descriptor to wait for the 
rendering to complete.

4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.



What I want to do is implement the X server side of DRI3 in just CPU. 
It works for every application I've tested except gnome-shell.


I would assume that 1. and 2. are supposed to be done by the X 
client, i.e. gnome-shell?


What I need to be able to do is access the result of that, once the X 
client tries to draw using that GBM backed pixmap (e.g. using 
PresentPixmap).


So far, we've only tested Intel GPUs, but we are setting up Nvidia 
and AMD GPUs at the moment. It will be interesting to see if the 
issue remains on those or not.


Regards




Re: Does gbm_bo_map() implicitly synchronise?

2024-06-24 Thread James Jones

FWIW, the NVIDIA binary driver's implementation of gbm_bo_map/unmap()

1) Doesn't do any synchronization against in-flight work. The assumption 
is that if the content is going to be read, the API writing the data has 
established that coherence. Likewise, if it's going to be written, the 
API reading it afterwards does any invalidates or whatever are needed 
for coherence.


2) We don't blit anything or format convert, because our GBM 
implementation has no DMA engine access, and I'd like to keep it that 
way. Setting up a DMA-capable driver instance is much more expensive as 
far as runtime resources than setting up a simple allocator+mmap driver, 
at least in our driver architecture. Our GBM map just does an mmap(), 
and if it's not linear, you're not going to be able to interpret the 
data unless you've read up on our tiling formats. I'm aware this is 
different from Mesa, and no one has complained thus far. If we were 
forced to fix it, I imagine we'd do something like ask a shared engine 
in the kernel to do the blit on userspace's behalf, which would probably 
be slow but save resources.


Basically, don't use gbm_bo_map() for anything non-trivial on our 
implementation. It's not the right tool for e.g., reading back or 
populating OpenGL textures or X pixmaps. If you don't want to run on the 
NV implementation, feel free to ignore this advice, but I'd still 
suggest it's not the best tool for most jobs.


Thanks,
-James

On 6/17/24 03:29, Pierre Ossman wrote:

On 17/06/2024 10:13, Christian König wrote:


Let me try to clarify a couple of things:

The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so 
that the GPU can see values written by the CPU and the CPU can see 
values written by the GPU. But that IOCTL does *not* wait for any 
async GPU operation to finish.


If you want to wait for async GPU operations you either need to call 
the OpenGL functions to read pixels or do a select() (or poll, epoll 
etc...) call on the DMA-buf file descriptor.




Thanks for the clarification!

Just to avoid any uncertainty, are both of these things done implicitly 
by gbm_bo_map()/gbm_bo_unmap()?


I did test adding those steps just in case, but unfortunately did not 
see an improvement. My order was:


1. gbm_bo_import(GBM_BO_USE_RENDERING)
2. gbm_bo_get_fd()
3. Wait for client to request displaying the buffer
4. gbm_bo_map(GBM_BO_TRANSFER_READ)
5. select(fd+1, &readfds, NULL, NULL, NULL)
6. ioctl(fd, DMA_BUF_IOCTL_SYNC, &(struct dma_buf_sync){ .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ })

7. pixman_blt()
8. gbm_bo_unmap()

So if you want to do some rendering with OpenGL and then see the 
result in a buffer memory mapping the correct sequence would be the 
following:


1. Issue OpenGL rendering commands.
2. Call glFlush() to make sure the hw actually starts working on the 
rendering.
3. Call select() on the DMA-buf file descriptor to wait for the 
rendering to complete.

4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.



What I want to do is implement the X server side of DRI3 in just CPU. It 
works for every application I've tested except gnome-shell.


I would assume that 1. and 2. are supposed to be done by the X client, 
i.e. gnome-shell?


What I need to be able to do is access the result of that, once the X 
client tries to draw using that GBM backed pixmap (e.g. using 
PresentPixmap).


So far, we've only tested Intel GPUs, but we are setting up Nvidia and 
AMD GPUs at the moment. It will be interesting to see if the issue 
remains on those or not.


Regards


Re: Does gbm_bo_map() implicitly synchronise?

2024-06-20 Thread Pierre Ossman

On 6/20/24 15:59, Pierre Ossman wrote:
We recently identified an issue[2] with synchronization on the server 
side when, after glFlush() on the client side, the command list takes 
too long (several seconds) to finish rendering.


[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228



Oh. I can try to test it here. We don't seem to have any synchronisation 
issues now that we got that VNC bug resolved.




I just tested here, and could not see the issue with our implementation 
with either an AMD iGPU or Nvidia dGPU. They might be too fast to 
trigger the issue? I have a Pi4 here as well, but it's not set up for 
this yet.


Regards
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping    https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-20 Thread Pierre Ossman

On 6/20/24 11:04, Chema Casanova wrote:


You can have a look at the Open MR we created two years ago for Xserver 
[1] "modesetting: Add DRI3 support to modesetting driver with glamor 
disabled". We are using it downstream for Raspberry Pi OS to enable on 
RPi1-3 GPU accelerated client applications, while the Xserver is using 
software composition with pixman.


[1] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/945



I did actually look at that to get some idea of how things are 
connected. But the comments suggested that the design wasn't robust, so 
we ended up trying a different approach.


Our work is now available in the latest TigerVNC beta, via this PR:

https://github.com/TigerVNC/tigervnc/pull/1771

We recently identified an issue[2] with synchronization on the server 
side when, after glFlush() on the client side, the command list takes 
too long (several seconds) to finish rendering.


[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228



Oh. I can try to test it here. We don't seem to have any synchronisation 
issues now that we got that VNC bug resolved.


The two big issues we have presently are the SIGBUS crash I opened a 
separate thread about, and getting glvnd to choose correctly when the 
Nvidia driver is used.


Regards
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping    https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-20 Thread Chema Casanova

On 17/6/24 at 12:29, Pierre Ossman wrote:


So if you want to do some rendering with OpenGL and then see the 
result in a buffer memory mapping the correct sequence would be the 
following:


1. Issue OpenGL rendering commands.
2. Call glFlush() to make sure the hw actually starts working on the 
rendering.
3. Call select() on the DMA-buf file descriptor to wait for the 
rendering to complete.

4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.



What I want to do is implement the X server side of DRI3 in just CPU. 
It works for every application I've tested except gnome-shell.


You can have a look at the Open MR we created two years ago for Xserver 
[1] "modesetting: Add DRI3 support to modesetting driver with glamor 
disabled". We are using it downstream for Raspberry Pi OS to enable on 
RPi1-3 GPU accelerated client applications, while the Xserver is using 
software composition with pixman.


[1] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/945

We recently identified an issue[2] with synchronization on the server 
side when, after glFlush() on the client side, the command list takes 
too long (several seconds) to finish rendering.


[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228



I would assume that 1. and 2. are supposed to be done by the X client, 
i.e. gnome-shell?


What I need to be able to do is access the result of that, once the X 
client tries to draw using that GBM backed pixmap (e.g. using 
PresentPixmap).


So far, we've only tested Intel GPUs, but we are setting up Nvidia and 
AMD GPUs at the moment. It will be interesting to see if the issue 
remains on those or not.



Regards,

Chema Casanova



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-18 Thread Pierre Ossman

On 17/06/2024 19:18, Pierre Ossman wrote:


Hmm... The source of the blit is CopyWindow being called as a result of 
the window moving. But I would have expected that to be inhibited by the 
fact that a compositor is active. It's also surprising that this only 
happens if DRI3 is involved.


I would also have expected something similar with software rendering. 
Albeit with a PutImage instead of PresentPixmap for the correct data. 
But everything works there.


I will need to dig further.



Well, this is embarrassing. The issue was not in GNOME, Mesa or Xorg. 
They rendered everything absolutely correctly. The issue was in the VNC 
code that didn't pay attention to the fact that the window was 
redirected and so sent bogus rendering instructions to the VNC client. :/


With that fixed everything renders perfectly fine!

Still, thank you for all the insight given regarding GBM!

Regards
--
Pierre Ossman   Software Development
Cendio AB   http://cendio.com
Teknikringen 8  http://twitter.com/ThinLinc
583 30 Linköping    http://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-18 Thread Michel Dänzer
On 2024-06-17 19:18, Pierre Ossman wrote:
> On 17/06/2024 18:09, Michel Dänzer wrote:
>>>
>>> Can I know whether it is needed or not? Or should I be cautious and always 
>>> do it?
>>
>> Assuming GBM in the X server uses the GPU HW driver, I'd say it shouldn't be 
>> needed.

Let me revise that statement: It shouldn't be needed, period. If llvmpipe needs 
it, it should happen as part of gbm_bo_map. (Not sure this is implemented at 
this time; if not, I'd argue it's a Mesa bug.)


> It does not (except the driver libgbm loads). We're trying to use this in 
> Xvnc, so it's all CPU.

Mesa's GBM backend (built into libgbm) is essentially a frontend for Gallium 
drivers. It initializes a suitable driver for the DRM fd passed to 
gbm_create_device. This could be the GPU HW driver, which might explain why the 
contents from gnome-shell are displayed correctly (eventually).


> We're just trying to make sure the applications can use the full power of the 
> GPU to render their stuff before handing it over to the X server. :)

A note on architecture:

Mutter supports running as a headless Wayland compositor, and supports remote 
desktop (including remote login as of GNOME 46) via gnome-remote-desktop and 
RDP. This allows both Wayland and X (via Xwayland) clients to run with full HW 
acceleration.


-- 
Earthling Michel Dänzer    |  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Christian König

On 18.06.24 at 07:01, Pierre Ossman wrote:

On 17/06/2024 20:18, Christian König wrote:

On 17.06.24 at 19:18, Pierre Ossman wrote:

On 17/06/2024 18:09, Michel Dänzer wrote:


Can I know whether it is needed or not? Or should I be cautious 
and always do it?


Assuming GBM in the X server uses the GPU HW driver, I'd say it 
shouldn't be needed.




It does not (except the driver libgbm loads). We're trying to use 
this in Xvnc, so it's all CPU. We're just trying to make sure the 
applications can use the full power of the GPU to render their stuff 
before handing it over to the X server. :)


That whole approach won't work.

When you don't have a HW driver loaded, or at least somehow tell the 
client that it should render into a linear buffer, then the data in 
the buffer will be tiled in a hw-specific format.


As far as I know you can't read that in a vendor-agnostic way with 
the CPU; you need the hw driver for that.




I'm confused. What's the goal of the GBM abstraction and specifically 
gbm_bo_map() if it's not a hardware-agnostic way of accessing buffers?


There is no hardware agnostic way of accessing buffers which contain hw 
specific data.


You always need a hw specific backend for that or use the linear flag 
which makes the data hw agnostic.




In practice, we are getting linear buffers. At least on Intel and AMD 
GPUs. Nvidia are being a bit difficult getting GBM working, so we 
haven't tested that yet.


That's either because you have a linear buffer for some reason or the 
hardware specific gbm backend has inserted a blit as Michel described.


I see there is the GBM_BO_USE_LINEAR flag. We have not used it yet, as 
we haven't seen a need for it. What is the effect of that? Would it 
guarantee what we are just lucky to see at the moment?


Michel and/or Marek need to answer that. I'm coming from the kernel side 
and maintaining the DMA-buf implementation backing all this, but I'm not 
an expert on gbm.


Regards,
Christian.



Regards




Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Pierre Ossman

On 17/06/2024 20:18, Christian König wrote:

On 17.06.24 at 19:18, Pierre Ossman wrote:

On 17/06/2024 18:09, Michel Dänzer wrote:


Can I know whether it is needed or not? Or should I be cautious and 
always do it?


Assuming GBM in the X server uses the GPU HW driver, I'd say it 
shouldn't be needed.




It does not (except the driver libgbm loads). We're trying to use this 
in Xvnc, so it's all CPU. We're just trying to make sure the 
applications can use the full power of the GPU to render their stuff 
before handing it over to the X server. :)


That whole approach won't work.

When you don't have a HW driver loaded, or at least somehow tell the 
client that it should render into a linear buffer, then the data in 
the buffer will be tiled in a hw-specific format.


As far as I know you can't read that in a vendor-agnostic way with 
the CPU; you need the hw driver for that.




I'm confused. What's the goal of the GBM abstraction and specifically 
gbm_bo_map() if it's not a hardware-agnostic way of accessing buffers?


In practice, we are getting linear buffers. At least on Intel and AMD 
GPUs. Nvidia are being a bit difficult getting GBM working, so we 
haven't tested that yet.


I see there is the GBM_BO_USE_LINEAR flag. We have not used it yet, as 
we haven't seen a need for it. What is the effect of that? Would it 
guarantee what we are just lucky to see at the moment?


Regards
--
Pierre Ossman   Software Development
Cendio AB   http://cendio.com
Teknikringen 8  http://twitter.com/ThinLinc
583 30 Linköping    http://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Christian König

On 17.06.24 at 19:18, Pierre Ossman wrote:

On 17/06/2024 18:09, Michel Dänzer wrote:


Can I know whether it is needed or not? Or should I be cautious and 
always do it?


Assuming GBM in the X server uses the GPU HW driver, I'd say it 
shouldn't be needed.




It does not (except the driver libgbm loads). We're trying to use this 
in Xvnc, so it's all CPU. We're just trying to make sure the 
applications can use the full power of the GPU to render their stuff 
before handing it over to the X server. :)


That whole approach won't work.

When you don't have a HW driver loaded, or at least somehow tell the 
client that it should render into a linear buffer, then the data in 
the buffer will be tiled in a hw-specific format.


As far as I know you can't read that in a vendor-agnostic way with 
the CPU; you need the hw driver for that.


Regards,
Christian.





A recording of the issue is available here, in case the behaviour 
rings a bell for anyone:


http://www.cendio.com/~ossman/dri3/Screencast%20from%202024-06-17%2017-06-50.webm 



Interesting. Looks like the surroundings (drop shadow region?) of the 
window move along with it first, then the surroundings get fixed up 
in the next frame.


As far as I know, mutter doesn't move window contents like that on 
the client side; it always redraws the damaged output region from 
scratch. So I wonder if the initial move together with surroundings 
is actually a blit on the X server side (possibly triggered by mutter 
moving the X window in its function as window manager). And then the 
surroundings fixing themselves up is the correct output from mutter 
via DRI3/Present.


If so, the issue isn't synchronization, it's that the first blit 
happens at all.




Hmm... The source of the blit is CopyWindow being called as a result 
of the window moving. But I would have expected that to be inhibited 
by the fact that a compositor is active. It's also surprising that 
this only happens if DRI3 is involved.


I would also have expected something similar with software rendering. 
Albeit with a PutImage instead of PresentPixmap for the correct data. 
But everything works there.


I will need to dig further.

Regards,




Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Pierre Ossman

On 17/06/2024 18:09, Michel Dänzer wrote:


Can I know whether it is needed or not? Or should I be cautious and always do 
it?


Assuming GBM in the X server uses the GPU HW driver, I'd say it shouldn't be 
needed.



It does not (except the driver libgbm loads). We're trying to use this 
in Xvnc, so it's all CPU. We're just trying to make sure the 
applications can use the full power of the GPU to render their stuff 
before handing it over to the X server. :)





A recording of the issue is available here, in case the behaviour rings a bell 
for anyone:

http://www.cendio.com/~ossman/dri3/Screencast%20from%202024-06-17%2017-06-50.webm


Interesting. Looks like the surroundings (drop shadow region?) of the window 
move along with it first, then the surroundings get fixed up in the next frame.

As far as I know, mutter doesn't move window contents like that on the client 
side; it always redraws the damaged output region from scratch. So I wonder if 
the initial move together with surroundings is actually a blit on the X server 
side (possibly triggered by mutter moving the X window in its function as 
window manager). And then the surroundings fixing themselves up is the correct 
output from mutter via DRI3/Present.

If so, the issue isn't synchronization, it's that the first blit happens at all.



Hmm... The source of the blit is CopyWindow being called as a result of 
the window moving. But I would have expected that to be inhibited by the 
fact that a compositor is active. It's also surprising that this only 
happens if DRI3 is involved.


I would also have expected something similar with software rendering. 
Albeit with a PutImage instead of PresentPixmap for the correct data. 
But everything works there.


I will need to dig further.

Regards,
--
Pierre Ossman   Software Development
Cendio AB   http://cendio.com
Teknikringen 8  http://twitter.com/ThinLinc
583 30 Linköping    http://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Michel Dänzer
On 2024-06-17 17:27, Pierre Ossman wrote:
> On 17/06/2024 16:50, Michel Dänzer wrote:
>> On 2024-06-17 12:29, Pierre Ossman wrote:
>>>
>>> Just to avoid any uncertainty, are both of these things done implicitly by 
>>> gbm_bo_map()/gbm_bo_unmap()?
>>>
>>> I did test adding those steps just in case, but unfortunately did not see 
>>> an improvement. My order was:
>>>
>>> 1. gbm_bo_import(GBM_BO_USE_RENDERING)
>>> 2. gbm_bo_get_fd()
>>> 3. Wait for client to request displaying the buffer
>>> 4. gbm_bo_map(GBM_BO_TRANSFER_READ)
>>> 5. select(fd+1, &readfds, NULL, NULL, NULL)
>>
>> *If* select() is needed, it needs to be before gbm_bo_map(), because the 
>> latter may perform a blit from the real BO to a staging one for CPU access.
>>
> 
> Can I know whether it is needed or not? Or should I be cautious and always do 
> it?

Assuming GBM in the X server uses the GPU HW driver, I'd say it shouldn't be 
needed.


> A recording of the issue is available here, in case the behaviour rings a 
> bell for anyone:
> 
> http://www.cendio.com/~ossman/dri3/Screencast%20from%202024-06-17%2017-06-50.webm

Interesting. Looks like the surroundings (drop shadow region?) of the window 
move along with it first, then the surroundings get fixed up in the next frame.

As far as I know, mutter doesn't move window contents like that on the client 
side; it always redraws the damaged output region from scratch. So I wonder if 
the initial move together with surroundings is actually a blit on the X server 
side (possibly triggered by mutter moving the X window in its function as 
window manager). And then the surroundings fixing themselves up is the correct 
output from mutter via DRI3/Present.

If so, the issue isn't synchronization, it's that the first blit happens at all.


-- 
Earthling Michel Dänzer    |  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Pierre Ossman

On 17/06/2024 16:50, Michel Dänzer wrote:

On 2024-06-17 12:29, Pierre Ossman wrote:


Just to avoid any uncertainty, are both of these things done implicitly by 
gbm_bo_map()/gbm_bo_unmap()?

I did test adding those steps just in case, but unfortunately did not see an 
improvement. My order was:

1. gbm_bo_import(GBM_BO_USE_RENDERING)
2. gbm_bo_get_fd()
3. Wait for client to request displaying the buffer
4. gbm_bo_map(GBM_BO_TRANSFER_READ)
5. select(fd+1, &readfds, NULL, NULL, NULL)


*If* select() is needed, it needs to be before gbm_bo_map(), because the latter 
may perform a blit from the real BO to a staging one for CPU access.



Can I know whether it is needed or not? Or should I be cautious and 
always do it?


I also assumed I should do select() with readfds set when I want to 
read, and writefds set when I want to write?


Still, after moving it before the map the issue unfortunately remains. :/

A recording of the issue is available here, in case the behaviour rings 
a bell for anyone:


http://www.cendio.com/~ossman/dri3/Screencast%20from%202024-06-17%2017-06-50.webm

(tried to include it as an attachment, but that email was filtered out 
somewhere)


Regards,
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping    https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Michel Dänzer
On 2024-06-17 16:52, Christian König wrote:
> On 17.06.24 at 16:50, Michel Dänzer wrote:
>> On 2024-06-17 12:29, Pierre Ossman wrote:
>>> Just to avoid any uncertainty, are both of these things done implicitly by 
>>> gbm_bo_map()/gbm_bo_unmap()?
>>>
>>> I did test adding those steps just in case, but unfortunately did not see 
>>> an improvement. My order was:
>>>
>>> 1. gbm_bo_import(GBM_BO_USE_RENDERING)
>>> 2. gbm_bo_get_fd()
>>> 3. Wait for client to request displaying the buffer
>>> 4. gbm_bo_map(GBM_BO_TRANSFER_READ)
>>> 5. select(fd+1, &readfds, NULL, NULL, NULL)
>> *If* select() is needed, it needs to be before gbm_bo_map(), because the 
>> latter may perform a blit from the real BO to a staging one for CPU access.
> 
> But don't you then need to wait for the blit to finish?

No, gbm_bo_map() must handle that internally. When it returns, the CPU must see 
the correct contents.


>>> 6. ioctl(fd, DMA_BUF_IOCTL_SYNC, &(struct dma_buf_sync){ .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ })
>> gbm_bo_map() should do this internally if needed.
>>
>>
>>> 7. pixman_blt()
>>> 8. gbm_bo_unmap()
>>
> 

-- 
Earthling Michel Dänzer    |  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Christian König

On 17.06.24 at 16:55, Michel Dänzer wrote:

On 2024-06-17 16:52, Christian König wrote:

Am 17.06.24 um 16:50 schrieb Michel Dänzer:

On 2024-06-17 12:29, Pierre Ossman wrote:

Just to avoid any uncertainty, are both of these things done implicitly by 
gbm_bo_map()/gbm_bo_unmap()?

I did test adding those steps just in case, but unfortunately did not see an 
improvement. My order was:

1. gbm_bo_import(GBM_BO_USE_RENDERING)
2. gbm_bo_get_fd()
3. Wait for client to request displaying the buffer
4. gbm_bo_map(GBM_BO_TRANSFER_READ)
5. select(fd+1, , NULL, NULL, NULL)

*If* select() is needed, it needs to be before gbm_bo_map(), because the latter 
may perform a blit from the real BO to a staging one for CPU access.

But don't you then need to wait for the blit to finish?

No, gbm_bo_map() must handle that internally. When it returns, the CPU must see 
the correct contents.


Ah, ok in that case that function does more than I expected.

Thanks,
Christian.





6. ioctl(DMA_BUF_IOCTL_SYNC, &{ .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ 
})

gbm_bo_map() should do this internally if needed.



7. pixman_blt()
8. gbm_bo_unmap()




Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Michel Dänzer
On 2024-06-17 12:29, Pierre Ossman wrote:
>
> Just to avoid any uncertainty, are both of these things done implicitly by 
> gbm_bo_map()/gbm_bo_unmap()?
> 
> I did test adding those steps just in case, but unfortunately did not see an 
> improvement. My order was:
> 
> 1. gbm_bo_import(GBM_BO_USE_RENDERING)
> 2. gbm_bo_get_fd()
> 3. Wait for client to request displaying the buffer
> 4. gbm_bo_map(GBM_BO_TRANSFER_READ)
> 5. select(fd+1, , NULL, NULL, NULL)

*If* select() is needed, it needs to be before gbm_bo_map(), because the latter 
may perform a blit from the real BO to a staging one for CPU access.


> 6. ioctl(DMA_BUF_IOCTL_SYNC, &{ .flags = DMA_BUF_SYNC_START | 
> DMA_BUF_SYNC_READ })

gbm_bo_map() should do this internally if needed.


> 7. pixman_blt()
> 8. gbm_bo_unmap()


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Christian König

Am 17.06.24 um 16:50 schrieb Michel Dänzer:
> On 2024-06-17 12:29, Pierre Ossman wrote:
>> Just to avoid any uncertainty, are both of these things done implicitly by
>> gbm_bo_map()/gbm_bo_unmap()?
>>
>> I did test adding those steps just in case, but unfortunately did not see an
>> improvement. My order was:
>>
>> 1. gbm_bo_import(GBM_BO_USE_RENDERING)
>> 2. gbm_bo_get_fd()
>> 3. Wait for client to request displaying the buffer
>> 4. gbm_bo_map(GBM_BO_TRANSFER_READ)
>> 5. select(fd+1, , NULL, NULL, NULL)
> *If* select() is needed, it needs to be before gbm_bo_map(), because the latter
> may perform a blit from the real BO to a staging one for CPU access.

But don't you then need to wait for the blit to finish?

Regards,
Christian.

>> 6. ioctl(DMA_BUF_IOCTL_SYNC, &{ .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ
>> })
> gbm_bo_map() should do this internally if needed.
>
>> 7. pixman_blt()
>> 8. gbm_bo_unmap()






Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Christian König

Am 17.06.24 um 12:29 schrieb Pierre Ossman:
> On 17/06/2024 10:13, Christian König wrote:
>> Let me try to clarify a couple of things:
>>
>> The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so
>> that the GPU can see values written by the CPU and the CPU can see
>> values written by the GPU. But that IOCTL does *not* wait for any
>> async GPU operation to finish.
>>
>> If you want to wait for async GPU operations you either need to call
>> the OpenGL functions to read pixels or do a select() (or poll, epoll
>> etc...) call on the DMA-buf file descriptor.
>
> Thanks for the clarification!
>
> Just to avoid any uncertainty, are both of these things done
> implicitly by gbm_bo_map()/gbm_bo_unmap()?

gbm_bo_map() is *not* doing any synchronization whatsoever as far as I
know. It just does the steps necessary for the mmap().

> I did test adding those steps just in case, but unfortunately did not
> see an improvement. My order was:
>
> 1. gbm_bo_import(GBM_BO_USE_RENDERING)
> 2. gbm_bo_get_fd()
> 3. Wait for client to request displaying the buffer
> 4. gbm_bo_map(GBM_BO_TRANSFER_READ)
> 5. select(fd+1, , NULL, NULL, NULL)
> 6. ioctl(DMA_BUF_IOCTL_SYNC, &{ .flags = DMA_BUF_SYNC_START |
> DMA_BUF_SYNC_READ })
> 7. pixman_blt()
> 8. gbm_bo_unmap()

At least offhand that looks like it should work.

>> So if you want to do some rendering with OpenGL and then see the
>> result in a buffer memory mapping the correct sequence would be the
>> following:
>>
>> 1. Issue OpenGL rendering commands.
>> 2. Call glFlush() to make sure the hw actually starts working on the
>> rendering.
>> 3. Call select() on the DMA-buf file descriptor to wait for the
>> rendering to complete.
>> 4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.
>
> What I want to do is implement the X server side of DRI3 in just CPU.
> It works for every application I've tested except gnome-shell.
>
> I would assume that 1. and 2. are supposed to be done by the X client,
> i.e. gnome-shell?

Yes, exactly that.

> What I need to be able to do is access the result of that, once the X
> client tries to draw using that GBM backed pixmap (e.g. using
> PresentPixmap).

No idea why that doesn't work.

Regards,
Christian.

> So far, we've only tested Intel GPUs, but we are setting up Nvidia and
> AMD GPUs at the moment. It will be interesting to see if the issue
> remains on those or not.
>
> Regards




Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Pierre Ossman

On 17/06/2024 10:13, Christian König wrote:
> Let me try to clarify a couple of things:
>
> The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so
> that the GPU can see values written by the CPU and the CPU can see
> values written by the GPU. But that IOCTL does *not* wait for any async
> GPU operation to finish.
>
> If you want to wait for async GPU operations you either need to call the
> OpenGL functions to read pixels or do a select() (or poll, epoll etc...)
> call on the DMA-buf file descriptor.

Thanks for the clarification!

Just to avoid any uncertainty, are both of these things done implicitly
by gbm_bo_map()/gbm_bo_unmap()?

I did test adding those steps just in case, but unfortunately did not
see an improvement. My order was:

1. gbm_bo_import(GBM_BO_USE_RENDERING)
2. gbm_bo_get_fd()
3. Wait for client to request displaying the buffer
4. gbm_bo_map(GBM_BO_TRANSFER_READ)
5. select(fd+1, , NULL, NULL, NULL)
6. ioctl(DMA_BUF_IOCTL_SYNC, &{ .flags = DMA_BUF_SYNC_START |
DMA_BUF_SYNC_READ })
7. pixman_blt()
8. gbm_bo_unmap()

> So if you want to do some rendering with OpenGL and then see the result
> in a buffer memory mapping the correct sequence would be the following:
>
> 1. Issue OpenGL rendering commands.
> 2. Call glFlush() to make sure the hw actually starts working on the
> rendering.
> 3. Call select() on the DMA-buf file descriptor to wait for the
> rendering to complete.
> 4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.

What I want to do is implement the X server side of DRI3 in just CPU. It
works for every application I've tested except gnome-shell.

I would assume that 1. and 2. are supposed to be done by the X client,
i.e. gnome-shell?

What I need to be able to do is access the result of that, once the X
client tries to draw using that GBM backed pixmap (e.g. using
PresentPixmap).

So far, we've only tested Intel GPUs, but we are setting up Nvidia and
AMD GPUs at the moment. It will be interesting to see if the issue
remains on those or not.


Regards
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping        https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Christian König

Am 17.06.24 um 09:32 schrieb Pierre Ossman:
> On 15/06/2024 13:35, Marek Olšák wrote:
>> It's probably driver-specific. Some drivers might need glFlush before
>> you use gbm_bo_map because gbm might only wait for work that has been
>> flushed.
>
> That would be needed on the "writing" side, right? So if I'm seeing
> issues when mapping for reading, then it would indicate a bug in the
> other peer? Which would be gnome-shell in my case.
>
> Any way I could test this? Can I force extra syncs/flushes in some way
> and see if the issue goes away?

Well the primary question here is what do you want to wait for?

As Marek wrote GBM and the kernel can only see work which has been
flushed and is not queued up inside the OpenGL library for example.

> I tried adding a sleep of 10ms before reading the data, but did not
> see any improvement. Which would make sense if the commands are still
> sitting in an application buffer somewhere, rather than with the GPU.

Let me try to clarify a couple of things:

The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so
that the GPU can see values written by the CPU and the CPU can see
values written by the GPU. But that IOCTL does *not* wait for any async
GPU operation to finish.

If you want to wait for async GPU operations you either need to call the
OpenGL functions to read pixels or do a select() (or poll, epoll etc...)
call on the DMA-buf file descriptor.

So if you want to do some rendering with OpenGL and then see the result
in a buffer memory mapping the correct sequence would be the following:

1. Issue OpenGL rendering commands.
2. Call glFlush() to make sure the hw actually starts working on the
rendering.
3. Call select() on the DMA-buf file descriptor to wait for the
rendering to complete.
4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.

Regards,
Christian.

> Regards




Re: Does gbm_bo_map() implicitly synchronise?

2024-06-17 Thread Pierre Ossman

On 15/06/2024 13:35, Marek Olšák wrote:
> It's probably driver-specific. Some drivers might need glFlush before
> you use gbm_bo_map because gbm might only wait for work that has been
> flushed.

That would be needed on the "writing" side, right? So if I'm seeing
issues when mapping for reading, then it would indicate a bug in the
other peer? Which would be gnome-shell in my case.

Any way I could test this? Can I force extra syncs/flushes in some way
and see if the issue goes away?

I tried adding a sleep of 10ms before reading the data, but did not see
any improvement. Which would make sense if the commands are still
sitting in an application buffer somewhere, rather than with the GPU.


Regards
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping        https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-15 Thread Marek Olšák
It's probably driver-specific. Some drivers might need glFlush before
you use gbm_bo_map because gbm might only wait for work that has been
flushed.

Marek

On Sat, Jun 15, 2024 at 4:29 AM Pierre Ossman  wrote:
>
> On 15/06/2024 07:54, Marek Olšák wrote:
> > gbm_bo_map synchronizes if it needs to move memory to make the buffer
> > readable by the CPU or if the buffer is being used/written by the GPU.
> >
>
> Great, thanks! That means I need to look elsewhere for the source of my
> issue.
>
> I was concerned that since I was accessing the data using gbm_bo_map(),
> rather than using OpenGL, I was missing out on some synchronisation step
> and getting data before the GPU had finished any queued rendering.
>
> Regards
> --
> Pierre Ossman   Software Development
> Cendio AB   http://cendio.com
> Teknikringen 8  http://twitter.com/ThinLinc
> 583 30 Linköping        http://facebook.com/ThinLinc
> Phone: +46-13-214600
>
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
>


Re: Does gbm_bo_map() implicitly synchronise?

2024-06-15 Thread Pierre Ossman

On 15/06/2024 07:54, Marek Olšák wrote:
> gbm_bo_map synchronizes if it needs to move memory to make the buffer
> readable by the CPU or if the buffer is being used/written by the GPU.

Great, thanks! That means I need to look elsewhere for the source of my
issue.

I was concerned that since I was accessing the data using gbm_bo_map(),
rather than using OpenGL, I was missing out on some synchronisation step
and getting data before the GPU had finished any queued rendering.


Regards
--
Pierre Ossman   Software Development
Cendio AB   http://cendio.com
Teknikringen 8  http://twitter.com/ThinLinc
583 30 Linköping        http://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-14 Thread Marek Olšák
gbm_bo_map synchronizes if it needs to move memory to make the buffer
readable by the CPU or if the buffer is being used/written by the GPU.

Marek

On Sat, Jun 15, 2024 at 1:12 AM Pierre Ossman  wrote:
>
> I'm experimenting with DRI3 and its use of GBM to share buffers. It
> mostly works fine, but I'm seeing some issues that have me concerned
> there might be a synchronisation issue.
>
> The documentation isn't entirely clear, so my question is if
> gbm_bo_map() handles all the implicit synchronisation for me, or if
> there is something more I can do?
>
> I tried doing gbm_bo_get_fd() followed by a select() and
> ioctl(DMA_BUF_IOCTL_SYNC), but my issue did not go away. Now I'm unsure
> if I'm doing it wrong, or if I'm chasing the wrong theory.
>
> Anyone with insight on what's needed for stable synchronisation?
>
> Regards,
> --
> Pierre Ossman   Software Development
> Cendio AB   https://cendio.com
> Teknikringen 8  https://twitter.com/ThinLinc
> 583 30 Linköping        https://facebook.com/ThinLinc
> Phone: +46-13-214600
>
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?


Does gbm_bo_map() implicitly synchronise?

2024-06-14 Thread Pierre Ossman
I'm experimenting with DRI3 and its use of GBM to share buffers. It 
mostly works fine, but I'm seeing some issues that have me concerned 
there might be a synchronisation issue.


The documentation isn't entirely clear, so my question is if 
gbm_bo_map() handles all the implicit synchronisation for me, or if 
there is something more I can do?


I tried doing gbm_bo_get_fd() followed by a select() and 
ioctl(DMA_BUF_IOCTL_SYNC), but my issue did not go away. Now I'm unsure 
if I'm doing it wrong, or if I'm chasing the wrong theory.


Anyone with insight on what's needed for stable synchronisation?

Regards,
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping        https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?