Re: time for amber2 branch?

2024-06-20 Thread Faith Ekstrand
On Thu, Jun 20, 2024 at 12:30 PM Adam Jackson  wrote:

> On Thu, Jun 20, 2024 at 10:20 AM Erik Faye-Lund <
> erik.faye-l...@collabora.com> wrote:
>
>> When we did Amber, we had a lot better reason to do so than "these
>> drivers cause pain when doing big tree updates". The maintenance burden
>> imposed by the drivers proposed for removal here is much, much smaller,
>> and doesn't really let us massively clean up things in a way comparable
>> to last time.
>>
>
> Yeah, amber was primarily about mothballing src/mesa/drivers/ in my
> opinion. It happened to correlate well with the GL 1.x vs 2.0 generational
> divide, but that was largely because we had slowly migrated all the GL2
> hardware to gallium drivers (iris and crocus and i915g and r300g were a lot
> of work, let's do remember), so the remaining "classic" drivers were only
> the best choice for fixed function hardware. Nice bright line in the sand,
> there, between the register bank of an overgrown SGI Indy as your state
> vector, and the threat of a Turing-complete shader engine.
>
> I have a harder time finding that line in the sand today. ES3? Compute
> shaders? Vulkan 1.0? I'm not sure any of these so fundamentally change the
> device programming model, or the baseline API assumptions, that we would
> benefit by requiring it of the hardware. I'm happy to be wrong about that!
> We're using compute shaders internally in more and more ways, for example,
> maybe being able to assume them would be a win. If there's a better design
> to be had past some feature level, then by all means let's have that
> discussion.
>
> But if the issue is we don't like how many drivers there are then I am
> sorry but at some level that is simply the dimension of the problem. Mesa's
> breadth of hardware coverage is at the core of its success. You'd be
> hard-pressed to find a GLES1 part anymore, but there are brand-new systems
> with Mali-400 MP GPUs, and there's no reason the world's finest GLES2
> implementation should stop working there.
>

Same. I kinda think the next major cut will be when we go Vulkan-only and
leave Zink and a bunch of legacy drivers in a GL branch. That's probably
not going to happen for another 5 years at least.

~Faith


Re: time for amber2 branch?

2024-06-20 Thread Triang3l

On 19/06/2024 20:34, Mike Blumenkrantz wrote:
> Terakan is not a Mesa driver, and Mesa has no obligation to cater to
> out-of-tree projects which use its internal API. For everything else,
> see above.


I don't think, however, that it can simply be dismissed as if it didn't 
exist when it's:
 • striving to become a part of Mesa among the "cool" drivers with 
broad extension support like RADV, Anvil, Turnip, and now NVK;
 • actively developed nearly every day (albeit for around 2 hours per 
day on average because it's a free time project);
 • trying to explore horizons Mesa hasn't been to yet (submitting 
hardware commands directly on Windows).


As for R600g, it's one thing to drop the constraints imposed by some 
Direct3D 9 level GPUs that, for instance, don't even support integers in 
shaders (assuming those constraints are actually what's slowing down 
development of everything else significantly; the broad hardware support 
is something that I absolutely LOVE Mesa and open source infrastructure 
overall for, and I think that's the case for many others too). But here 
we're talking about Direct3D 11 class hardware (or 10, but programmed 
largely the same way) with OpenGL 4.5 already supported, and 4.6 being 
straightforward to implement.


This means that, with the exception of OpenCL-specific global addressing 
issues (though R9xx can possibly expose a 4 GB "global memory" binding), 
the interface contract between Gallium's internals and R600g shouldn't 
differ that much from that of the more modern drivers; the _hardware_ 
architecture itself doesn't really warrant dropping active support in 
common code.


Incidents like one change suddenly breaking vertex strides are thus 
mainly a problem in how _the driver itself_ is written, and that's of 
course another story… While I can't say much about Gallium interactions 
specifically, I keep encountering more and more things that are 
unhandled or broken in how the driver actually works with the GPU, and 
there are many Piglit tests that fail. I can imagine the way R600g is 
integrated into Gallium isn't in a much better state.


So I think it may make sense (even though I definitely don't see any 
serious necessity for it) to **temporarily** place R600g in a more stable 
environment where regressions in it are less likely to happen, and then, 
once it's brought up to modern Mesa quality standards and becomes more 
friendly to the rest of Mesa, to **move it back** to the main branch 
(though that may run into a huge number of interface version conflicts, 
who knows). Some of the things we can do to clean it up are:


 • Make patterns of interaction with other subsystems of Gallium more 
similar to those used by other drivers. Maybe use RadeonSI as the 
primary example because of their shared roots.
 • Fix some GPU configuration bugs that I described in my previous 
message, as well as some other small ones:
   • Emit all viewports and scissors at once without using the dirty 
mask because the hardware requires that (already handled years ago in 
RadeonSI).
   • Fix gl_VertexID in indirect draws: the DRAW_INDIRECT packets write 
the base to SQ_VTX_BASE_VTX_LOC, which affects vertex fetch instructions 
but not the vertex ID input. Instead, switch from 
SQ_VTX_FETCH_VERTEX_DATA to SQ_VTX_FETCH_NO_INDEX_OFFSET and COPY_DW the 
base to VGT_INDX_OFFSET.
   • Properly configure the export format of the pixel shader DB export 
vector (gl_FragDepth, gl_FragStencilRefARB, gl_SampleMask).
   • Investigate how queries currently work if the command buffer was 
split in the middle of a query, add the necessary stitching where needed.
 • Make Piglit squeal less. I remember trying to experiment with 
glDispatchComputeIndirect, only to find out that the test I wanted to 
run to verify my solution was broken for another reason. Oink oink.
 • If needed, remove the remaining references to TGSI enums, and also 
switch to the NIR transform feedback interface that, as far as I 
understand, is compatible with the Nine and D3D10 frontends (or maybe 
it's the other way around; either way, make that consistent).

 • Do some cleanup in common areas:
   • Register, packet and shader structures can be moved to JSON 
definitions similar to those used for GCN/RDNA, but with clearer 
indication of the architecture revisions they can be used on (without 
splitting into r600d.h and evergreend.h). I've already stumbled upon a 
typo in that probably hand-written S_/G_/C_ #define soup that once 
caused weird Vulkan CTS failures, specifically in 
C_028780_BLEND_CONTROL_ENABLE in evergreend.h, and who knows what other 
surprises may be there. Some fields there are apparently listed for the 
wrong architecture revisions (though maybe they're actually present but 
undocumented, I don't know, given the [RESERVED] situation with the 
documentation for anisotropic filtering and maybe non-1D/2D_THIN tiling 
modes, for example, and that we 

Re: time for amber2 branch?

2024-06-20 Thread Adam Jackson
On Thu, Jun 20, 2024 at 10:20 AM Erik Faye-Lund <
erik.faye-l...@collabora.com> wrote:

> When we did Amber, we had a lot better reason to do so than "these
> drivers cause pain when doing big tree updates". The maintenance burden
> imposed by the drivers proposed for removal here is much, much smaller,
> and doesn't really let us massively clean up things in a way comparable
> to last time.
>

Yeah, amber was primarily about mothballing src/mesa/drivers/ in my
opinion. It happened to correlate well with the GL 1.x vs 2.0 generational
divide, but that was largely because we had slowly migrated all the GL2
hardware to gallium drivers (iris and crocus and i915g and r300g were a lot
of work, let's do remember), so the remaining "classic" drivers were only
the best choice for fixed function hardware. Nice bright line in the sand,
there, between the register bank of an overgrown SGI Indy as your state
vector, and the threat of a Turing-complete shader engine.

I have a harder time finding that line in the sand today. ES3? Compute
shaders? Vulkan 1.0? I'm not sure any of these so fundamentally change the
device programming model, or the baseline API assumptions, that we would
benefit by requiring it of the hardware. I'm happy to be wrong about that!
We're using compute shaders internally in more and more ways, for example,
maybe being able to assume them would be a win. If there's a better design
to be had past some feature level, then by all means let's have that
discussion.

But if the issue is we don't like how many drivers there are then I am
sorry but at some level that is simply the dimension of the problem. Mesa's
breadth of hardware coverage is at the core of its success. You'd be
hard-pressed to find a GLES1 part anymore, but there are brand-new systems
with Mali-400 MP GPUs, and there's no reason the world's finest GLES2
implementation should stop working there.

- ajax


Re: SIGBUS with gbm_bo_map() and Intel ARC

2024-06-20 Thread Pierre Ossman

On 6/20/24 16:29, Pierre Ossman wrote:

On 6/19/24 11:36, Pierre Ossman wrote:


Is there something special I need to pay attention to when doing cross 
GPU stuff? I would have assumed that gbm_bo_import() would have 
complained if this was an incompatible setup.




It does indeed look like some step is missing. If I examine 
/proc/<pid>/maps, I can see that the accessed memory address is 
associated with the wrong render node:


Crash reading 0x7fffe4176000
7fffe4176000-7fffe440 rw-s 100602000 00:06 500 /dev/dri/renderD128



In cross-GPU combinations where it works, I'm seeing this map instead:

7fffef30e000-7fffef408000 rw-s 100056000 00:0b 533 
/dmabuf:


Regards
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping    https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: SIGBUS with gbm_bo_map() and Intel ARC

2024-06-20 Thread Pierre Ossman

On 6/19/24 11:36, Pierre Ossman wrote:


Is there something special I need to pay attention to when doing cross 
GPU stuff? I would have assumed that gbm_bo_import() would have 
complained if this was an incompatible setup.




It does indeed look like some step is missing. If I examine 
/proc/<pid>/maps, I can see that the accessed memory address is 
associated with the wrong render node:


Crash reading 0x7fffe4176000
7fffe4176000-7fffe440 rw-s 100602000 00:06 500 /dev/dri/renderD128


The X server is using renderD128, but the client is using renderD129.

This works with other X servers, so I assume there is some way to 
resolve this. But where do I start looking?


The fd I'm getting is a DMA-BUF fd, I assume? I can't find many ioctls 
for that. But that's also all I'm getting, so there must be something 
I'm supposed to do with that fd?
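
For what it's worth, a minimal sketch of the import-and-map path under 
discussion (a hypothetical helper, not code from this thread; error 
handling omitted) might look roughly like this, mainly to make explicit 
that the gbm_device is tied to one particular render node while the 
DMA-BUF fd may have been exported by another GPU:

#include <stdint.h>
#include <stddef.h>
#include <gbm.h>

/* Hypothetical helper: import a DRI3-provided DMA-BUF fd into GBM and
 * map it for CPU reads.  width/height/format/stride/modifier would come
 * from the PixmapFromBuffers request. */
static void *map_client_buffer(struct gbm_device *gbm, int dmabuf_fd,
                               uint32_t width, uint32_t height,
                               uint32_t format, uint32_t stride,
                               uint64_t modifier,
                               struct gbm_bo **bo, void **map_data)
{
    /* The gbm_device was created from one specific render node fd
     * (gbm_create_device() on an open /dev/dri/renderD*), while the
     * DMA-BUF may come from a different GPU -- the cross-device case
     * this thread is about. */
    struct gbm_import_fd_modifier_data data = {
        .width = width,
        .height = height,
        .format = format,
        .num_fds = 1,
        .fds = { dmabuf_fd },
        .strides = { (int)stride },
        .offsets = { 0 },
        .modifier = modifier,
    };

    *bo = gbm_bo_import(gbm, GBM_BO_IMPORT_FD_MODIFIER, &data,
                        GBM_BO_USE_RENDERING);
    if (!*bo)
        return NULL;

    /* Map for CPU read access; the driver may satisfy this with a
     * staging copy rather than a direct mapping of the client's
     * memory. */
    uint32_t map_stride;
    return gbm_bo_map(*bo, 0, 0, width, height, GBM_BO_TRANSFER_READ,
                      &map_stride, map_data);
}

Whether gbm_bo_import() can or should flag an fd that was exported by a 
different device is essentially the open question above.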


Help! :/

Regards
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping    https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: time for amber2 branch?

2024-06-20 Thread Erik Faye-Lund
On Wed, 2024-06-19 at 10:33 -0400, Mike Blumenkrantz wrote:
> In looking at the gallium tree, I'm wondering if it isn't time for a
> second amber branch to prune some of the drivers that cause pain when
> doing big tree updates:
> 
> * nv30
> * r300
> * r600
> * lima
> * virgl
> * tegra
> * ???
> 
> There's nothing stopping these drivers from continuing to develop in
> an amber branch, but the risk of them being broken by other tree
> refactorings is lowered, and then we are able to delete lots of
> legacy code in the main branch.
> 
> Thoughts?

When we did Amber, we had a lot better reason to do so than "these
drivers cause pain when doing big tree updates". The maintenance burden
imposed by the drivers proposed for removal here is much, much smaller,
and doesn't really let us massively clean up things in a way comparable
to last time.

I'm not convinced that this is a good idea. Most (if not all) of these
drivers are still useful, and several of them are actively maintained.
Pulling them out of main makes very little sense to me.

What exactly are you hoping to gain from this? If it's just that
they're old hardware with fewer capabilities, perhaps we can address the
resulting problems in a different way, by (for instance) introducing a
"legacy hw" gallium layer, so legacy HW details don't have to leak
out into the rest of gallium...


Re: Does gbm_bo_map() implicitly synchronise?

2024-06-20 Thread Pierre Ossman

On 6/20/24 15:59, Pierre Ossman wrote:
We recently identified that it has an issue[2] with synchronization on 
the server side when, after glFlush() on the client side, the command 
list takes too long (several seconds) to finish rendering.


[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228



Oh. I can try to test it here. We don't seem to have any synchronisation 
issues now that we got that VNC bug resolved.




I just tested here, and could not see the issue with our implementation 
with either an AMD iGPU or Nvidia dGPU. They might be too fast to 
trigger the issue? I have a Pi4 here as well, but it's not set up for 
this yet.


Regards
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping    https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-20 Thread Pierre Ossman

On 6/20/24 11:04, Chema Casanova wrote:


You can have a look at the open MR we created two years ago for Xserver 
[1], "modesetting: Add DRI3 support to modesetting driver with glamor 
disabled". We are using it downstream for Raspberry Pi OS to enable 
GPU-accelerated client applications on RPi1-3, while the Xserver uses 
software composition with pixman.


[1] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/945



I did actually look at that to get some idea of how things are 
connected. But the comments suggested that the design wasn't robust, so 
we ended up trying a different approach.


Our work is now available in the latest TigerVNC beta, via this PR:

https://github.com/TigerVNC/tigervnc/pull/1771

We recently identified that it has an issue[2] with synchronization on 
the server side when, after glFlush() on the client side, the command 
list takes too long (several seconds) to finish rendering.


[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228



Oh. I can try to test it here. We don't seem to have any synchronisation 
issues now that we got that VNC bug resolved.


The two big issues we have at present are the SIGBUS crash I opened a 
separate thread about, and getting glvnd to choose correctly when the 
Nvidia driver is used.


Regards
--
Pierre Ossman   Software Development
Cendio AB   https://cendio.com
Teknikringen 8  https://twitter.com/ThinLinc
583 30 Linköping    https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: Does gbm_bo_map() implicitly synchronise?

2024-06-20 Thread Chema Casanova

On 17/6/24 at 12:29, Pierre Ossman wrote:


So if you want to do some rendering with OpenGL and then see the 
result in a buffer memory mapping, the correct sequence would be the 
following:


1. Issue OpenGL rendering commands.
2. Call glFlush() to make sure the hw actually starts working on the 
rendering.
3. Call select() on the DMA-buf file descriptor to wait for the 
rendering to complete.

4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.
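
(In case a concrete sketch helps: steps 3 and 4 above could look roughly 
like the following in C, using poll() instead of select() to the same 
effect. This is not code from this thread, error handling is omitted, 
and the exact poll semantics for implicit fences are a kernel detail 
worth double-checking.)

#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Step 3: wait for the implicit fences attached to the DMA-BUF, i.e.
 * for the GPU work flushed in step 2 to finish. */
static void wait_for_rendering(int dmabuf_fd)
{
    struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLIN };
    while (poll(&pfd, 1, -1) < 0 && errno == EINTR)
        ;
}

/* Step 4: bracket the CPU read with DMA_BUF_IOCTL_SYNC so that caches
 * are made coherent where the hardware needs it. */
static void begin_cpu_read(int dmabuf_fd)
{
    struct dma_buf_sync sync = {
        .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ,
    };
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

static void end_cpu_read(int dmabuf_fd)
{
    struct dma_buf_sync sync = {
        .flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ,
    };
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

The point being that the SYNC ioctl is primarily about CPU cache 
coherency rather than a guaranteed wait for the GPU, hence the separate 
poll()/select() step in the sequence above.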



What I want to do is implement the X server side of DRI3 using just the CPU. 
It works for every application I've tested except gnome-shell.


You can have a look at the open MR we created two years ago for Xserver 
[1], "modesetting: Add DRI3 support to modesetting driver with glamor 
disabled". We are using it downstream for Raspberry Pi OS to enable 
GPU-accelerated client applications on RPi1-3, while the Xserver uses 
software composition with pixman.


[1] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/945

We recently identified that it has an issue[2] with synchronization on 
the server side when, after glFlush() on the client side, the command 
list takes too long (several seconds) to finish rendering.


[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228



I would assume that 1. and 2. are supposed to be done by the X client, 
i.e. gnome-shell?


What I need to be able to do is access the result of that once the X 
client tries to draw using that GBM-backed pixmap (e.g. using 
PresentPixmap).


So far, we've only tested Intel GPUs, but we are setting up Nvidia and 
AMD GPUs at the moment. It will be interesting to see if the issue 
remains on those or not.



Regards,

Chema Casanova