Re: time for amber2 branch?
On Thu, Jun 20, 2024 at 12:30 PM Adam Jackson wrote:
> On Thu, Jun 20, 2024 at 10:20 AM Erik Faye-Lund <erik.faye-l...@collabora.com> wrote:
>> When we did Amber, we had a lot better reason to do so than "these
>> drivers cause pain when doing big tree updates". The maintenance burden
>> imposed by the drivers proposed for removal here is much, much smaller,
>> and doesn't really let us massively clean up things in a way comparable
>> to last time.
>
> Yeah, amber was primarily about mothballing src/mesa/drivers/ in my
> opinion. It happened to correlate well with the GL 1.x vs 2.0 generational
> divide, but that was largely because we had slowly migrated all the GL2
> hardware to gallium drivers (iris and crocus and i915g and r300g were a lot
> of work, let's do remember), so the remaining "classic" drivers were only
> the best choice for fixed-function hardware. Nice bright line in the sand,
> there, between the register bank of an overgrown SGI Indy as your state
> vector, and the threat of a Turing-complete shader engine.
>
> I have a harder time finding that line in the sand today. ES3? Compute
> shaders? Vulkan 1.0? I'm not sure any of these so fundamentally change the
> device programming model, or the baseline API assumptions, that we would
> benefit by requiring it of the hardware. I'm happy to be wrong about that!
> We're using compute shaders internally in more and more ways, for example;
> maybe being able to assume them would be a win. If there's a better design
> to be had past some feature level, then by all means let's have that
> discussion.
>
> But if the issue is we don't like how many drivers there are then I am
> sorry, but at some level that is simply the dimension of the problem. Mesa's
> breadth of hardware coverage is at the core of its success. You'd be
> hard-pressed to find a GLES1 part anymore, but there are brand-new systems
> with Mali-400 MP GPUs, and there's no reason the world's finest GLES2
> implementation should stop working there.

Same. I kinda think the next major cut will be when we go Vulkan-only and leave Zink and a bunch of legacy drivers in a GL branch. That's probably not going to happen for another 5 years at least.

~Faith
Re: time for amber2 branch?
On 19/06/2024 20:34, Mike Blumenkrantz wrote:
> Terakan is not a Mesa driver, and Mesa has no obligation to cater to
> out-of-tree projects which use its internal API. For everything else,
> see above.

I don't think, however, that it can simply be dismissed like it doesn't exist when it's:

• striving to become a part of Mesa among the "cool" drivers with broad extension support like RADV, Anvil, Turnip, and now NVK;

• actively developed nearly every day (albeit for around 2 hours per day on average, because it's a free-time project);

• trying to explore horizons Mesa hasn't been to yet (submitting hardware commands directly on Windows).

As for R600g, it's one thing to drop the constraints imposed by some Direct3D 9 level GPUs that, for instance, don't even support integers in shaders (if that's even actually causing issues that significantly slow down development of everything else — the broad hardware support is something that I absolutely LOVE Mesa and the overall open source infrastructure for, and I think that's the case for many others too). But here we're talking about Direct3D 11 (or 10, but programmed largely the same way) class hardware with OpenGL 4.5 already supported, and 4.6 being straightforward to implement. This means that, with the exception of OpenCL-specific global addressing issues (though R9xx can possibly have a 4 GB "global memory" binding), the interface contract between Gallium's internals and R600g shouldn't differ that much from that of the more modern drivers — the _hardware_ architecture itself doesn't really warrant dropping active support in common code.

Incidents like one change suddenly breaking vertex strides are thus mainly a problem in how _the driver itself_ is written, and that's of course another story… While I can't say much about Gallium interactions specifically, I keep encountering more and more things that are unhandled or broken in how the driver actually works with the GPU, and there are many Piglit tests that fail. I can imagine the way R600g is integrated into Gallium isn't in a much better state.

So I think it may make sense (even though I definitely don't see any serious necessity) to **temporarily** place R600g in a more stable environment where regressions in it are less likely to happen, but then, once it's brought up to modern Mesa quality standards and becomes more friendly to the rest of Mesa, to **move it back** to the main branch (though that may stumble upon a huge lot of interface version conflicts, who knows).

Some of the things we can do to clean it up are:

• Make patterns of interaction with other subsystems of Gallium more similar to those used by other drivers. Maybe use RadeonSI as the primary example because of their shared roots.

• Fix some GPU configuration bugs — the ones I described in my previous message, as well as some other small ones:

  • Emit all viewports and scissors at once without using the dirty mask, because the hardware requires that (already handled years ago in RadeonSI).

  • Fix gl_VertexID in indirect draws — the DRAW_INDIRECT packets write the base to SQ_VTX_BASE_VTX_LOC, which affects vertex fetch instructions but not the vertex ID input; instead, switch from SQ_VTX_FETCH_VERTEX_DATA to SQ_VTX_FETCH_NO_INDEX_OFFSET, and COPY_DW the base to VGT_INDX_OFFSET.

  • Properly configure the export format of the pixel shader DB export vector (gl_FragDepth, gl_FragStencilRefARB, gl_SampleMask).

  • Investigate how queries currently work if the command buffer was split in the middle of a query, and add the necessary stitching where needed.

• Make Piglit squeal less. I remember trying to experiment with glDispatchComputeIndirect, only to find out that the test I wanted to run to verify my solution was broken for another reason. Oink oink.

• If needed, remove the remaining references to TGSI enums, and also switch to the NIR transform feedback interface that, as far as I understand, is compatible with the Nine and D3D10 frontends (or maybe it's the other way around (= — either way, make that consistent).

• Do some cleanup in common areas:

  • Register, packet and shader structures can be moved to JSON definitions similar to those used for GCN/RDNA, but with clearer indication of the architecture revisions they can be used on (without splitting into r600d.h and evergreend.h). I've already stumbled upon a typo in that probably hand-written S_/G_/C_ #define soup that once caused weird Vulkan CTS failures, specifically in C_028780_BLEND_CONTROL_ENABLE in evergreend.h, and who knows what other surprises may be there. Some fields there are apparently just for the wrong architecture revisions (though maybe actually present, but undocumented — I don't know, given the [RESERVED] situation with the documentation for anisotropic filtering and maybe non-1D/2D_THIN tiling modes, for example, and that we
Re: time for amber2 branch?
On Thu, Jun 20, 2024 at 10:20 AM Erik Faye-Lund <erik.faye-l...@collabora.com> wrote:
> When we did Amber, we had a lot better reason to do so than "these
> drivers cause pain when doing big tree updates". The maintenance burden
> imposed by the drivers proposed for removal here is much, much smaller,
> and doesn't really let us massively clean up things in a way comparable
> to last time.

Yeah, amber was primarily about mothballing src/mesa/drivers/ in my opinion. It happened to correlate well with the GL 1.x vs 2.0 generational divide, but that was largely because we had slowly migrated all the GL2 hardware to gallium drivers (iris and crocus and i915g and r300g were a lot of work, let's do remember), so the remaining "classic" drivers were only the best choice for fixed-function hardware. Nice bright line in the sand, there, between the register bank of an overgrown SGI Indy as your state vector, and the threat of a Turing-complete shader engine.

I have a harder time finding that line in the sand today. ES3? Compute shaders? Vulkan 1.0? I'm not sure any of these so fundamentally change the device programming model, or the baseline API assumptions, that we would benefit by requiring it of the hardware. I'm happy to be wrong about that! We're using compute shaders internally in more and more ways, for example; maybe being able to assume them would be a win. If there's a better design to be had past some feature level, then by all means let's have that discussion.

But if the issue is we don't like how many drivers there are then I am sorry, but at some level that is simply the dimension of the problem. Mesa's breadth of hardware coverage is at the core of its success. You'd be hard-pressed to find a GLES1 part anymore, but there are brand-new systems with Mali-400 MP GPUs, and there's no reason the world's finest GLES2 implementation should stop working there.

- ajax
Re: SIGBUS with gbm_bo_map() and Intel ARC
On 6/20/24 16:29, Pierre Ossman wrote:
> On 6/19/24 11:36, Pierre Ossman wrote:
>> Is there something special I need to pay attention to when doing cross
>> GPU stuff? I would have assumed that gbm_bo_import() would have
>> complained if this was an incompatible setup.
>
> It does indeed look like some step is missing. If I examine /proc//maps,
> I can see that the accessed memory address is associated with the wrong
> render node:
>
> Crash reading 0x7fffe4176000
> 7fffe4176000-7fffe440 rw-s 100602000 00:06 500 /dev/dri/renderD128

In cross-GPU combinations where it works, I'm seeing this map instead:

7fffef30e000-7fffef408000 rw-s 100056000 00:0b 533 /dmabuf:

Regards
-- 
Pierre Ossman           Software Development
Cendio AB               https://cendio.com
Teknikringen 8          https://twitter.com/ThinLinc
583 30 Linköping        https://facebook.com/ThinLinc
Phone: +46-13-214600

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
Re: SIGBUS with gbm_bo_map() and Intel ARC
On 6/19/24 11:36, Pierre Ossman wrote:
> Is there something special I need to pay attention to when doing cross
> GPU stuff? I would have assumed that gbm_bo_import() would have
> complained if this was an incompatible setup.

It does indeed look like some step is missing. If I examine /proc//maps, I can see that the accessed memory address is associated with the wrong render node:

Crash reading 0x7fffe4176000
7fffe4176000-7fffe440 rw-s 100602000 00:06 500 /dev/dri/renderD128

The X server is using renderD128, but the client is using renderD129. This works with other X servers, so I assume there is some way to resolve this. But where do I start looking?

The fd I'm getting is a DMA-BUF fd, I assume? I can't find many ioctls for that. But that's also all I'm getting, so there must be something I'm supposed to do with that fd?

Help! :/

Regards, Pierre Ossman
Re: time for amber2 branch?
On Wed, 2024-06-19 at 10:33 -0400, Mike Blumenkrantz wrote:
> In looking at the gallium tree, I'm wondering if it isn't time for a
> second amber branch to prune some of the drivers that cause pain when
> doing big tree updates:
>
> * nv30
> * r300
> * r600
> * lima
> * virgl
> * tegra
> * ???
>
> There's nothing stopping these drivers from continuing to develop in
> an amber branch, but the risk of them being broken by other tree
> refactorings is lowered, and then we are able to delete lots of
> legacy code in the main branch.
>
> Thoughts?

When we did Amber, we had a lot better reason to do so than "these drivers cause pain when doing big tree updates". The maintenance burden imposed by the drivers proposed for removal here is much, much smaller, and doesn't really let us massively clean up things in a way comparable to last time.

I'm not convinced that this is a good idea. Most (if not all) of these drivers are still useful, and several of them are actively maintained. Pulling them out of main makes very little sense to me.

What exactly are you hoping to gain from this? If it's just that they're old hardware with fewer capabilities, perhaps we can address the problems from that in a different way, by (for instance) introducing a "legacy hw" gallium layer, so legacy HW details don't have to leak out into the rest of gallium...
Re: Does gbm_bo_map() implicitly synchronise?
On 6/20/24 15:59, Pierre Ossman wrote:
>> We recently identified that it has an issue[2] with synchronization on
>> the server side when, after glFlush() on the client side, the command
>> list takes too long (several seconds) to finish the rendering.
>>
>> [2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228
>
> Oh. I can try to test it here. We don't seem to have any synchronisation
> issues now that we got that VNC bug resolved.

I just tested here, and could not see the issue with our implementation with either an AMD iGPU or an Nvidia dGPU. They might be too fast to trigger the issue? I have a Pi4 here as well, but it's not set up for this yet.

Regards, Pierre Ossman
Re: Does gbm_bo_map() implicitly synchronise?
On 6/20/24 11:04, Chema Casanova wrote:
> You can have a look at the Open MR we created two years ago for Xserver
> [1], "modesetting: Add DRI3 support to modesetting driver with glamor
> disabled". We are using it downstream for Raspberry Pi OS to enable
> GPU-accelerated client applications on RPi1-3, while the Xserver is
> using software composition with pixman.
>
> [1] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/945

I did actually look at that to get some idea of how things are connected. But the comments suggested that the design wasn't robust, so we ended up trying a different approach.

Our work is now available in the latest TigerVNC beta, via this PR:
https://github.com/TigerVNC/tigervnc/pull/1771

> We recently identified that it has an issue[2] with synchronization on
> the server side when, after glFlush() on the client side, the command
> list takes too long (several seconds) to finish the rendering.
>
> [2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228

Oh. I can try to test it here. We don't seem to have any synchronisation issues now that we got that VNC bug resolved.

The two big issues we have presently are the SIGBUS crash I opened a separate thread about, and getting glvnd to choose correctly when the Nvidia driver is used.

Regards, Pierre Ossman
Re: Does gbm_bo_map() implicitly synchronise?
On 17/6/24 at 12:29, Pierre Ossman wrote:
>> So if you want to do some rendering with OpenGL and then see the result
>> in a buffer memory mapping, the correct sequence would be the following:
>>
>> 1. Issue OpenGL rendering commands.
>> 2. Call glFlush() to make sure the hw actually starts working on the
>>    rendering.
>> 3. Call select() on the DMA-buf file descriptor to wait for the
>>    rendering to complete.
>> 4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.
>
> What I want to do is implement the X server side of DRI3 in just CPU. It
> works for every application I've tested except gnome-shell.

You can have a look at the Open MR we created two years ago for Xserver [1], "modesetting: Add DRI3 support to modesetting driver with glamor disabled". We are using it downstream for Raspberry Pi OS to enable GPU-accelerated client applications on RPi1-3, while the Xserver is using software composition with pixman.

[1] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/945

We recently identified that it has an issue[2] with synchronization on the server side when, after glFlush() on the client side, the command list takes too long (several seconds) to finish the rendering.

[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/11228

> I would assume that 1. and 2. are supposed to be done by the X client,
> i.e. gnome-shell? What I need to be able to do is access the result of
> that, once the X client tries to draw using that GBM-backed pixmap (e.g.
> using PresentPixmap).

So far, we've only tested Intel GPUs, but we are setting up Nvidia and AMD GPUs at the moment. It will be interesting to see if the issue remains on those or not.

Regards, Chema Casanova