On 19/06/2024 20:34, Mike Blumenkrantz wrote:
> Terakan is not a Mesa driver, and Mesa has no obligation to cater to out-of-tree projects which use its internal API. For everything else, see above.

I don't think, however, that it can simply be dismissed as if it didn't exist when it's:

 • striving to become a part of Mesa among the "cool" drivers with broad extension support, like RADV, Anvil, Turnip, and now NVK;
 • actively developed nearly every day (albeit for around 2 hours per day on average, because it's a free-time project);
 • trying to explore horizons Mesa hasn't reached yet (submitting hardware commands directly on Windows).

As for R600g, it's one thing to drop the constraints imposed by some Direct3D 9 level GPUs that, for instance, don't even support integers in shaders (if that's even actually causing issues that significantly slow down development of everything else — broad hardware support is something I absolutely LOVE Mesa and the open source infrastructure in general for, and I think that's the case for many others too). But here we're talking about Direct3D 11 (or 10, but programmed largely the same way) class hardware, with OpenGL 4.5 already supported and 4.6 straightforward to implement.

This means that, with the exception of OpenCL-specific global addressing issues (though R9xx can possibly have a 4 GB "global memory" binding), the interface contract between Gallium's internals and R600g shouldn't differ that much from that of the more modern drivers — the _hardware_ architecture itself doesn't really warrant dropping active support in common code.

Incidents like one change suddenly breaking vertex strides are thus mainly a problem in how _the driver itself_ is written, and that's of course another story… While I can't say much about Gallium interactions specifically, I keep encountering more and more things that are unhandled or broken in how the driver actually works with the GPU, and there are many Piglit tests that fail. I can imagine the way R600g is integrated into Gallium isn't in a much better state.

So I think it may make sense (even though I definitely don't see any serious necessity) to **temporarily** place R600g in a more stable environment where regressions in it are less likely to happen, and then, once it's brought up to modern Mesa quality standards and becomes more friendly to the rest of Mesa, to **move it back** to the main branch (though that may run into a whole lot of interface version conflicts, who knows). Some of the things we can do to clean it up are:

 • Make the patterns of interaction with other Gallium subsystems more similar to those used by other drivers. Maybe use RadeonSI as the primary example because of their shared roots.
 • Fix some GPU configuration bugs — the ones I described in my previous message, as well as some other, smaller ones:
   • Emit all viewports and scissors at once without using the dirty mask, because the hardware requires that (already handled years ago in RadeonSI).
   • Fix gl_VertexID in indirect draws — the DRAW_INDIRECT packets write the base to SQ_VTX_BASE_VTX_LOC, which affects vertex fetch instructions but not the vertex ID input; instead, switch from SQ_VTX_FETCH_VERTEX_DATA to SQ_VTX_FETCH_NO_INDEX_OFFSET and COPY_DW the base to VGT_INDX_OFFSET.
   • Properly configure the export format of the pixel shader DB export vector (gl_FragDepth, gl_FragStencilRefARB, gl_SampleMask).
   • Investigate how queries currently behave if the command buffer was split in the middle of a query, and add the necessary stitching where needed.
 • Make Piglit squeal less. I remember trying to experiment with glDispatchComputeIndirect, only to find out that the test I wanted to run to verify my solution was broken for another reason. Oink oink.
 • If needed, remove the remaining references to TGSI enums, and also switch to the NIR transform feedback interface that, as far as I understand, is compatible with the Nine and D3D10 frontends (or maybe it's the other way around); either way, make that consistent.
 • Do some cleanup in common areas:
   • Register, packet and shader structures can be moved to JSON definitions similar to those used for GCN/RDNA, but with a clearer indication of the architecture revisions they apply to (without splitting into r600d.h and evergreend.h). I've already stumbled upon a typo in that probably hand-written S_/G_/C_ #define soup that once caused weird Vulkan CTS failures, specifically in C_028780_BLEND_CONTROL_ENABLE in evergreend.h, and who knows what other surprises may be there. Some fields there are apparently just for the wrong architecture revisions (though maybe actually present but undocumented, I don't know, given the [RESERVED] situation with the documentation for anisotropic filtering and maybe the non-1D/2D_THIN tiling modes, for example, and the fact that we have the reference for the 3D registers, but not for compute).
   • A lot of format information can be shared between vertex fetch, texture fetch, and color/storage attachments. I'm currently finishing some common format code for Terakan that may be adopted by R600g.
   • Carefully make sure virtual memory is properly supported in all places on R9xx (using virtual addresses and not emitting relocation NOPs, which are harmless but wasteful — moreover, this part deserves a common function that would make it easier to port R600g to other platforms, such as by making it write D3DKMTRender patch locations on Windows).
 • Unify R6xx/R7xx and R8xx/R9xx code wherever possible. There's r600_state.c, which is over 100 KB, and evergreen_state.c, which is even bigger, but in many places it's just the same code, merely including r600d.h in one file and evergreend.h in the other — and how much technical debt we already have in the R6xx/R7xx code is an interesting question. To me, there doesn't seem to be any necessity to abandon R6xx/R7xx support completely at this point, considering that the programming differences from R8xx/R9xx are pretty minor — at least as long as someone occasionally runs tests on the older generations.

Maybe that will involve some small-scale changes, or maybe it will end up being more like a rewrite, but it's totally possible that this point is a new beginning for R600g rather than an ending — especially with Gert Wollny's compiler, and with me revisiting every aspect of the interface of those GPUs. At some point we may even start exposing R600-specific functionality, such as D3DFMT_D24FS8 in Gallium Nine on R6xx/R7xx.

However, I don't like the whole idea of moving drivers away from the main branch, because that affects not only development but also users of Mesa. It'd be necessary to ensure that Linux distribution maintainers are well-notified of the new branch, but even then that may still cause issues. What if the amber2 drivers end up in a separate package in a distribution? That could mean that after some `apt-get dist-upgrade`, users suddenly lose GPU acceleration on their systems for an unobvious reason. And we definitely shouldn't underestimate the number of users of that old hardware outside Linux developer circles — especially TeraScale (I think Firefox regularly gets issue reports from Nvidia Rankine/Curie users?). I occasionally see people on Reddit and other platforms discussing the status of Terakan, and I'd expect that the people who talk about some software are just a small fraction of those who use it at all. And sometimes weird things just happen, like Bringus Studios bringing a Xi3 Piston up out of semi-vaporware nowhere…

Regarding CI, I can't promise anything right now, but I think that's not an unsolvable issue. Overall, just one machine with a Trinity APU, an R6xx/R7xx card, and an R8xx card (one of them preferably being an RV670, RV770, or Cypress/Hemlock, to be able to test co-issuing of float64 instructions with a transcendental one when that's implemented) should likely cover most of our regression testing needs — at least for Gallium interaction, most definitely.

Terakan development will surely continue being based on the main branch, partly because the original reason behind the split suggestion mostly doesn't apply to it. I need recent Vulkan headers and all the WSI improvements at the very least — and there are areas where Terakan itself may contribute something new to the common Vulkan runtime code. I already have some WSI-demanded binary-over-timeline sync type enhancements on my branch, and if my Windows experiments go forward, there will likely be a lot that can be added to the common code, such as WDDM 1 synchronization primitives (even though WDDM 2's timeline semaphores, aka monitored fences, are more important to modern drivers, there's no WDDM 2 on Windows versions older than 10), as well as paths for zero-copy presentation (primarily for WDDM 1 level configurations — for example, via sharing images with Direct3D 10/11, or with OpenGL to take advantage of the "exclusive borderless" driver hack, or maybe even via D3DKMTPresent where possible).

On 20/06/2024 20:30, Adam Jackson wrote:
> We're using compute shaders internally in more and more ways, for example, maybe being able to assume them would be a win.

I'd imagine that compute shader usage in common Gallium code is optional, and depending on the hardware, compute shaders can even be the less optimal approach to things like image copying/resolving (where specialized copy hardware is available), whether for performance or for format support. For instance, early (or maybe actually all, I don't know for sure yet) AMD R8xx hardware hangs with linear storage images, according to one comment in R800AddrLib, which is why a quad with a color target may be preferable for copying — and that hardware also has fast resolves inside its color buffer hardware, as well as a DMA engine.

— Triang3l
