On Thu, Jan 9, 2020 at 3:56 AM Luke Kenneth Casson Leighton <l...@lkcl.net> wrote:
>
> On 1/9/20, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> >> 2. as a flexible Vector Processor, soft-programmable, then over time if
> >> the industry moves to dropping vec4, so can we.
> >>
> >
> > That's very nice.  My primary reason for sending the first e-mail was that
> > SwiftShader vs. Mesa is a pretty big decision that's hard to reverse after
> > someone has poured several months into working on a driver and the argument
> > you gave in favor of Mesa was that it supports vec4.
>
> not quite :)  i garbled it (jacob spent some time explaining it, a few
> months back, so it's 3rd hand if you know what i mean).  what i can
> recall of what he said was: it's something to do with the data types,
> particularly predication, being maintained as part of SPIR-V (and
> NIR), which, if you drop that information, you have to use
> auto-vectorisation and other rather awful tricks to get it back when
> you get to the assembly level.
>
> jacob perhaps you could clarify, here?
So the major issue with the approach AMDGPU took, where the SIMT to predicated vector translation is done by the LLVM backend, is that LLVM doesn't really maintain a reducible CFG, which is needed to correctly vectorize the code without devolving to a switch-in-a-loop. This kinda-sorta works for AMDGPU because the backend can specifically tell the optimization passes to try to maintain a reducible CFG.

However, that won't work for Libre-RISCV's GPU because we don't have a separate GPU ISA (it's just RISC-V or Power, we're still deciding), so the backends don't tell the optimization passes that they need to maintain a reducible CFG. Additionally, the AMDGPU vectorization is done as part of the translation from LLVM IR to MIR, which makes it very hard to adapt to a different ISA.

Because of all of those issues, I decided that it would be better to vectorize before translating to LLVM IR, since that way the CFG reducibility can be easily maintained. This also gives the benefit that it's much easier to substitute a different backend compiler such as gccjit or cranelift, since all of the required SIMT-specific transformations are already completed before the code goes to the backend.

Both NIR and the IR I'm currently implementing in Kazan (the non-Mesa Vulkan driver for Libre-RISCV) maintain a reducible CFG throughout the optimization process. In fact, the IR I'm implementing can't express non-reducible CFGs, since it's built as a tree of loops and code blocks where control transfer operations can only continue a loop or exit a loop or block. Switches work by having a nested set of blocks, and the switch instruction picks which block to break out of.

Hopefully, that all made sense. :)

Jacob Lifshay
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
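
To make the tree-of-loops-and-blocks idea above a bit more concrete, here is a minimal Rust sketch. It is not Kazan's actual IR: `Node`, `Flow`, `Env`, `execute` and `switch_example` are hypothetical names, the depth-indexed branch targets and the rule that a loop exits when its body falls through are assumptions made for illustration, and `ContinueWhile` with its `budget` counter is only a stand-in for a real loop condition.

```rust
// Hypothetical sketch: these types are illustrative and are not Kazan's IR.

/// How many enclosing scopes (blocks or loops) to skip; 0 is the innermost.
type Depth = usize;

enum Node {
    /// Straight-line work; here it only records its name in the trace.
    Op(&'static str),
    /// Runs its children in order; a `Break` targeting it jumps to its end.
    Block(Vec<Node>),
    /// Repeats its body only when a `Continue` targets it (assumption).
    Loop(Vec<Node>),
    /// Exit the enclosing block or loop at the given depth.
    Break(Depth),
    /// Continue the loop at the given depth while `Env::budget` is non-zero.
    ContinueWhile(Depth),
    /// Switch: break out of the scope at depth `targets[selector]`.
    Switch(Vec<Depth>),
}

/// Result of running a node: fall through, or transfer to an enclosing scope.
enum Flow {
    Next,
    Break(Depth),
    Continue(Depth),
}

struct Env {
    selector: usize,
    budget: u32,
    trace: Vec<&'static str>,
}

fn run_seq(nodes: &[Node], env: &mut Env) -> Flow {
    for node in nodes {
        match run(node, env) {
            Flow::Next => {}
            transfer => return transfer, // propagate towards the target scope
        }
    }
    Flow::Next
}

fn run(node: &Node, env: &mut Env) -> Flow {
    match node {
        Node::Op(name) => {
            env.trace.push(*name);
            Flow::Next
        }
        Node::Break(depth) => Flow::Break(*depth),
        Node::ContinueWhile(depth) => {
            if env.budget == 0 {
                return Flow::Next;
            }
            env.budget -= 1;
            Flow::Continue(*depth)
        }
        Node::Switch(targets) => Flow::Break(targets[env.selector]),
        Node::Block(body) => match run_seq(body, env) {
            Flow::Next | Flow::Break(0) => Flow::Next,
            Flow::Break(depth) => Flow::Break(depth - 1),
            Flow::Continue(0) => panic!("malformed IR: continue targets a block"),
            Flow::Continue(depth) => Flow::Continue(depth - 1),
        },
        Node::Loop(body) => loop {
            match run_seq(body, env) {
                Flow::Continue(0) => continue,
                Flow::Next | Flow::Break(0) => break Flow::Next,
                Flow::Break(depth) => break Flow::Break(depth - 1),
                Flow::Continue(depth) => break Flow::Continue(depth - 1),
            }
        },
    }
}

/// Runs a top-level sequence and returns the names of the executed ops.
fn execute(program: &[Node], selector: usize, budget: u32) -> Vec<&'static str> {
    let mut env = Env { selector, budget, trace: Vec::new() };
    run_seq(program, &mut env);
    env.trace
}

/// A two-case switch built from nested blocks: leaving the innermost block
/// lands on case0, leaving the middle block lands on case1, and leaving the
/// outer block skips both cases.
fn switch_example() -> Vec<Node> {
    vec![
        Node::Block(vec![
            Node::Block(vec![
                Node::Block(vec![Node::Switch(vec![0, 1, 2])]),
                Node::Op("case0"),
                Node::Break(1), // skip case1 by leaving the outer block
            ]),
            Node::Op("case1"),
        ]),
        Node::Op("after"),
    ]
}
```

Calling `execute(&switch_example(), 0, 0)` visits `case0` and then `after`, selector 1 visits `case1` and then `after`, and selector 2 goes straight to `after`. Because the only control transfers in this sketch are a break or continue aimed at an enclosing scope, there is no way to write a jump into the middle of a loop, which is the property that keeps the CFG reducible.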