Hi, On Thu, 2015-07-16 at 08:15 -0700, Jason Ekstrand wrote: > > On Jul 15, 2015 11:20 PM, "Iago Toral" <ito...@igalia.com> wrote: > > > > On Wed, 2015-07-15 at 11:02 -0700, Connor Abbott wrote: > > > On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral <ito...@igalia.com> > wrote: > > > > Hi, > > > > > > > > when we sent the patches for the new nir->vec4 backend we > mentioned that > > > > we had a few dEQP tests that would fail to link because of > register > > > > spilling. Now that we have added GS support we see a few > instances of > > > > this problem popping up in a few GS piglit tests too, for > example this > > > > one: > > > > > > > > > tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test > > > > > > > > I have been looking into what is going on with these tests and I > came to > > > > the conclusion that the problem is a consequence of various > factors, but > > > > probably the main thing contributing to it is the way our SSA > pass > > > > works. That said, I am not that experienced with NIR, so it > could also > > > > be that my analysis is missing something and I am just arriving > to wrong > > > > conclusions, so I'll explain my thoughts below and hopefully > someone > > > > else with more NIR experience can jump in and confirm or reject > my > > > > analysis. > > > > > > > > The GS code in that test looks like this: > > > > > > > > for (int p = 0; p < 3; p++) { > > > > color = ((index >= ins[p].m1.length() ? > > > > ins[p].m2[index-ins[p].m1.length()] : > > > > ins[p].m1[index]) == expect) ? > > > > vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, > 1.0); > > > > gl_Position = gl_in[p].gl_Position; > > > > EmitVertex(); > > > > } > > > > > > > > One thing that is immediately contributing to the register > pressure is > > > > some really awful code generated because of the indirect array > indexing > > > > on the inputs inside the loop. This is because of the > > > > lower_variable_index_to_cond_assign lowering pass called from > > > > brw_shader.cpp. This pass will convert that color assignment > into a > > > > bunch of nested if/else statements which makes the generated > GLSL IR > > > > code rather large, involving plenty of temporaries too. This is > only > > > > made worse by the fact that loop unrolling will replicate that 3 > times. > > > > The result is a huge pile of GLSL IR with a few dozens of nested > if/else > > > > statements and temporaries that looks like [1] (that is only a > fragment > > > > of the GLSL IR). > > > > > > > > One thing that is particularly relevant in that code is that it > has > > > > multiple conditional assignments to the same variable > > > > (dereference_array_value) as a consequence of this lowering > pass. > > > > > > > > That much, however, is common to the NIR and non-NIR paths. The > problem > > > > in the NIR case is that all these assignments generate new SSA > values, > > > > which then become new registers in the final NIR form. This > leads to NIR > > > > code like [2]. In contrast, the old vec4 visitor path, is able > to have > > > > writes to the same variable write to the same register. > > > > > > > > As a result, if I print the code right before register > allocation in the > > > > NIR path [3] and I compare that to what we get with the old vec4 > visitor > > > > path at that same point [4], it is clearly visible that this > difference > > > > is allowing the vec4 visitor path to reduce register pressure > (see how > > > > in [4] we have multiple writes to vgrf5, while in [3] we always > write to > > > > a new vgrf every time). > > > > > > > > So, am I missing something or is this kind of result expected > with NIR > > > > programs? Is there anything in the nir->vec4 pass that we can do > to fix > > > > this or does this need to be fixed when going out of SSA moe > inside NIR? > > > > > > > > Iago > > > > > > > > [1] http://pastebin.com/5uA8ex2S > > > > [2] http://pastebin.com/pqLfvAVN > > > > [3] http://pastebin.com/64nSuUH8 > > > > [4] http://pastebin.com/WCrdYxzt > > > > > > > > _______________________________________________ > > > > mesa-dev mailing list > > > > mesa-dev@lists.freedesktop.org > > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > > > Hi Iago, > > > > > > Indeed, NIR does convert conditional writes to conditional > selectss -- > > > it's a required part of the conversion to SSA, and since our HW > has a > > > conditional select instruction that's just as fast as doing a > > > conditional move, we haven't bothered much to try and change it > back > > > during out-of-SSA. However, doing this shouldn't make things > worse. In > > > your example, vgrf9, vgrf15, and vgrf17 all have very short live > > > intervals and don't interfere with vgrf11 (unless there's another > use > > > of them somewhere after the snippet you pasted), which means that > the > > > register allocator is free to allocate the destinations of all the > > > selects to the same register. > > > > > > What's happening, though, is that you're running into our terrible > > > liveness analysis. After doing the proper liveness analysis, we > figure > > > out the place each register first becomes live and last becomes > dead, > > > and then we consider registers that have overlapping ranges to > > > interfere. So we consider vgrf11 to interfere with vgrf15 and > vgrf17, > > > even though it really doesn't. The trouble with making it do the > right > > > thing is that we may actually need to extend the live ranges of > > > registers when the exec masks don't match up, either because one > uses > > > writemask_all or because they have incompatible exec masks due to > > > containing different datatypes (half-float vs. float, etc.). For > > > example, in your snippet, if vgrf15 were to be written with > > > force_writemask_all, then it actually would interfere with vgrf11. > But > > > right now we're simply being conservative and assuming everything > is > > > incompatible, which produces extra register pressure in situations > > > like the one you have. We have the same problem with FS, and I > have > > > some idea how to make it better there, but I'm not sure if we > would > > > care enough to port it to vec4. > > > > > > So the long answer is that there are some unfortunate features of > the > > > backend that are making things much worse than they should be wrt > > > register allocation, and separating things out like SSA does seems > to > > > make it worse in certain situations. The short answer is: go fix > > > register spilling! > > > > > > Connor > > > > I see, thanks for the detailed reply Connor. I'll have a look at the > > vec4 register spilling code and Ben's work in that area and see what > I > > can do. > > Thanks for working on that, it is the core problem here. In the FS, > we have an INTEL_DEBUG=spill option that triggers tells it to spill > *everything*. That helps a lot when debugging spilling. > --Jason
FYI, I have just fixed this. The problem was not related to register spilling though but to incorrect live analysis for conditional writes with the select opcode. I have just sent this patch for review: http://lists.freedesktop.org/archives/mesa-dev/2015-July/089219.html That fixes all the piglit and dEQP tests that failed to link due to excessive register spilling with the NIR-vec4 backend. I can still have a go at fixing issues with register spilling though. Looking at previous work from Ben it looks like he was trying to make it work with register sizes >1. Is that what you all had in mind or is there something else? Iago _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev