On Mon, Jan 28, 2019 at 3:32 AM Eduardo Lima Mitev <el...@igalia.com> wrote: > > On 1/26/19 12:42 AM, Rob Clark wrote: > > On Fri, Jan 25, 2019 at 10:48 AM Eduardo Lima Mitev <el...@igalia.com> > > wrote: > >> > >> There are a bunch of instructions emitted on ir3_compiler_nir related to > >> offset computations for IO opcodes (ssbo, image, etc). This small series > >> explores the possibility of moving these instructions to NIR, where we > >> have higher chances of optimizing them. > >> > >> The series introduces a new, freedreno specific NIR pass, > >> 'ir3_nir_lower_sampler_io' (final name not set). The pass is executed > >> early on ir3_optimize_nir(), and the goal is to centralize all these > >> computations there, hoping that later NIR passes will produce better > >> code than what is currently emitted. > > > > I can think of a few other things that would be interesting to lower > > to driver specific nir opcodes (imul and various lowering for tex > > instructions come to mind.. but probably also ubo and ssbo address > > calculation.. maybe it could even make sense for some of the single > > src alu instructions that translate into multiple ir3 instructions, > > not sure).. > > > > Yes, the plan is to abstract to NIR whatever brings us a benefit in > instruction stats. There is also the question of just simplifying the > backend compiler, provided we don't hurt produced code.
yeah, simplifying nir->ir3 would itself be a worthwhile thing > > Are you thinking about having separate passes for each? I guess at > > least for alu instructions we might able to use nir_algebraic so > > having things split up might be easier. > > > > I haven't thought too much about this yet, but it seems to make sense > having at least 2 passes, one for I/O and one for ALUs. I haven't thought about it too much, but I think this makes sense > >> So far, we have just implemented byte-offset computation for image store > >> and atomics. This seemed like a good first target given the amount of > >> instructions being emitted for it by the backend. > >> > >> This is an RFC series because there are a few open questions, but we > >> wanted to gather feedback already now, in case this effort is something > >> not worth it; and also hoping that somebody else will give it a try > >> against other shaders and on other gens (we have just tried this on > >> a5xx). > >> > >> * We have so far been unable to see any improvement in generated code > >> (not a penalty either). shader-db has not been specially useful. Few > >> shaders there exercise image store or image atomic ops, and of those > >> that do, most require higher versions of GLSL than what freedreno > >> supports, so they get skipped. The few that do actually run, don't > >> show any meaningful difference. > > > > I guess it would be easy enough to construct shaders that would > > benefit from this, but maybe that is cheating.. > > > > Possibly UBO and SSBO is a better target, I guess there you might be > > more likely to see patterns of access of successive elements (ie. > > foo[idx], foo[idx+1], etc)? > > > > I took a first stab at SSBO's load/store/atomic, where the offset is > divided by 4 in the backend, but was bitten by IR3_STGB requiring both > the original byte-offset and the dword-offset (in src1 and src2 > respectively). So trivially emitting a nir_shr on the offset didn't buy > us anything. I have in the backlog to revisit this, turning the offset > into a 2-component reg so we can hold the original byte-offset and the > offset divided by 4. hmm, or just two src's? Anyways, it seems like there is some potential to fold that shift into other math, perhaps if you had a shader that did something like 'somessbo[val << 1]'? fwiw, I think ssbo's and images get a bit simpler, or rather less alu instructions.. a5xx and earlier wanted both idx/coords and offset (Ie. idx/coords used for clamping out of bounds access, but offset is what is actually used to fetch/store). With a6xx it becomes just idx/coords, no offset calc.. which *hopefully* means we can deal with tiled images more easily. But a6xx has it's own challenge with one of the instruction srcs being a vecN with both input and output values. See 'instr_cat6_a6xx_t'. (note: we don't use ldc for UBOs yet, still sticking with the general purpose ldg. Maybe we should switch to get rid of some address calculation.) > > Anyways, since we don't try to do (and I'd rather not do) any sort of > > CSE post nir->ir3 I think starting to introduce more ir3 specific > > nir->nir lowering seems like a thing we need, so I'm pretty happy that > > someone is looking at this :-) > > > > Thanks, that's encouraging. Lets see how far we can get :). > cool BR, -R _______________________________________________ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno