(I had to stop reading to go home last Tuesday, so here are the remaining comments.)
On Tue, May 15, 2018 at 01:05:21PM +0200, Iago Toral Quiroga wrote: > NIR assumes that all booleans are 32-bit but Intel hardware produces > booleans of the same size as the operands to the CMP instruction, so we > can actually have 8-bit and 16-bit booleans. To work around this > mismatch between NIR and the hardware, we emit boolean conversions to > 32-bit right after emitting the CMP instruction during the NIR->FS > pass, which makes interfacing with NIR a lot easier, but can leave > unnecessary boolean conversions in the shader code. Question: have you explored handling this at the NIR->FS conversion? I.e. instead of generate the cmp + mov and then look for the cmp + mov to fix up; when generating a cmp, perform those checks (at nir level) and then pick the right bitsize. > +/** > + * Propagates the bit-size of the destination of a boolean instruction to > + * all its consumers. If propagate_from_source is True, then the producer > + * is a conversion MOV from a low bit-size boolean to 32-bit, and in that > + * case the propagation happens from the source of the instruction instead > + * of its destination. > + */ > +static bool > +propagate_bool_bit_size(fs_inst *inst, bool propagate_from_source) > +{ > + assert(!propagate_from_source || inst->opcode == BRW_OPCODE_MOV); > + > + bool progress = false; > + > + const unsigned bit_size = 8 * (propagate_from_source ? > + type_sz(inst->src[0].type) : type_sz(inst->dst.type)); > + > + /* Look for any follow-up instructions that sources from the boolean > + * result of the producer instruction and rewrite them to use the correct > + * bit-size. > + */ > + foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst) { > + if (!inst_supports_boolean(fixup_inst)) > + continue; Should we care about other instruction clobbering the contents of inst->dst, or at this point of the optimization we can count on it not being? > + /* If it is a plain boolean conversion to 32-bit, then look for any > + * follow-up instructions that source from the 32-bit boolean and > + * rewrite them to source from the output of the CMP (which is the > + * source of the conversion instruction) directly if possible. > + */ > + progress = propagate_bool_bit_size(conv_inst, true) || progress; > + } > +#if 0 > + else if (inst_supports_boolean(inst) && inst->sources > 1) { If you end up enabling this section, I suggest move the inst_supports_boolean() check to the beginning of the for-loop, as an early return. Makes the condition for the cases we are handling cleaner. > + /* For all logical instructions that can take more than one operand > + * we need to ensure that all of them have matching bit-sizes. If > they > + * don't, it means that the original shader code is operating > boolean > + * expressions with different native bit-sizes and we need to choose > + * a canonical boolean form for all the operands, which requires to > + * inject conversions to temporaries. We choose the bit-size of the > + * destination as the canonical form (which must be a 32-bit boolean > + * since we only propagate smaller bit-sizes to the destination if > we > + * managed to convert all the operands to the same bit-size) because > + * that way we don't need to worry about propagating the destination > + * bit-size down the line. > + */ To make this comment less heavy, I'd move the assumption about the destination being 32-bit right above the assert, which is kind of an explanation of the assertion. > @@ -6240,6 +6471,7 @@ fs_visitor::optimize() > > OPT(opt_drop_redundant_mov_to_flags); > OPT(remove_extra_rounding_modes); > + OPT(opt_bool_bit_size); It could be useful to have a short comment here about the importance of calling opt_bool_bit_size at this point. Thanks, Caio _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev