Re: [Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

2018-07-06 Thread Iago Toral
On Thu, 2018-07-05 at 15:47 -0700, Caio Marcelo de Oliveira Filho
wrote:
> (I had to stop reading to go home last Tuesday, so here are the
> remaining comments.)
> 
> 
> On Tue, May 15, 2018 at 01:05:21PM +0200, Iago Toral Quiroga wrote:
> > NIR assumes that all booleans are 32-bit but Intel hardware
> > produces
> > booleans of the same size as the operands to the CMP instruction,
> > so we
> > can actually have 8-bit and 16-bit booleans. To work around this
> > mismatch between NIR and the hardware, we emit boolean conversions
> > to
> > 32-bit right after emitting the CMP instruction during the NIR->FS
> > pass, which makes interfacing with NIR a lot easier, but can leave
> > unnecessary boolean conversions in the shader code.
> 
> Question: have you explored handling this at the NIR->FS conversion?
> I.e. instead of generate the cmp + mov and then look for the cmp +
> mov
> to fix up; when generating a cmp, perform those checks (at nir level)
> and then pick the right bitsize.

It is not that easy. The problem is that NIR will continue to come at
us with 32-bit boolean instructions after the CMP+MOV, so instead of
prpagating forward for every conversion, now, for every bool we find in
IR we'd need to go back in the FS program to check if it is a real 32-
bit boolean or not to decide what to emit. I don't see any benefit,
plus we would be coupling all this with the translation implementation,
which I think is less nice than having it being a completely separate
thing.

Anyway, there is a major issue with the current patch that I have found
this week thanks to some new CTS tests: when we propagate the bitsize
of a logical instruction to its destination, that affects all its
consumers even outside the current block, so we need to handle
propagation across blocks, which adds a few more problems, so I still
need to figure out how to deal with that properly and whether that is
something we want to do (there is a reason why no other opt/lowering
passes do cross-block changes...).

Iago

> 
> > +/**
> > + * Propagates the bit-size of the destination of a boolean
> > instruction to
> > + * all its consumers. If propagate_from_source is True, then the
> > producer
> > + * is a conversion MOV from a low bit-size boolean to 32-bit, and
> > in that
> > + * case the propagation happens from the source of the instruction
> > instead
> > + * of its destination.
> > + */
> > +static bool
> > +propagate_bool_bit_size(fs_inst *inst, bool propagate_from_source)
> > +{
> > +   assert(!propagate_from_source || inst->opcode ==
> > BRW_OPCODE_MOV);
> > +
> > +   bool progress = false;
> > +
> > +   const unsigned bit_size = 8 * (propagate_from_source ?
> > +  type_sz(inst->src[0].type) : type_sz(inst->dst.type));
> > +
> > +   /* Look for any follow-up instructions that sources from the
> > boolean
> > +* result of the producer instruction and rewrite them to use
> > the correct
> > +* bit-size.
> > +*/
> > +   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst)
> > {
> > +  if (!inst_supports_boolean(fixup_inst))
> > + continue;
> 
> Should we care about other instruction clobbering the contents of
> inst->dst, or at this point of the optimization we can count on it
> not
> being?
> 
> 
> > + /* If it is a plain boolean conversion to 32-bit, then
> > look for any
> > +  * follow-up instructions that source from the 32-bit
> > boolean and
> > +  * rewrite them to source from the output of the CMP
> > (which is the
> > +  * source of the conversion instruction) directly if
> > possible.
> > +  */
> > + progress = propagate_bool_bit_size(conv_inst, true) ||
> > progress;
> > +  }
> > +#if 0
> > +   else if (inst_supports_boolean(inst) && inst->sources > 1)
> > {
> 
> If you end up enabling this section, I suggest move the
> inst_supports_boolean() check to the beginning of the for-loop, as an
> early return. Makes the condition for the cases we are handling
> cleaner.
> 
> 
> 
> > + /* For all logical instructions that can take more than
> > one operand
> > +  * we need to ensure that all of them have matching bit-
> > sizes. If they
> > +  * don't, it means that the original shader code is
> > operating boolean
> > +  * expressions with different native bit-sizes and we
> > need to choose
> > +  * a canonical boolean form for all the operands, which
> > requires to
> > +  * inject conversions to temporaries. We choose the bit-
> > size of the
> > +  * destination as the canonical form (which must be a 32-
> > bit boolean
> > +  * since we only propagate smaller bit-sizes to the
> > destination if we
> > +  * managed to convert all the operands to the same bit-
> > size) because
> > +  * that way we don't need to worry about propagating the
> > destination
> > +  * bit-size down the line.
> > +  */
> 
> To make this comment less heavy, I'd 

Re: [Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

2018-07-05 Thread Caio Marcelo de Oliveira Filho
(I had to stop reading to go home last Tuesday, so here are the
remaining comments.)


On Tue, May 15, 2018 at 01:05:21PM +0200, Iago Toral Quiroga wrote:
> NIR assumes that all booleans are 32-bit but Intel hardware produces
> booleans of the same size as the operands to the CMP instruction, so we
> can actually have 8-bit and 16-bit booleans. To work around this
> mismatch between NIR and the hardware, we emit boolean conversions to
> 32-bit right after emitting the CMP instruction during the NIR->FS
> pass, which makes interfacing with NIR a lot easier, but can leave
> unnecessary boolean conversions in the shader code.

Question: have you explored handling this at the NIR->FS conversion?
I.e. instead of generate the cmp + mov and then look for the cmp + mov
to fix up; when generating a cmp, perform those checks (at nir level)
and then pick the right bitsize.



> +/**
> + * Propagates the bit-size of the destination of a boolean instruction to
> + * all its consumers. If propagate_from_source is True, then the producer
> + * is a conversion MOV from a low bit-size boolean to 32-bit, and in that
> + * case the propagation happens from the source of the instruction instead
> + * of its destination.
> + */
> +static bool
> +propagate_bool_bit_size(fs_inst *inst, bool propagate_from_source)
> +{
> +   assert(!propagate_from_source || inst->opcode == BRW_OPCODE_MOV);
> +
> +   bool progress = false;
> +
> +   const unsigned bit_size = 8 * (propagate_from_source ?
> +  type_sz(inst->src[0].type) : type_sz(inst->dst.type));
> +
> +   /* Look for any follow-up instructions that sources from the boolean
> +* result of the producer instruction and rewrite them to use the correct
> +* bit-size.
> +*/
> +   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst) {
> +  if (!inst_supports_boolean(fixup_inst))
> + continue;

Should we care about other instruction clobbering the contents of
inst->dst, or at this point of the optimization we can count on it not
being?


> + /* If it is a plain boolean conversion to 32-bit, then look for any
> +  * follow-up instructions that source from the 32-bit boolean and
> +  * rewrite them to source from the output of the CMP (which is the
> +  * source of the conversion instruction) directly if possible.
> +  */
> + progress = propagate_bool_bit_size(conv_inst, true) || progress;
> +  }
> +#if 0
> +   else if (inst_supports_boolean(inst) && inst->sources > 1) {

If you end up enabling this section, I suggest move the
inst_supports_boolean() check to the beginning of the for-loop, as an
early return. Makes the condition for the cases we are handling
cleaner.



> + /* For all logical instructions that can take more than one operand
> +  * we need to ensure that all of them have matching bit-sizes. If 
> they
> +  * don't, it means that the original shader code is operating 
> boolean
> +  * expressions with different native bit-sizes and we need to choose
> +  * a canonical boolean form for all the operands, which requires to
> +  * inject conversions to temporaries. We choose the bit-size of the
> +  * destination as the canonical form (which must be a 32-bit boolean
> +  * since we only propagate smaller bit-sizes to the destination if 
> we
> +  * managed to convert all the operands to the same bit-size) because
> +  * that way we don't need to worry about propagating the destination
> +  * bit-size down the line.
> +  */

To make this comment less heavy, I'd move the assumption about the
destination being 32-bit right above the assert, which is kind of an
explanation of the assertion.


> @@ -6240,6 +6471,7 @@ fs_visitor::optimize()
>  
> OPT(opt_drop_redundant_mov_to_flags);
> OPT(remove_extra_rounding_modes);
> +   OPT(opt_bool_bit_size);

It could be useful to have a short comment here about the importance
of calling opt_bool_bit_size at this point.



Thanks,
Caio
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

2018-07-04 Thread Iago Toral
On Tue, 2018-07-03 at 18:45 -0700, Caio Marcelo de Oliveira Filho
wrote:
> Hi,
> 
> 
> > +   /* Look for any follow-up instructions that sources from the
> > boolean
> > +* result of the producer instruction and rewrite them to use
> > the correct
> > +* bit-size.
> > +*/
> > +   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst)
> > {
> > +  if (!inst_supports_boolean(fixup_inst))
> > + continue;
> > +
> > +  /* For MOV instructions we can always rewrite the boolean
> > source
> > +   * if the instrucion reads the same region we produced in
> > the
> > +   * 32-bit conversion.
> > +   */
> > +  if (fixup_inst->opcode == BRW_OPCODE_MOV &&
> > +  region_match(inst->dst, inst->size_written,
> > +   fixup_inst->src[0], fixup_inst-
> > >size_read(0))) {
> > + if (propagate_from_source) {
> > +fixup_inst->src[0].file = inst->src[0].file;
> > +fixup_inst->src[0].nr = inst->src[0].nr;
> > + }
> > + fixup_inst->src[0] =
> > +fix_bool_reg_bit_size(fixup_inst->src[0], bit_size);
> > + progress = true;
> > + continue;
> > +  }
> 
> It seems the rest of the code assumes that instruction is not MOV, so
> you would need to ensure continue is called regardless the region
> match.

Right, although if the region doesn't match the rest of the code won't
do anything anyway.

> Idea: it seems we could just remove this section above (handling
> MOV),
> and slightly change the section below so that MOV can be dealt with
> it
> too.
> 
> - Drop the section above;
> - Rename progress_logical to local_progress;
> - Add a "fixup_inst->opcode == BRW_OPCODE_MOV" to the

The recursive call executes for logical instructions, not for MOV, so
this should be !=.

>   condition that controls the recursive call;
> - Update comments accordingly.

Sounds like a good idea, thanks for the feedback.

Iago

> 
> > +
> > +  /* For logical instructions we have the same restriction as
> > for MOVs,
> > +   * and we also need to:
> > +   *
> > +   * 1. Propagate the bit-size to the boolean destination of
> > the
> > +   *instruction.
> > +   * 2. Rewrite any instruction that reads the destination to
> > use
> > +   *the new bit-size.
> > +   *
> > +   * However, we can only do these if we can rewrite all the
> > operands
> > +   * to use the same bit-size.
> > +   */
> > +  bool progress_logical = false;
> > +  bool same_bit_size = true;
> > +  for (unsigned i = 0; i < fixup_inst->sources; i++) {
> > + if (region_match(inst->dst, inst->size_written,
> > +  fixup_inst->src[i], fixup_inst-
> > >size_read(i))) {
> > +if (propagate_from_source) {
> > +   fixup_inst->src[i].file = inst->src[0].file;
> > +   fixup_inst->src[i].nr = inst->src[0].nr;
> > +}
> > +fixup_inst->src[i] =
> > +   fix_bool_reg_bit_size(fixup_inst->src[i],
> > bit_size);
> > +progress_logical = true;
> > +progress = true;
> > + }
> > +
> > + if (i > 0 &&
> > + type_sz(fixup_inst->src[i].type) !=
> > + type_sz(fixup_inst->src[i - 1].type)) {
> > +same_bit_size = false;
> > + }
> > +  }
> > +
> > +  /* If we have successfully rewritten a logical instruction
> > operand
> > +   * to use a smaller bit-size boolean and all the operands in
> > the
> > +   * instruction have the same small bit-size, then propagate
> > the
> > +   * new bit-size to the destination boolean and do the same
> > for all
> > +   * follow-up instructions that read from it.
> > +   */
> > +  if (progress_logical && same_bit_size) {
> > + fixup_inst->dst = retype(fixup_inst->dst, fixup_inst-
> > >src[0].type);
> > + propagate_bool_bit_size(fixup_inst, false);
> > +  }
> > +   }
> > +
> > +   return progress;
> > +}
> 
> 
> 
> 
> Thanks,
> Caio
> 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

2018-07-03 Thread Caio Marcelo de Oliveira Filho
Hi,


> +   /* Look for any follow-up instructions that sources from the boolean
> +* result of the producer instruction and rewrite them to use the correct
> +* bit-size.
> +*/
> +   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst) {
> +  if (!inst_supports_boolean(fixup_inst))
> + continue;
> +
> +  /* For MOV instructions we can always rewrite the boolean source
> +   * if the instrucion reads the same region we produced in the
> +   * 32-bit conversion.
> +   */
> +  if (fixup_inst->opcode == BRW_OPCODE_MOV &&
> +  region_match(inst->dst, inst->size_written,
> +   fixup_inst->src[0], fixup_inst->size_read(0))) {
> + if (propagate_from_source) {
> +fixup_inst->src[0].file = inst->src[0].file;
> +fixup_inst->src[0].nr = inst->src[0].nr;
> + }
> + fixup_inst->src[0] =
> +fix_bool_reg_bit_size(fixup_inst->src[0], bit_size);
> + progress = true;
> + continue;
> +  }

It seems the rest of the code assumes that instruction is not MOV, so
you would need to ensure continue is called regardless the region
match.

Idea: it seems we could just remove this section above (handling MOV),
and slightly change the section below so that MOV can be dealt with it
too.

- Drop the section above;
- Rename progress_logical to local_progress;
- Add a "fixup_inst->opcode == BRW_OPCODE_MOV" to the
  condition that controls the recursive call;
- Update comments accordingly.


> +
> +  /* For logical instructions we have the same restriction as for MOVs,
> +   * and we also need to:
> +   *
> +   * 1. Propagate the bit-size to the boolean destination of the
> +   *instruction.
> +   * 2. Rewrite any instruction that reads the destination to use
> +   *the new bit-size.
> +   *
> +   * However, we can only do these if we can rewrite all the operands
> +   * to use the same bit-size.
> +   */
> +  bool progress_logical = false;
> +  bool same_bit_size = true;
> +  for (unsigned i = 0; i < fixup_inst->sources; i++) {
> + if (region_match(inst->dst, inst->size_written,
> +  fixup_inst->src[i], fixup_inst->size_read(i))) {
> +if (propagate_from_source) {
> +   fixup_inst->src[i].file = inst->src[0].file;
> +   fixup_inst->src[i].nr = inst->src[0].nr;
> +}
> +fixup_inst->src[i] =
> +   fix_bool_reg_bit_size(fixup_inst->src[i], bit_size);
> +progress_logical = true;
> +progress = true;
> + }
> +
> + if (i > 0 &&
> + type_sz(fixup_inst->src[i].type) !=
> + type_sz(fixup_inst->src[i - 1].type)) {
> +same_bit_size = false;
> + }
> +  }
> +
> +  /* If we have successfully rewritten a logical instruction operand
> +   * to use a smaller bit-size boolean and all the operands in the
> +   * instruction have the same small bit-size, then propagate the
> +   * new bit-size to the destination boolean and do the same for all
> +   * follow-up instructions that read from it.
> +   */
> +  if (progress_logical && same_bit_size) {
> + fixup_inst->dst = retype(fixup_inst->dst, fixup_inst->src[0].type);
> + propagate_bool_bit_size(fixup_inst, false);
> +  }
> +   }
> +
> +   return progress;
> +}




Thanks,
Caio

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev