[Mesa-dev] [PATCH 0/3] intel: implement an optimization pass to clean-up boolean conversions

2018-05-15 Thread Iago Toral Quiroga
NIR assumes that all booleans are 32-bit, so drivers need to produce 32-bit
booleans even if they can produce native booleans of a different bit-size, like
Intel does. This means that if we have a 16-bit CMP instruction, we generate a
16-bit boolean that we immediately convert to 32-bit, since that is the bit-size
expected by NIR for all consumers of the boolean.

This backend optimization pass identifies these cases after we are done
translating from NIR to FS IR, and propagates the lower bit-size booleans
to allow DCE to remove the 32-bit conversions. The pass should run early
after translating from NIR, since it assumes that boolean conversions to
32-bit take place immediately after the corresponding CMP instructions.
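
To make the idea concrete, here is a minimal sketch of the transformation on a
toy IR. It is not the code from the patch: the Inst struct, the Op enum and
both function names are hypothetical stand-ins for the real FS IR, and the
sketch assumes each virtual register is written only once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy IR for illustration only; these are NOT Mesa's fs_inst/fs_reg types.
enum class Op { CMP, MOV, SEL };

struct Inst {
    Op op;
    int dst;   // virtual register written by this instruction
    int src;   // virtual register read (-1 if none is tracked)
    int bits;  // bit-size of the destination
};

// For every CMP narrower than 32 bits that is immediately followed by a
// MOV widening its result to 32 bits, make later readers of the widened
// value read the native-size CMP result instead.
// Assumes each register is written once. Returns the number of matches.
int propagate_native_bools(std::vector<Inst> &prog) {
    int progress = 0;
    for (std::size_t i = 0; i + 1 < prog.size(); i++) {
        const Inst cmp = prog[i];
        const Inst mov = prog[i + 1];
        if (cmp.op != Op::CMP || cmp.bits >= 32)
            continue;
        if (mov.op != Op::MOV || mov.bits != 32 || mov.src != cmp.dst)
            continue;
        assert(cmp.dst != mov.dst);  // single-assignment assumption
        for (std::size_t j = i + 2; j < prog.size(); j++) {
            if (prog[j].src == mov.dst)
                prog[j].src = cmp.dst;
        }
        progress++;
    }
    return progress;
}

// Minimal stand-in for DCE: drop MOVs whose destination is never read.
std::vector<Inst> remove_dead_movs(const std::vector<Inst> &prog) {
    std::vector<Inst> out;
    for (const Inst &inst : prog) {
        bool used = false;
        for (const Inst &user : prog)
            used = used || (user.src == inst.dst);
        if (inst.op == Op::MOV && !used)
            continue;
        out.push_back(inst);
    }
    return out;
}
```

For a three-instruction program (a 16-bit CMP writing r1, a 32-bit MOV of r1
into r2, and a SEL reading r2), the first function rewrites the SEL to read r1
and the second then drops the now-unused MOV.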

This has been tested with existing and work-in-progress CTS tests as well
as some ad-hoc VkRunner tests I wrote.

For more context you can read this discussion:
https://lists.freedesktop.org/archives/mesa-dev/2018-April/192751.html

One point raised by Jason during the discussion linked above was that we might
need to canonicalize booleans of different native bit-sizes when they are
combined in boolean expressions. However, as indicated in the commit log for the
last patch in the series, my interpretation of the PRM is that the hardware can
handle this situation without us having to do anything about it. The last patch
contains canonicalization code under a disabled #if guard anyway, just in case
reviewers think this is needed in the end and want to have a look at what it
could look like.
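
As a rough picture of what canonicalizing would mean, the sketch below widens
a 16-bit boolean before combining it with a 32-bit one. It assumes booleans
are encoded as all-zeros or all-ones (as NIR's 32-bit booleans are) and that
widening is a sign extension. The function names are hypothetical, it is not
the disabled code in the last patch, and it does not model the hardware
behavior described in the PRM.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 16-bit boolean to 32 bits by sign extension, so that
// true (0xFFFF) becomes 0xFFFFFFFF and false stays 0.
std::uint32_t widen_bool16(std::uint16_t b) {
    assert(b == 0x0000 || b == 0xFFFF);  // assumed boolean encoding
    return (b & 0x8000u) ? (0xFFFF0000u | b) : b;
}

// AND of a 16-bit and a 32-bit boolean, with the narrow operand
// canonicalized to 32 bits first.
std::uint32_t and_mixed(std::uint16_t a16, std::uint32_t b32) {
    return widen_bool16(a16) & b32;
}
```

Without the widening step, a plain zero-extended 0xFFFF ANDed with 0xFFFFFFFF
would give 0x0000FFFF, which is not a canonical 32-bit true.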

As an alternative to what is done here, we could change the way we construct
CMP instructions so that they always produce the canonical 32-bit booleans NIR
expects. The PRM documentation says that, since gen5, CMP instructions can mix
and match *B, *W and *D types for their source and destination arguments.
However, since all hardware gens still produce 16-bit booleans for half-float,
we would still need to handle that case specially with a similar pass, so we
would not gain much from that. Also, in that case we would always operate on
32-bit booleans, losing the possibility of emitting native 16-bit boolean
instructions where possible.

Iago Toral Quiroga (3):
  intel/compiler: make brw_reg_type_from_bit_size usable from other places
  intel/compiler: add a region_match() helper
  intel/compiler: add an optimization pass for booleans

 src/intel/compiler/brw_fs.cpp | 291 ++
 src/intel/compiler/brw_fs.h   |   5 +
 src/intel/compiler/brw_fs_nir.cpp |  59 
 src/intel/compiler/brw_ir_fs.h|  13 ++
 4 files changed, 309 insertions(+), 59 deletions(-)

-- 
2.14.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] intel: implement an optimization pass to clean-up boolean conversions

2018-06-05 Thread Iago Toral
This isn't reviewed yet, any feedback?

Iago
