NIR assumes that all booleans are 32-bit, so drivers need to produce 32-bit booleans even if the hardware can produce native booleans of a different bit-size, as Intel hardware can. This means that if we have a 16-bit CMP instruction, we generate a 16-bit boolean that we immediately convert to 32-bit, since that is the bit-size NIR expects for all consumers of the boolean.

This backend optimization pass identifies these cases after we are done translating from NIR to FS IR, and propagates the lower bit-size booleans so that DCE can remove the 32-bit conversions. The pass should run early after translating from NIR, since it assumes that boolean conversions to 32-bit take place immediately after the corresponding CMP instructions.

This has been tested with existing and work-in-progress CTS tests as well as some ad-hoc VkRunner tests I wrote.

For more context you can read this discussion:
https://lists.freedesktop.org/archives/mesa-dev/2018-April/192751.html

One point raised by Jason during the discussion linked above was that we might need to canonicalize booleans of different native bit-sizes when they are combined in boolean expressions. However, as indicated in the commit log for the last patch in the series, my interpretation of the PRM is that the hardware can handle this situation without us having to do anything about it. The last patch contains canonicalization code under a disabled #if guard anyway, in case reviewers think it is needed after all and want to see what it could look like.

As an alternative to what is being done here, we could change the way we construct CMP instructions. The PRM says that, since gen5, CMP instructions can mix and match *B, *W and *D types for their source and destination arguments, so we could use that to always produce canonical 32-bit bools like NIR expects. However, since all hardware gens still produce 16-bit booleans for half-float, we would still need to handle that case specially with a similar pass, so we would not be gaining much from that. Also, in that case we would always operate with 32-bit booleans, losing the possibility to emit native 16-bit boolean instructions where possible.
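To illustrate the propagation idea described above, here is a minimal sketch on a toy IR. It is not the code from the series: Reg, Inst, make_inst(), propagate_narrow_bools() and eliminate_dead_movs() are hypothetical names made up for this example and are not Mesa types or functions. The sketch assumes a single straight-line block and a trivial DCE that treats any MOV whose destination is never read as dead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for backend IR. These are NOT the real Mesa fs_inst/fs_reg.
enum class Opcode { CMP, MOV, AND };

struct Reg {
    int id;        // virtual register number, -1 means "unused source"
    int bit_size;  // 16 or 32 in this example
};

struct Inst {
    Opcode op;
    Reg dst;
    Reg src[2];
    bool dead;     // set by the toy DCE below
};

// Hypothetical helper to build an instruction with one or two sources.
Inst make_inst(Opcode op, Reg dst, Reg src0, Reg src1 = Reg{-1, 0}) {
    assert(dst.id >= 0);
    Inst inst = {op, dst, {src0, src1}, false};
    return inst;
}

// Look for a narrow CMP immediately followed by a MOV that widens its result
// to 32-bit, then rewrite later reads of the 32-bit value to read the narrow
// CMP result directly. Returns the number of rewritten sources.
int propagate_narrow_bools(std::vector<Inst> &prog) {
    int rewritten = 0;
    for (std::size_t i = 0; i + 1 < prog.size(); ++i) {
        const Inst &cmp = prog[i];
        const Inst &conv = prog[i + 1];
        const bool is_pair =
            cmp.op == Opcode::CMP && cmp.dst.bit_size < 32 &&
            conv.op == Opcode::MOV && conv.dst.bit_size == 32 &&
            conv.src[0].id == cmp.dst.id;
        if (!is_pair)
            continue;

        const Reg narrow = cmp.dst;
        const int wide_id = conv.dst.id;
        for (std::size_t j = i + 2; j < prog.size(); ++j) {
            for (Reg &s : prog[j].src) {
                if (s.id == wide_id) {
                    s = narrow;
                    ++rewritten;
                }
            }
            // Stop once either register is overwritten: later reads would
            // no longer see the boolean produced by this CMP.
            if (prog[j].dst.id == wide_id || prog[j].dst.id == narrow.id)
                break;
        }
    }
    return rewritten;
}

// Toy DCE: mark MOVs whose destination is never read by a live instruction.
int eliminate_dead_movs(std::vector<Inst> &prog) {
    int removed = 0;
    for (Inst &inst : prog) {
        if (inst.op != Opcode::MOV || inst.dead)
            continue;
        bool used = false;
        for (const Inst &other : prog) {
            if (other.dead)
                continue;
            for (const Reg &s : other.src)
                if (s.id == inst.dst.id)
                    used = true;
        }
        if (!used) {
            inst.dead = true;
            ++removed;
        }
    }
    return removed;
}
```

In this sketch the widening MOV only becomes dead after every reader has been switched to the narrow register, which is why the propagation has to run before DCE. A real pass would also have to check regions, types and control flow, none of which is modelled here.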

Iago Toral Quiroga (3):
  intel/compiler: make brw_reg_type_from_bit_size usable from other places
  intel/compiler: add a region_match() helper
  intel/compiler: add an optimization pass for booleans

 src/intel/compiler/brw_fs.cpp     | 291 ++++++++++++++++++++++++++++++++++++++
 src/intel/compiler/brw_fs.h       |   5 +
 src/intel/compiler/brw_fs_nir.cpp |  59 --------
 src/intel/compiler/brw_ir_fs.h    |  13 ++
 4 files changed, 309 insertions(+), 59 deletions(-)

-- 
2.14.1