Here is version 2 of adding support for 16-bit float instructions in the shader compiler. Unlike the first version, which did all the analysis at the GLSL level, this one adds the notion of precision to NIR variables and does the analysis and precision lowering at the NIR level.
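To make the idea concrete, here is a minimal toy sketch of precision carried on variables and propagated through ALU instructions. It does not use the NIR API and is not the logic in nir_lower_precision.cpp. The classes `Var` and `Alu`, the function `lower_precision`, the opcode strings, and the rule "an ALU op becomes 16-bit only if all of its sources are 16-bit" are all assumptions made for illustration.

```python
from dataclasses import dataclass

MEDIUMP, HIGHP = "mediump", "highp"


@dataclass
class Var:
    """Toy stand-in for a variable that carries a precision qualifier."""
    name: str
    precision: str


@dataclass
class Alu:
    """Toy stand-in for an ALU instruction; srcs are Var or Alu nodes."""
    op: str
    srcs: list
    bit_size: int = 32


def bit_size_of(node):
    # Hypothetical mapping: mediump variables are treated as 16-bit.
    if isinstance(node, Var):
        return 16 if node.precision == MEDIUMP else 32
    return node.bit_size


def lower_precision(node):
    """Walk bottom-up and pick a bit size for every ALU node.

    Assumed rule (illustrative only): an op runs at 16 bits when every
    source is 16-bit; otherwise 16-bit sources are widened explicitly.
    """
    if isinstance(node, Var):
        return node
    srcs = [lower_precision(s) for s in node.srcs]
    if all(bit_size_of(s) == 16 for s in srcs):
        return Alu(node.op, srcs, 16)
    widened = [
        Alu("f2f32", [s], 32) if bit_size_of(s) == 16 else s
        for s in srcs
    ]
    return Alu(node.op, widened, 32)


if __name__ == "__main__":
    a, b, c = Var("a", MEDIUMP), Var("b", MEDIUMP), Var("c", HIGHP)
    expr = Alu("fadd", [Alu("fmul", [a, b]), c])
    print(lower_precision(expr))
```

In this toy model the multiply of two mediump values stays at 16 bits, while the add with a highp operand stays at 32 bits and receives a widening conversion on its 16-bit source.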
The code lives at gitlab.freedesktop.org:tpohjola/mesa, branch fp16. It is now mature enough to use 16-bit precision for all instructions, apart from a few special cases, in gfxbench trex and alu2. (Unfortunately I'm not seeing any performance benefit. This is not that surprising, as I got to the same point with the GLSL-based solution and was able to measure the performance already back then.) Hence I thought it was time to share it.

As this is still work in progress, I didn't want to flood the list with the full set of patches. Instead I included only the very last one, in which I try to outline the logic and its current shortcomings. It also contains a short list of TODO items.

In addition to those, I need to examine a couple of Intel-specific misrenderings. I haven't dug that deep yet, but it looks like I'm missing something with 16-bit inot and with mad/mac lowered interpolation. Unfortunately I get corrupted rendering only on hardware, while the simulator is happy.

My main worry is how to test all of this properly. I haven't written any unit tests yet, mostly because I've been uncertain about my design choices, but they are high on my list. So far I've used shader runner tests that I wrote for specific cases. These are useful for development but don't bring much value for regression testing.
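Two of the patches in the series, "nir: Recognize f216(f232(x)) as x" and "nir: Recognize f232(f216(x)) as x", fold conversion round trips. The following small check illustrates the difference between them. It uses Python's `struct` module to emulate the conversions; the helper names `f2f16`, `f2f32` and `all_finite_halves` are made up for this sketch. Widening a half and narrowing it again is exact, whereas narrowing a float and widening it again drops precision. So the second fold presumably relies on the value being allowed to be evaluated at reduced precision; that reading is my assumption, not something the patch titles state.

```python
import struct


def f2f16(x):
    """Round x to the nearest IEEE binary16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def f2f32(x):
    """Round x to the nearest IEEE binary32 value (returned as a float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def all_finite_halves():
    """Yield every finite binary16 value by enumerating all bit patterns."""
    for bits in range(0x10000):
        value = struct.unpack("<e", struct.pack("<H", bits))[0]
        is_nan = value != value
        is_inf = value in (float("inf"), float("-inf"))
        if not is_nan and not is_inf:
            yield value


if __name__ == "__main__":
    # Every half is representable as a float, so this round trip is exact.
    exact = all(f2f16(f2f32(h)) == h for h in all_finite_halves())
    print("f216(f232(x)) == x for all finite halves:", exact)

    # The opposite direction rounds: 0.1 as a float is not a half value.
    x = f2f32(0.1)
    print("f232(f216(x)) == x for x = 0.1f:", f2f32(f2f16(x)) == x)
```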
Alejandro Piñeiro (1):
  intel/compiler/fs: Use half_precision data_format on 16-bit fb writes

Jose Maria Casanova Crespo (2):
  intel/compiler/fs: Include support for RT data_format bit
  intel/compiler/disasm: Show half-precision data_format on rt_writes

Topi Pohjolainen (58):
  intel/compiler/fs: Set 16-bit sampler return format
  intel/compiler/disasm: Show half-precision for sampler messages
  intel/compiler/fs: Skip tex-inst early in conversion lowering
  intel/compiler/fs: Support for dumping 16-bit IMM values
  intel/compiler: Allow 16-bit math
  intel/compiler/fs: Add helpers for 16-bit null regs
  intel/compiler/fs: Use two SIMD8 instructions for 16-bit math
  intel/compiler/fs: Use 16-bit null dest with 16-bit math
  intel/compiler/fs: Use 16-bit null dest with 16-bit compare
  intel/compiler/fs: Add 16-bit type support for nir_if
  intel/compiler/eu: Prepare 3-src-op for 16-bit sources
  intel/compiler/eu: Prepare 3-src-op for 16-bit dst
  intel/compiler/eu: Allow 3-src-op with mixed precision (HF/F) sources
  intel/compiler/disasm: Print mixed precision 3-src types correctly
  intel/compiler/disasm: Print 16-bit IMM values
  intel/compiler/fs: Support for combining 16-bit immediates
  intel/compiler/fs: Set tex type for generator to flag fp16
  intel/compiler/fs: Use component_size() instead of open coded
  intel/compiler/fs: Add register padding support
  intel/compiler/fs: Pad 16-bit texture return payloads
  intel/compiler/fs: Pad 16-bit output (store/fb write) payloads
  intel/compiler/fs: Pad 16-bit nir vec* components into full reg
  intel/compiler/fs: Pad 16-bit nir intrinsic dest into full reg
  intel/compiler/fs: Pad 16-bit const loads into full regs
  intel/compiler/fs: Pad 16-bit load payload lowering
  nir: Lower also 16-bit lrp() if needed
  intel/compiler: Lower 16-bit lrp()
  nir: Recognize f232(f216(x)) as x
  nir: Recognize f216(f232(x)) as x
  nir: Store variable precision when translating from glsl
  glsl: Set default precision for builtin variables
  i965: Prepare uniform mapping for 16-bit values
  i965: Support for uploading 16-bit uniforms from 32-bit store
  intel/compiler/fs: WIP: Use 32-bit slots for 16-bit uniforms
  intel/compiler: Tell compiler if lower precision is supported
  nir: Add lowering pass for variables marked mediump
  nir: Add pass for deref precision lowering
  nir: Add pass for alu precision lowering
  nir: Add precision conversion for load/store_deref
  nir: Add precision conversion for sources of texturing ops
  nir: Don't set destination size 16 for booleans
  nir: Add precision lowering for texture samples
  nir: Add support for non-fixed precision
  nir: Don't try to alter precision of boolean sources
  nir: Add support for variable sized booleans
  nir: Add support for lowering phi precision
  intel/compiler/fs: Prepare alu dest type for 16-bit booleans
  nir: Add lowering pass setting 16-bit boolean destinations
  nir: Add lowering pass turning b2f(i2i32(x)) into b2f(x)
  nir: Adjust integer precision for alus operating with 16-bit srcs
  nir: Replace b2f(x) with b2f(i2i32(x)) for 16-bit x
  nir: Adjust precision for discard_if
  nir: Allow input varyings to be converted to lower precision
  nir: Replace 16-bit src[0] for bcsel i2i32(src[0])
  nir: Replace 16-bit nir_if condition with i2i32(condition)
  Revert "intel/compiler: fix 16-bit comparisons"
  intel/compiler: Hook in precision lowering pass
  nir: Document precision lowering pass

 src/compiler/Makefile.sources                 |   2 +
 src/compiler/glsl/glsl_symbol_table.cpp       |  20 +
 src/compiler/glsl/glsl_symbol_table.h         |   7 +
 src/compiler/glsl/glsl_to_nir.cpp             |   1 +
 src/compiler/nir/meson.build                  |   2 +
 src/compiler/nir/nir.h                        |  18 +
 src/compiler/nir/nir_lower_bool_size.c        | 120 +++
 src/compiler/nir/nir_lower_precision.cpp      | 820 ++++++++++++++++++
 src/compiler/nir/nir_opt_algebraic.py         |   5 +
 src/intel/blorp/blorp.c                       |   4 +-
 src/intel/compiler/brw_compiler.c             |   1 +
 src/intel/compiler/brw_disasm.c               |  28 +-
 src/intel/compiler/brw_eu.h                   |   3 +-
 src/intel/compiler/brw_eu_emit.c              |  83 +-
 src/intel/compiler/brw_fs.cpp                 |  68 +-
 src/intel/compiler/brw_fs.h                   |   4 +-
 src/intel/compiler/brw_fs_builder.h           |  37 +-
 .../compiler/brw_fs_combine_constants.cpp     |  84 +-
 .../compiler/brw_fs_copy_propagation.cpp      |   7 +-
 src/intel/compiler/brw_fs_generator.cpp       |  13 +-
 .../compiler/brw_fs_lower_conversions.cpp     |  42 +
 src/intel/compiler/brw_fs_nir.cpp             | 197 +++--
 src/intel/compiler/brw_fs_surface_builder.cpp |   3 +-
 src/intel/compiler/brw_fs_visitor.cpp         |   6 +
 src/intel/compiler/brw_inst.h                 |   5 +
 src/intel/compiler/brw_ir_fs.h                |  16 +
 src/intel/compiler/brw_nir.c                  |  22 +-
 src/intel/compiler/brw_nir.h                  |   4 +-
 src/intel/compiler/brw_reg_type.c             |   2 +
 src/intel/compiler/brw_shader.h               |   7 +
 src/intel/vulkan/anv_pipeline.c               |   2 +-
 .../drivers/dri/i965/brw_nir_uniforms.cpp     |   8 +-
 src/mesa/drivers/dri/i965/brw_program.c       |  10 +-
 src/mesa/drivers/dri/i965/brw_program.h       |   6 +-
 src/mesa/drivers/dri/i965/brw_tcs.c           |   2 +-
 .../drivers/dri/i965/gen6_constant_state.c    |  14 +-
 36 files changed, 1548 insertions(+), 125 deletions(-)
 create mode 100644 src/compiler/nir/nir_lower_bool_size.c
 create mode 100644 src/compiler/nir/nir_lower_precision.cpp

-- 
2.17.1