Re: [Freedreno] [Mesa-dev] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'
On 1/28/19 10:16 AM, Eduardo Lima Mitev wrote: > On 1/28/19 9:56 AM, Bas Nieuwenhuizen wrote: >> On Mon, Jan 28, 2019 at 9:38 AM Eduardo Lima Mitev wrote: >>> >>> On 1/26/19 5:34 PM, Jason Ekstrand wrote: >>>> Mind suffixing it with _ir3 or something since it's a back-end-specific >>>> intrinsic? Incidentally, this looks a lot like load_image_param_intel. >>>> >>> >>> Yes, I felt inclined to add the suffix but wasn't sure how good/bad a >>> practice it is, so I deferred it to review :). >>> >>> Now, I did a quick experiment and it turns out I can reuse >>> image_deref_load_param_intel as-is. The semantics are a bit different so >>> I would have to fork the comment to explain ir3 too. >>> >>> So, what about I remove the '_intel' suffix to that one and use if for >>> this ir3? >> >> Instructions that have different semantics for different drivers sound >> like a lot of potential for confusion and latent bugs to me. Is it >> really a problem to have both? >> > > It is not a problem at all, but I think it is preferable not to > duplicate this intrinsic if it serves the same purpose on both backends, > and only slightly differ in the specific set of params it holds: > > On both backends the intrinsic is used to represent registers at compile > time, that will be resolved to uniforms at shader run time. > > On intel, the params are offset, tiling, stride and swizzling; on ir3 > they are bytes-per-pixel, y-stride and z-stride. That's what I mean by > different semantics, and I think the difference is not enough to justify > forking it. > Actually, one can argue there is no semantic difference at all from NIR's point of view. The intrinsic just load certain image params on both backends, and it is only backend-specific what those params are. > I'm ok either way, though. > > Eduardo > >>> >>> Thanks for the feedback and the tip! >>> >>> Eduardo >>> >>>> On January 25, 2019 07:48:54 Eduardo Lima Mitev wrote: >>>> >>>>> This is an internal intrinsic intended to be injected by a >>>>> (freedreno-specific) 'lower_sampler_io' pass that will be introduced >>>>> later in this series; and consumed by ir3_compiler_nir. >>>>> >>>>> The intrinsic will load in SSA values for various constants >>>>> for images (image_dims), namely the format's bytes-per-pixel, >>>>> y-stride and z-stride; for which the backend compiler will emit >>>>> the corresponding uniforms. >>>>> >>>>> const_index[0] is the image index, and const_index[1] is the index >>>>> into image_dims' register file for a given image: 0 for bpp, 1 to >>>>> y-stride and 2 for z-stride. >>>>> --- >>>>> src/compiler/nir/nir_intrinsics.py | 2 ++ >>>>> 1 file changed, 2 insertions(+) >>>>> >>>>> diff --git a/src/compiler/nir/nir_intrinsics.py >>>>> b/src/compiler/nir/nir_intrinsics.py >>>>> index a5cc3f7401c..169c835d746 100644 >>>>> --- a/src/compiler/nir/nir_intrinsics.py >>>>> +++ b/src/compiler/nir/nir_intrinsics.py >>>>> @@ -590,6 +590,8 @@ load("shared", 1, [BASE, ALIGN_MUL, ALIGN_OFFSET], >>>>> [CAN_ELIMINATE]) >>>>> load("push_constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER]) >>>>> # src[] = { offset }. const_index[] = { base, range } >>>>> load("constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER]) >>>>> +# src[] = { offset }. const_index[] = { image_idx, dim_idx } >>>>> +load("image_stride", 1, [], [CAN_REORDER]) >>>>> >>>>> # Stores work the same way as loads, except now the first source is >>>>> the value >>>>> # to store and the second (and possibly third) source specify where to >>>>> store >>>>> -- >>>>> 2.20.1 >>>>> >>>>> ___ >>>>> mesa-dev mailing list >>>>> mesa-...@lists.freedesktop.org >>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>> >>>> >>>> >>>> >>> ___ >>> mesa-dev mailing list >>> mesa-...@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > ___ > mesa-dev mailing list > mesa-...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'
On 1/28/19 9:56 AM, Bas Nieuwenhuizen wrote: > On Mon, Jan 28, 2019 at 9:38 AM Eduardo Lima Mitev wrote: >> >> On 1/26/19 5:34 PM, Jason Ekstrand wrote: >>> Mind suffixing it with _ir3 or something since it's a back-end-specific >>> intrinsic? Incidentally, this looks a lot like load_image_param_intel. >>> >> >> Yes, I felt inclined to add the suffix but wasn't sure how good/bad a >> practice it is, so I deferred it to review :). >> >> Now, I did a quick experiment and it turns out I can reuse >> image_deref_load_param_intel as-is. The semantics are a bit different so >> I would have to fork the comment to explain ir3 too. >> >> So, what about I remove the '_intel' suffix to that one and use if for >> this ir3? > > Instructions that have different semantics for different drivers sound > like a lot of potential for confusion and latent bugs to me. Is it > really a problem to have both? > It is not a problem at all, but I think it is preferable not to duplicate this intrinsic if it serves the same purpose on both backends, and only slightly differ in the specific set of params it holds: On both backends the intrinsic is used to represent registers at compile time, that will be resolved to uniforms at shader run time. On intel, the params are offset, tiling, stride and swizzling; on ir3 they are bytes-per-pixel, y-stride and z-stride. That's what I mean by different semantics, and I think the difference is not enough to justify forking it. I'm ok either way, though. Eduardo >> >> Thanks for the feedback and the tip! >> >> Eduardo >> >>> On January 25, 2019 07:48:54 Eduardo Lima Mitev wrote: >>> >>>> This is an internal intrinsic intended to be injected by a >>>> (freedreno-specific) 'lower_sampler_io' pass that will be introduced >>>> later in this series; and consumed by ir3_compiler_nir. >>>> >>>> The intrinsic will load in SSA values for various constants >>>> for images (image_dims), namely the format's bytes-per-pixel, >>>> y-stride and z-stride; for which the backend compiler will emit >>>> the corresponding uniforms. >>>> >>>> const_index[0] is the image index, and const_index[1] is the index >>>> into image_dims' register file for a given image: 0 for bpp, 1 to >>>> y-stride and 2 for z-stride. >>>> --- >>>> src/compiler/nir/nir_intrinsics.py | 2 ++ >>>> 1 file changed, 2 insertions(+) >>>> >>>> diff --git a/src/compiler/nir/nir_intrinsics.py >>>> b/src/compiler/nir/nir_intrinsics.py >>>> index a5cc3f7401c..169c835d746 100644 >>>> --- a/src/compiler/nir/nir_intrinsics.py >>>> +++ b/src/compiler/nir/nir_intrinsics.py >>>> @@ -590,6 +590,8 @@ load("shared", 1, [BASE, ALIGN_MUL, ALIGN_OFFSET], >>>> [CAN_ELIMINATE]) >>>> load("push_constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER]) >>>> # src[] = { offset }. const_index[] = { base, range } >>>> load("constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER]) >>>> +# src[] = { offset }. const_index[] = { image_idx, dim_idx } >>>> +load("image_stride", 1, [], [CAN_REORDER]) >>>> >>>> # Stores work the same way as loads, except now the first source is >>>> the value >>>> # to store and the second (and possibly third) source specify where to >>>> store >>>> -- >>>> 2.20.1 >>>> >>>> ___ >>>> mesa-dev mailing list >>>> mesa-...@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>> >>> >>> >>> >> ___ >> mesa-dev mailing list >> mesa-...@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad
On 1/25/19 6:46 PM, Eric Anholt wrote: > Eduardo Lima Mitev writes: > >> ir3 compiler has an integer multiply-add instruction (IMAD_S24) >> that is used for different offset calculations in the backend. >> Since we intend to move some of these calculations to NIR, we need >> a new ALU op that can represent it. >> --- >> src/compiler/nir/nir_opcodes.py | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/src/compiler/nir/nir_opcodes.py >> b/src/compiler/nir/nir_opcodes.py >> index d32005846a6..b61845fd514 100644 >> --- a/src/compiler/nir/nir_opcodes.py >> +++ b/src/compiler/nir/nir_opcodes.py >> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, >> src3_size, const_expr): >> [tuint, tuint, tuint], "", const_expr) >> >> triop("ffma", tfloat, "src0 * src1 + src2") >> +triop("imad", tint, "src0 * src1 + src2") > > The constant-folding expression should be corrected to what the backend > operation actually does, and you should probably call it imad24 or > something in that case. > Roger. Got similar feedback from Ilia. Will fix that. Eduardo ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'
On 1/25/19 6:42 PM, Eric Anholt wrote: > Eduardo Lima Mitev writes: > >> This is an internal intrinsic intended to be injected by a >> (freedreno-specific) 'lower_sampler_io' pass that will be introduced >> later in this series; and consumed by ir3_compiler_nir. >> >> The intrinsic will load in SSA values for various constants >> for images (image_dims), namely the format's bytes-per-pixel, >> y-stride and z-stride; for which the backend compiler will emit >> the corresponding uniforms. >> >> const_index[0] is the image index, and const_index[1] is the index >> into image_dims' register file for a given image: 0 for bpp, 1 to >> y-stride and 2 for z-stride. > > Can you move this documentation of the meaning of the intrinsic into a > comment in the file? > Yes, will do that. Thanks! ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad
On 1/25/19 5:01 PM, Ilia Mirkin wrote: > On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin wrote: >> >> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called >> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely >> had this, and I think maxwell+ has a variant of this implemented by >> XMAD): >> >> (src0 * src1) & 0xff + src2 > > And of course even that's wrong... the 24th bit has to get > sign-extended on that. Can express it with shifts. > IMAD_S24 is what is currently used in ir3_compiler_nir::get_image_offset(), so the pass doesn't change anything regarding computations. I agree that the nir opcode should hint at the bit limit, so probably nir_op_imad24. That is one of the open questions. Thanks, Eduardo ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
[Freedreno] [RFC 2/4] nir: Add a new ALU nir_op_imad
ir3 compiler has an integer multiply-add instruction (IMAD_S24) that is used for different offset calculations in the backend. Since we intend to move some of these calculations to NIR, we need a new ALU op that can represent it. --- src/compiler/nir/nir_opcodes.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py index d32005846a6..b61845fd514 100644 --- a/src/compiler/nir/nir_opcodes.py +++ b/src/compiler/nir/nir_opcodes.py @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, src3_size, const_expr): [tuint, tuint, tuint], "", const_expr) triop("ffma", tfloat, "src0 * src1 + src2") +triop("imad", tint, "src0 * src1 + src2") triop("flrp", tfloat, "src0 * (1 - src2) + src1 * src2") -- 2.20.1 ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
[Freedreno] [RFC 0/4] freedreno: Move some compiler offset computations to NIR
There are a bunch of instructions emitted on ir3_compiler_nir related to offset computations for IO opcodes (ssbo, image, etc). This small series explores the possibility of moving these instructions to NIR, where we have higher chances of optimizing them. The series introduces a new, freedreno specific NIR pass, 'ir3_nir_lower_sampler_io' (final name not set). The pass is executed early on ir3_optimize_nir(), and the goal is to centralize all these computations there, hoping that later NIR passes will produce better code than what is currently emitted. So far, we have just implemented byte-offset computation for image store and atomics. This seemed like a good first target given the amount of instructions being emitted for it by the backend. This is an RFC series because there are a few open questions, but we wanted to gather feedback already now, in case this effort is something not worth it; and also hoping that somebody else will give it a try against other shaders and on other gens (we have just tried this on a5xx). * We have so far been unable to see any improvement in generated code (not a penalty either). shader-db has not been specially useful. Few shaders there exercise image store or image atomic ops, and of those that do, most require higher versions of GLSL than what freedreno supports, so they get skipped. The few that do actually run, don't show any meaningful difference. Then other shaders picked from tests suites are simple enough not to produce any difference in code either. There is still on-going work looking for cases where the pass helps instruction stats, whether writing custom shaders or porting complex shader from shader-db to run on GLES 310. There is though an open question here as to whether moving backend code to NIR is a benefit in and of itself. * The series adds a nir_op_imad opcode that didn't exist before, and perhaps not generally useful even for freedreno outside this pass, because it maps to IR3_MAD_S24 which is probably not suitable for generic integer multiply-add. * The pass currently has 2 alternative code-paths to emit the multiplication by the bytes-per-pixel of an image format. In one case, since this value can be obtained at compile time, it is emitted as an immediate by nir_imul_imm. The other alternative is emitting an nir_imul with an SSA value that will map to image_dims[0] at shader runtime. The former case is uglier but produces better code (a single SHL instruction), whereas the latter involves a generic imul, for which the backend emits a lot of code to cover for overflow. The doubt here is whether we should introduce a (lower precision) version of imul that maps directly to IR3_IMUL_S. A live (WIP) tree of the series can be found at: <https://gitlab.freedesktop.org/elima/mesa/commits/wip/fd-compiler-io> We plan to continue moving computations to the pass if we see good opportunities. Feedback very welcome, cheers, Eduardo Eduardo Lima Mitev (4): nir: Add a new intrinsic 'load_image_stride' nir: Add a new ALU nir_op_imad ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io' ir3: Use ir3_nir_lower_sampler_io pass src/compiler/nir/nir_intrinsics.py | 2 + src/compiler/nir/nir_opcodes.py | 1 + src/freedreno/Makefile.sources | 1 + src/freedreno/ir3/ir3_compiler_nir.c | 61 ++-- src/freedreno/ir3/ir3_nir.c | 1 + src/freedreno/ir3/ir3_nir.h | 1 + src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++ 7 files changed, 383 insertions(+), 33 deletions(-) create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c -- 2.20.1 ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
[Freedreno] [RFC 4/4] ir3: Use ir3_nir_lower_sampler_io pass
This effectively removes all offset calculations in ir3_compiler_nir::get_image_offset(). No regressions observed on affected tests from Khronos CTS and piglit suites, compared to master. Collecting useful stats on helps/hurts caused by this pass is WIP. Very few shaders in shader-db data-base exercise image store or image atomic ops, and of those that do, most require higher versions of GLSL than what freedreno supports, so they get skipped. There is on-going work writing/porting shaders to collect useful stats. So far, all tested show no meaningful difference compared to master. --- src/freedreno/ir3/ir3_compiler_nir.c | 61 +--- src/freedreno/ir3/ir3_nir.c | 1 + 2 files changed, 29 insertions(+), 33 deletions(-) diff --git a/src/freedreno/ir3/ir3_compiler_nir.c b/src/freedreno/ir3/ir3_compiler_nir.c index fd641735620..fe329db658c 100644 --- a/src/freedreno/ir3/ir3_compiler_nir.c +++ b/src/freedreno/ir3/ir3_compiler_nir.c @@ -548,6 +548,9 @@ emit_alu(struct ir3_context *ctx, nir_alu_instr *alu) ir3_MADSH_M16(b, src[0], 0, src[1], 0, ir3_MULL_U(b, src[0], 0, src[1], 0), 0), 0); break; + case nir_op_imad: + dst[0] = ir3_MAD_S24(b, src[0], 0, src[1], 0, src[2], 0); + break; case nir_op_ineg: dst[0] = ir3_ABSNEG_S(b, src[0], IR3_REG_SNEG); break; @@ -1172,44 +1175,19 @@ get_image_type(const nir_variable *var) static struct ir3_instruction * get_image_offset(struct ir3_context *ctx, const nir_variable *var, - struct ir3_instruction * const *coords, bool byteoff) + struct ir3_instruction * const *coords) { struct ir3_block *b = ctx->block; - struct ir3_instruction *offset; - unsigned ncoords = get_image_coords(var, NULL); - - /* to calculate the byte offset (yes, uggg) we need (up to) three -* const values to know the bytes per pixel, and y and z stride: -*/ - unsigned cb = regid(ctx->so->constbase.image_dims, 0) + - ctx->so->const_layout.image_dims.off[var->data.driver_location]; debug_assert(ctx->so->const_layout.image_dims.mask & (1 << var->data.driver_location)); - /* offset = coords.x * bytes_per_pixel: */ - offset = ir3_MUL_S(b, coords[0], 0, create_uniform(b, cb + 0), 0); - if (ncoords > 1) { - /* offset += coords.y * y_pitch: */ - offset = ir3_MAD_S24(b, create_uniform(b, cb + 1), 0, - coords[1], 0, offset, 0); - } - if (ncoords > 2) { - /* offset += coords.z * z_pitch: */ - offset = ir3_MAD_S24(b, create_uniform(b, cb + 2), 0, - coords[2], 0, offset, 0); - } - - if (!byteoff) { - /* Some cases, like atomics, seem to use dword offset instead -* of byte offsets.. blob just puts an extra shr.b in there -* in those cases: -*/ - offset = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0); - } - + /* ir3_nir_lower_sampler_io pass should have placed the final +* byte-offset (or dword offset for atomics) at the 4th component +* of the coordinate vector. +*/ return ir3_create_collect(ctx, (struct ir3_instruction*[]){ - offset, + coords[3], create_immed(b, 0), }, 2); } @@ -1341,7 +1319,7 @@ emit_intrinsic_store_image(struct ir3_context *ctx, nir_intrinsic_instr *intr) * src2 is 64b byte offset */ - offset = get_image_offset(ctx, var, coords, true); + offset = get_image_offset(ctx, var, coords); /* NOTE: stib seems to take byte offset, but stgb.typed can be used * too and takes a dword offset.. not quite sure yet why blob uses @@ -1443,7 +1421,7 @@ emit_intrinsic_atomic_image(struct ir3_context *ctx, nir_intrinsic_instr *intr) */ src0 = ir3_get_src(ctx, >src[3])[0]; src1 = ir3_create_collect(ctx, coords, ncoords); - src2 = get_image_offset(ctx, var, coords, false); + src2 = get_image_offset(ctx, var, coords); switch (intr->intrinsic) { case nir_intrinsic_image_deref_atomic_add: @@ -1612,6 +1590,23 @@ emit_intrinsic(struct ir3_context *ctx, nir_intrinsic_instr *intr) } switch (intr->intrinsic) { + case nir_intrinsic_load_image_stride: { + idx = intr->const_index[0]; + + /* this is the index into image_dims offsets, which can take +* values 0, 1 or 2 (bpp, y-stride, z-stride respectively). +*/ + uint8_t off = intr->const_index[1]; + debug_assert(off <= 2); + + unsigned cb = regid(ctx->so->constbase.image_dims, 0) + +
[Freedreno] [RFC 3/4] ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'
This pass moves to NIR some offset calculations that are currently implemented on the backend compiler, to allow NIR to possibly optimize them. For now, only coordinate byte-offset calculations for imageStore and image atomic operations are implemented. --- src/freedreno/Makefile.sources | 1 + src/freedreno/ir3/ir3_nir.h | 1 + src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++ 3 files changed, 351 insertions(+) create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c diff --git a/src/freedreno/Makefile.sources b/src/freedreno/Makefile.sources index 7fea9de39ef..fd4f7f294cd 100644 --- a/src/freedreno/Makefile.sources +++ b/src/freedreno/Makefile.sources @@ -31,6 +31,7 @@ ir3_SOURCES := \ ir3/ir3_legalize.c \ ir3/ir3_nir.c \ ir3/ir3_nir.h \ + ir3/ir3_nir_lower_sampler_io.c \ ir3/ir3_nir_lower_tg4_to_tex.c \ ir3/ir3_print.c \ ir3/ir3_ra.c \ diff --git a/src/freedreno/ir3/ir3_nir.h b/src/freedreno/ir3/ir3_nir.h index 74201d34160..52809ba099e 100644 --- a/src/freedreno/ir3/ir3_nir.h +++ b/src/freedreno/ir3/ir3_nir.h @@ -36,6 +36,7 @@ void ir3_nir_scan_driver_consts(nir_shader *shader, struct ir3_driver_const_layo bool ir3_nir_apply_trig_workarounds(nir_shader *shader); bool ir3_nir_lower_tg4_to_tex(nir_shader *shader); +bool ir3_nir_lower_sampler_io(nir_shader *shader); const nir_shader_compiler_options * ir3_get_compiler_options(struct ir3_compiler *compiler); bool ir3_key_lowers_nir(const struct ir3_shader_key *key); diff --git a/src/freedreno/ir3/ir3_nir_lower_sampler_io.c b/src/freedreno/ir3/ir3_nir_lower_sampler_io.c new file mode 100644 index 000..e2910d8906d --- /dev/null +++ b/src/freedreno/ir3/ir3_nir_lower_sampler_io.c @@ -0,0 +1,349 @@ +/* + * Copyright © 2018 Igalia S.L. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "ir3_nir.h" +#include "compiler/nir/nir_builder.h" + +/** + * The goal of this pass is to move to NIR some offset calculations for + * different I/O that are currently implemented on the backend compiler, + * to allow NIR to possibly optimize them. + * + * Currently, only offset calculations for image store and image + * atomic operations are implemented. + */ + + +/* This flag enables/disables a code-path where the bytes-per-pixel of + * an image is obtained directly from the format, which is known + * at shader compile time; as opposed to using image_dims[0] constant + * available only at shader runtime. + * + * Inlining the bytes-per-pixel here as an immediate has the advantage + * that it gets converted to a single (SHL) instruction (because all + * possible values are powers of two); while loading it as a uniform and + * emitting an IMUL will cause the backend to expand it to quite a few + * instructions (see ir3_compiler_nir for imul), thus ultimately hurting + * instruction count. + */ +#define INLINE_BPP 1 + + +static bool +intrinsic_is_image_atomic(unsigned intrinsic) +{ + switch (intrinsic) { + case nir_intrinsic_image_deref_atomic_add: + case nir_intrinsic_image_deref_atomic_min: + case nir_intrinsic_image_deref_atomic_max: + case nir_intrinsic_image_deref_atomic_and: + case nir_intrinsic_image_deref_atomic_or: + case nir_intrinsic_image_deref_atomic_xor: + case nir_intrinsic_image_deref_atomic_exchange: + case nir_intrinsic_image_deref_atomic_comp_swap: + return true; + default: + break; + } + + return false; +} + +static bool +intrinsic_is_image_store_or_atomic(unsigned intrinsic) +{ + if (intrinsic == nir_intrinsic_image_deref_store) + return true; + else + return intrinsic_is_image_atomic(intrinsic); +} + +/* + * FIXME: shamelessly copied from ir3_compiler_nir until it gets factorized + * out at some point. + */
[Freedreno] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'
This is an internal intrinsic intended to be injected by a (freedreno-specific) 'lower_sampler_io' pass that will be introduced later in this series; and consumed by ir3_compiler_nir. The intrinsic will load in SSA values for various constants for images (image_dims), namely the format's bytes-per-pixel, y-stride and z-stride; for which the backend compiler will emit the corresponding uniforms. const_index[0] is the image index, and const_index[1] is the index into image_dims' register file for a given image: 0 for bpp, 1 to y-stride and 2 for z-stride. --- src/compiler/nir/nir_intrinsics.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/compiler/nir/nir_intrinsics.py b/src/compiler/nir/nir_intrinsics.py index a5cc3f7401c..169c835d746 100644 --- a/src/compiler/nir/nir_intrinsics.py +++ b/src/compiler/nir/nir_intrinsics.py @@ -590,6 +590,8 @@ load("shared", 1, [BASE, ALIGN_MUL, ALIGN_OFFSET], [CAN_ELIMINATE]) load("push_constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER]) # src[] = { offset }. const_index[] = { base, range } load("constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER]) +# src[] = { offset }. const_index[] = { image_idx, dim_idx } +load("image_stride", 1, [], [CAN_REORDER]) # Stores work the same way as loads, except now the first source is the value # to store and the second (and possibly third) source specify where to store -- 2.20.1 ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
[Freedreno] [PATCH v3] ir3_compiler/nir: fix imageSize() for buffer-backed images
GL_EXT_texture_buffer introduced texture buffers, which can be used in shaders through a new type imageBuffer. Because how image access is implemented in freedreno, calling imageSize on an imageBuffer returns the size in bytes instead of texels, which is incorrect. This patch adds a division of imageSize result by the bytes-per-pixel of the image format, when image is buffer-backed. Fixes all tests under dEQP-GLES31.functional.image_load_store.buffer.image_size.* v2: Pre-compute and submit the log2 of the image format's bpp as shader constant instead of emitting the LOG2 instruction in code. (Rob Clark) v3: Use ffs (find-first-bit) helper for computing log2 (Ilia Mirkin) --- .../drivers/freedreno/ir3/ir3_compiler_nir.c | 23 +++ .../drivers/freedreno/ir3/ir3_shader.c| 10 2 files changed, 33 insertions(+) diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c index 197196383b0..7a3c8a8579c 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c @@ -2035,6 +2035,29 @@ emit_intrinsic_image_size(struct ir3_context *ctx, nir_intrinsic_instr *intr, split_dest(b, tmp, sam, 0, 4); + /* get_size instruction returns size in bytes instead of texels +* for imageBuffer, so we need to divide it by the pixel size +* of the image format. +* +* TODO: This is at least true on a5xx. Check other gens. +*/ + enum glsl_sampler_dim dim = + glsl_get_sampler_dim(glsl_without_array(var->type)); + if (dim == GLSL_SAMPLER_DIM_BUF) { + /* Since all the possible values the divisor can take are +* power-of-two (4, 8, or 16), the division is implemented +* as a shift-right. +* During shader setup, the log2 of the image format's +* bytes-per-pixel should have been emitted in 2nd slot of +* image_dims. See ir3_shader::emit_image_dims(). +*/ + unsigned cb = regid(ctx->so->constbase.image_dims, 0) + + ctx->so->const_layout.image_dims.off[var->data.driver_location]; + struct ir3_instruction *aux = create_uniform(ctx, cb + 1); + + tmp[0] = ir3_SHR_B(b, tmp[0], 0, aux, 0); + } + for (unsigned i = 0; i < ncoords; i++) dst[i] = tmp[i]; diff --git a/src/gallium/drivers/freedreno/ir3/ir3_shader.c b/src/gallium/drivers/freedreno/ir3/ir3_shader.c index 9bf0a7f999c..b3127ff8c38 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_shader.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_shader.c @@ -699,6 +699,16 @@ emit_image_dims(struct fd_context *ctx, const struct ir3_shader_variant *v, } else { dims[off + 2] = rsc->slices[lvl].size0; } + } else { + /* For buffer-backed images, the log2 of the format's +* bytes-per-pixel is placed on the 2nd slot. This is useful +* when emitting image_size instructions, for which we need +* to divide by bpp for image buffers. Since the bpp +* can only be power-of-two, the division is implemented +* as a SHR, and for that it is handy to have the log2 of +* bpp as a constant. (log2 = first-set-bit - 1) +*/ + dims[off + 1] = ffs(dims[off + 0]) - 1; } } -- 2.19.1 ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
[Freedreno] [PATCH] ir3_compiler/nir: fix imageSize() for buffer-backed images
GL_EXT_texture_buffer introduced texture buffers, which can be used in shaders through a new type imageBuffer. Because how image access is implemented in freedreno, calling imageSize on an imageBuffer returns the size in bytes instead of texels, which is incorrect. This patch adds a division of imageSize result by the bytes-per-pixel of the image format, when image is buffer-backed. Fixes all tests under dEQP-GLES31.functional.image_load_store.buffer.image_size.* v2: Pre-compute and submit the log2 of the image format's bpp as shader constant instead of emitting the LOG2 instruction in code. (Rob Clark) --- .../drivers/freedreno/ir3/ir3_compiler_nir.c | 23 +++ .../drivers/freedreno/ir3/ir3_shader.c| 10 2 files changed, 33 insertions(+) diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c index 197196383b0..7a3c8a8579c 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c @@ -2035,6 +2035,29 @@ emit_intrinsic_image_size(struct ir3_context *ctx, nir_intrinsic_instr *intr, split_dest(b, tmp, sam, 0, 4); + /* get_size instruction returns size in bytes instead of texels +* for imageBuffer, so we need to divide it by the pixel size +* of the image format. +* +* TODO: This is at least true on a5xx. Check other gens. +*/ + enum glsl_sampler_dim dim = + glsl_get_sampler_dim(glsl_without_array(var->type)); + if (dim == GLSL_SAMPLER_DIM_BUF) { + /* Since all the possible values the divisor can take are +* power-of-two (4, 8, or 16), the division is implemented +* as a shift-right. +* During shader setup, the log2 of the image format's +* bytes-per-pixel should have been emitted in 2nd slot of +* image_dims. See ir3_shader::emit_image_dims(). +*/ + unsigned cb = regid(ctx->so->constbase.image_dims, 0) + + ctx->so->const_layout.image_dims.off[var->data.driver_location]; + struct ir3_instruction *aux = create_uniform(ctx, cb + 1); + + tmp[0] = ir3_SHR_B(b, tmp[0], 0, aux, 0); + } + for (unsigned i = 0; i < ncoords; i++) dst[i] = tmp[i]; diff --git a/src/gallium/drivers/freedreno/ir3/ir3_shader.c b/src/gallium/drivers/freedreno/ir3/ir3_shader.c index 9bf0a7f999c..de59e5888c6 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_shader.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_shader.c @@ -699,6 +699,16 @@ emit_image_dims(struct fd_context *ctx, const struct ir3_shader_variant *v, } else { dims[off + 2] = rsc->slices[lvl].size0; } + } else { + /* For buffer-backed images, the log2 of the format's +* bytes-per-pixel is placed on the 2nd slot. This is useful +* when emitting image_size instructions, for which we need +* to divide by bpp for image buffers. Since the bpp +* can only be power-of-two, the division is implemented +* as a SHR, and for that it is handy to have the log2 of +* bpp as a constant. +*/ + dims[off + 1] = log (dims[off + 0]) / log (2); } } -- 2.19.1 ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
[Freedreno] [PATCH] ir3/nir: Set up image_dims consts for image_deref_size intrinsic too
`nir_intrinsic_image_deref_size` is not being considered during scan for driver constants, so image constants are not emitted if a shader only ever query the size of an image (no load, store, atomic op, etc). This is unlikely, but possible. --- src/gallium/drivers/freedreno/ir3/ir3_nir.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/drivers/freedreno/ir3/ir3_nir.c b/src/gallium/drivers/freedreno/ir3/ir3_nir.c index d934acc7427..63866ae4d01 100644 --- a/src/gallium/drivers/freedreno/ir3/ir3_nir.c +++ b/src/gallium/drivers/freedreno/ir3/ir3_nir.c @@ -254,6 +254,7 @@ ir3_nir_scan_driver_consts(nir_shader *shader, case nir_intrinsic_image_deref_atomic_exchange: case nir_intrinsic_image_deref_atomic_comp_swap: case nir_intrinsic_image_deref_store: + case nir_intrinsic_image_deref_size: idx = nir_intrinsic_get_var(intr, 0)->data.driver_location; if (layout->image_dims.mask & (1 << idx)) break; -- 2.19.1 ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno