Re: [Freedreno] [Mesa-dev] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'

2019-01-28 Thread Eduardo Lima Mitev
On 1/28/19 10:16 AM, Eduardo Lima Mitev wrote:
> On 1/28/19 9:56 AM, Bas Nieuwenhuizen wrote:
>> On Mon, Jan 28, 2019 at 9:38 AM Eduardo Lima Mitev  wrote:
>>>
>>> On 1/26/19 5:34 PM, Jason Ekstrand wrote:
>>>> Mind suffixing it with _ir3 or something since it's a back-end-specific
>>>> intrinsic?  Incidentally, this looks a lot like load_image_param_intel.
>>>>
>>>
>>> Yes, I felt inclined to add the suffix but wasn't sure how good/bad a
>>> practice it is, so I deferred it to review :).
>>>
>>> Now, I did a quick experiment and it turns out I can reuse
>>> image_deref_load_param_intel as-is. The semantics are a bit different so
>>> I would have to fork the comment to explain ir3 too.
>>>
>>> So, what about I remove the '_intel' suffix to that one and use if for
>>> this ir3?
>>
>> Instructions that have different semantics for different drivers sound
>> like a lot of potential for confusion and latent bugs to me. Is it
>> really a problem to have both?
>>
> 
> It is not a problem at all, but I think it is preferable not to
> duplicate this intrinsic if it serves the same purpose on both backends,
> and only slightly differ in the specific set of params it holds:
> 
> On both backends the intrinsic is used to represent registers at compile
> time, that will be resolved to uniforms at shader run time.
> 
> On intel, the params are offset, tiling, stride and swizzling; on ir3
> they are bytes-per-pixel, y-stride and z-stride. That's what I mean by
> different semantics, and I think the difference is not enough to justify
> forking it.
> 

Actually, one can argue there is no semantic difference at all from
NIR's point of view. The intrinsic just load certain image params on
both backends, and it is only backend-specific what those params are.

> I'm ok either way, though.
> 
> Eduardo
> 
>>>
>>> Thanks for the feedback and the tip!
>>>
>>> Eduardo
>>>
>>>> On January 25, 2019 07:48:54 Eduardo Lima Mitev  wrote:
>>>>
>>>>> This is an internal intrinsic intended to be injected by a
>>>>> (freedreno-specific) 'lower_sampler_io' pass that will be introduced
>>>>> later in this series; and consumed by ir3_compiler_nir.
>>>>>
>>>>> The intrinsic will load in SSA values for various constants
>>>>> for images (image_dims), namely the format's bytes-per-pixel,
>>>>> y-stride and z-stride; for which the backend compiler will emit
>>>>> the corresponding uniforms.
>>>>>
>>>>> const_index[0] is the image index, and const_index[1] is the index
>>>>> into image_dims' register file for a given image: 0 for bpp, 1 to
>>>>> y-stride and 2 for z-stride.
>>>>> ---
>>>>> src/compiler/nir/nir_intrinsics.py | 2 ++
>>>>> 1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/src/compiler/nir/nir_intrinsics.py
>>>>> b/src/compiler/nir/nir_intrinsics.py
>>>>> index a5cc3f7401c..169c835d746 100644
>>>>> --- a/src/compiler/nir/nir_intrinsics.py
>>>>> +++ b/src/compiler/nir/nir_intrinsics.py
>>>>> @@ -590,6 +590,8 @@ load("shared", 1, [BASE, ALIGN_MUL, ALIGN_OFFSET],
>>>>> [CAN_ELIMINATE])
>>>>> load("push_constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER])
>>>>> # src[] = { offset }. const_index[] = { base, range }
>>>>> load("constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER])
>>>>> +# src[] = { offset }. const_index[] = { image_idx, dim_idx }
>>>>> +load("image_stride", 1, [], [CAN_REORDER])
>>>>>
>>>>> # Stores work the same way as loads, except now the first source is
>>>>> the value
>>>>> # to store and the second (and possibly third) source specify where to
>>>>> store
>>>>> --
>>>>> 2.20.1
>>>>>
>>>>> ___
>>>>> mesa-dev mailing list
>>>>> mesa-...@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>
>>>>
>>>>
>>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-...@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
> ___
> mesa-dev mailing list
> mesa-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'

2019-01-28 Thread Eduardo Lima Mitev
On 1/28/19 9:56 AM, Bas Nieuwenhuizen wrote:
> On Mon, Jan 28, 2019 at 9:38 AM Eduardo Lima Mitev  wrote:
>>
>> On 1/26/19 5:34 PM, Jason Ekstrand wrote:
>>> Mind suffixing it with _ir3 or something since it's a back-end-specific
>>> intrinsic?  Incidentally, this looks a lot like load_image_param_intel.
>>>
>>
>> Yes, I felt inclined to add the suffix but wasn't sure how good/bad a
>> practice it is, so I deferred it to review :).
>>
>> Now, I did a quick experiment and it turns out I can reuse
>> image_deref_load_param_intel as-is. The semantics are a bit different so
>> I would have to fork the comment to explain ir3 too.
>>
>> So, what about I remove the '_intel' suffix to that one and use if for
>> this ir3?
> 
> Instructions that have different semantics for different drivers sound
> like a lot of potential for confusion and latent bugs to me. Is it
> really a problem to have both?
>

It is not a problem at all, but I think it is preferable not to
duplicate this intrinsic if it serves the same purpose on both backends,
and only slightly differ in the specific set of params it holds:

On both backends the intrinsic is used to represent registers at compile
time, that will be resolved to uniforms at shader run time.

On intel, the params are offset, tiling, stride and swizzling; on ir3
they are bytes-per-pixel, y-stride and z-stride. That's what I mean by
different semantics, and I think the difference is not enough to justify
forking it.

I'm ok either way, though.

Eduardo

>>
>> Thanks for the feedback and the tip!
>>
>> Eduardo
>>
>>> On January 25, 2019 07:48:54 Eduardo Lima Mitev  wrote:
>>>
>>>> This is an internal intrinsic intended to be injected by a
>>>> (freedreno-specific) 'lower_sampler_io' pass that will be introduced
>>>> later in this series; and consumed by ir3_compiler_nir.
>>>>
>>>> The intrinsic will load in SSA values for various constants
>>>> for images (image_dims), namely the format's bytes-per-pixel,
>>>> y-stride and z-stride; for which the backend compiler will emit
>>>> the corresponding uniforms.
>>>>
>>>> const_index[0] is the image index, and const_index[1] is the index
>>>> into image_dims' register file for a given image: 0 for bpp, 1 to
>>>> y-stride and 2 for z-stride.
>>>> ---
>>>> src/compiler/nir/nir_intrinsics.py | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/src/compiler/nir/nir_intrinsics.py
>>>> b/src/compiler/nir/nir_intrinsics.py
>>>> index a5cc3f7401c..169c835d746 100644
>>>> --- a/src/compiler/nir/nir_intrinsics.py
>>>> +++ b/src/compiler/nir/nir_intrinsics.py
>>>> @@ -590,6 +590,8 @@ load("shared", 1, [BASE, ALIGN_MUL, ALIGN_OFFSET],
>>>> [CAN_ELIMINATE])
>>>> load("push_constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER])
>>>> # src[] = { offset }. const_index[] = { base, range }
>>>> load("constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER])
>>>> +# src[] = { offset }. const_index[] = { image_idx, dim_idx }
>>>> +load("image_stride", 1, [], [CAN_REORDER])
>>>>
>>>> # Stores work the same way as loads, except now the first source is
>>>> the value
>>>> # to store and the second (and possibly third) source specify where to
>>>> store
>>>> --
>>>> 2.20.1
>>>>
>>>> ___
>>>> mesa-dev mailing list
>>>> mesa-...@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>
>>>
>>>
>>>
>> ___
>> mesa-dev mailing list
>> mesa-...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-26 Thread Eduardo Lima Mitev
On 1/25/19 6:46 PM, Eric Anholt wrote:
> Eduardo Lima Mitev  writes:
> 
>> ir3 compiler has an integer multiply-add instruction (IMAD_S24)
>> that is used for different offset calculations in the backend.
>> Since we intend to move some of these calculations to NIR, we need
>> a new ALU op that can represent it.
>> ---
>>  src/compiler/nir/nir_opcodes.py | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opcodes.py 
>> b/src/compiler/nir/nir_opcodes.py
>> index d32005846a6..b61845fd514 100644
>> --- a/src/compiler/nir/nir_opcodes.py
>> +++ b/src/compiler/nir/nir_opcodes.py
>> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, 
>> src3_size, const_expr):
>> [tuint, tuint, tuint], "", const_expr)
>>  
>>  triop("ffma", tfloat, "src0 * src1 + src2")
>> +triop("imad", tint, "src0 * src1 + src2")
> 
> The constant-folding expression should be corrected to what the backend
> operation actually does, and you should probably call it imad24 or
> something in that case.
> 

Roger. Got similar feedback from Ilia. Will fix that.

Eduardo
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'

2019-01-26 Thread Eduardo Lima Mitev
On 1/25/19 6:42 PM, Eric Anholt wrote:
> Eduardo Lima Mitev  writes:
> 
>> This is an internal intrinsic intended to be injected by a
>> (freedreno-specific) 'lower_sampler_io' pass that will be introduced
>> later in this series; and consumed by ir3_compiler_nir.
>>
>> The intrinsic will load in SSA values for various constants
>> for images (image_dims), namely the format's bytes-per-pixel,
>> y-stride and z-stride; for which the backend compiler will emit
>> the corresponding uniforms.
>>
>> const_index[0] is the image index, and const_index[1] is the index
>> into image_dims' register file for a given image: 0 for bpp, 1 to
>> y-stride and 2 for z-stride.
> 
> Can you move this documentation of the meaning of the intrinsic into a
> comment in the file?
> 

Yes, will do that.
Thanks!
___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Eduardo Lima Mitev
On 1/25/19 5:01 PM, Ilia Mirkin wrote:
> On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin  wrote:
>>
>> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called
>> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely
>> had this, and I think maxwell+ has a variant of this implemented by
>> XMAD):
>>
>> (src0 * src1) & 0xff + src2
> 
> And of course even that's wrong... the 24th bit has to get
> sign-extended on that. Can express it with shifts.
>

IMAD_S24 is what is currently used in
ir3_compiler_nir::get_image_offset(), so the pass doesn't change
anything regarding computations.

I agree that the nir opcode should hint at the bit limit, so probably
nir_op_imad24. That is one of the open questions.

Thanks,

Eduardo


___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [RFC 2/4] nir: Add a new ALU nir_op_imad

2019-01-25 Thread Eduardo Lima Mitev
ir3 compiler has an integer multiply-add instruction (IMAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can represent it.
---
 src/compiler/nir/nir_opcodes.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index d32005846a6..b61845fd514 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, 
src3_size, const_expr):
[tuint, tuint, tuint], "", const_expr)
 
 triop("ffma", tfloat, "src0 * src1 + src2")
+triop("imad", tint, "src0 * src1 + src2")
 
 triop("flrp", tfloat, "src0 * (1 - src2) + src1 * src2")
 
-- 
2.20.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [RFC 0/4] freedreno: Move some compiler offset computations to NIR

2019-01-25 Thread Eduardo Lima Mitev
There are a bunch of instructions emitted on ir3_compiler_nir related to
offset computations for IO opcodes (ssbo, image, etc). This small series
explores the possibility of moving these instructions to NIR, where we
have higher chances of optimizing them.

The series introduces a new, freedreno specific NIR pass,
'ir3_nir_lower_sampler_io' (final name not set). The pass is executed
early on ir3_optimize_nir(), and the goal is to centralize all these
computations there, hoping that later NIR passes will produce better
code than what is currently emitted.

So far, we have just implemented byte-offset computation for image store
and atomics. This seemed like a good first target given the amount of
instructions being emitted for it by the backend.

This is an RFC series because there are a few open questions, but we
wanted to gather feedback already now, in case this effort is something
not worth it; and also hoping that somebody else will give it a try
against other shaders and on other gens (we have just tried this on
a5xx).

* We have so far been unable to see any improvement in generated code
(not a penalty either). shader-db has not been specially useful. Few
shaders there exercise image store or image atomic ops, and of those
that do, most require higher versions of GLSL than what freedreno
supports, so they get skipped. The few that do actually run, don't
show any meaningful difference.

Then other shaders picked from tests suites are simple enough not to
produce any difference in code either.

There is still on-going work looking for cases where the pass helps
instruction stats, whether writing custom shaders or porting complex
shader from shader-db to run on GLES 310.

There is though an open question here as to whether moving backend
code to NIR is a benefit in and of itself.

* The series adds a nir_op_imad opcode that didn't exist before, and
perhaps not generally useful even for freedreno outside this pass,
because it maps to IR3_MAD_S24 which is probably not suitable for
generic integer multiply-add.

* The pass currently has 2 alternative code-paths to emit the
multiplication by the bytes-per-pixel of an image format. In one
case, since this value can be obtained at compile time, it is
emitted as an immediate by nir_imul_imm. The other alternative is
emitting an nir_imul with an SSA value that will map to
image_dims[0] at shader runtime.

The former case is uglier but produces better code (a single SHL
instruction), whereas the latter involves a generic imul, for which
the backend emits a lot of code to cover for overflow.

The doubt here is whether we should introduce a (lower precision)
version of imul that maps directly to IR3_IMUL_S.


A live (WIP) tree of the series can be found at:
<https://gitlab.freedesktop.org/elima/mesa/commits/wip/fd-compiler-io>

We plan to continue moving computations to the pass if we see
good opportunities.

Feedback very welcome,

cheers,
Eduardo

Eduardo Lima Mitev (4):
  nir: Add a new intrinsic 'load_image_stride'
  nir: Add a new ALU nir_op_imad
  ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'
  ir3: Use ir3_nir_lower_sampler_io pass

 src/compiler/nir/nir_intrinsics.py   |   2 +
 src/compiler/nir/nir_opcodes.py  |   1 +
 src/freedreno/Makefile.sources   |   1 +
 src/freedreno/ir3/ir3_compiler_nir.c |  61 ++--
 src/freedreno/ir3/ir3_nir.c  |   1 +
 src/freedreno/ir3/ir3_nir.h  |   1 +
 src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++
 7 files changed, 383 insertions(+), 33 deletions(-)
 create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c

-- 
2.20.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [RFC 4/4] ir3: Use ir3_nir_lower_sampler_io pass

2019-01-25 Thread Eduardo Lima Mitev
This effectively removes all offset calculations in
ir3_compiler_nir::get_image_offset().

No regressions observed on affected tests from Khronos CTS and piglit
suites, compared to master.

Collecting useful stats on helps/hurts caused by this pass is WIP. Very
few shaders in shader-db data-base exercise image store or image
atomic ops, and of those that do, most require higher versions of
GLSL than what freedreno supports, so they get skipped.

There is on-going work writing/porting shaders to collect useful
stats. So far, all tested show no meaningful difference compared
to master.
---
 src/freedreno/ir3/ir3_compiler_nir.c | 61 +---
 src/freedreno/ir3/ir3_nir.c  |  1 +
 2 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/src/freedreno/ir3/ir3_compiler_nir.c 
b/src/freedreno/ir3/ir3_compiler_nir.c
index fd641735620..fe329db658c 100644
--- a/src/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/freedreno/ir3/ir3_compiler_nir.c
@@ -548,6 +548,9 @@ emit_alu(struct ir3_context *ctx, nir_alu_instr *alu)
ir3_MADSH_M16(b, src[0], 0, src[1], 0,
ir3_MULL_U(b, src[0], 0, 
src[1], 0), 0), 0);
break;
+   case nir_op_imad:
+   dst[0] = ir3_MAD_S24(b, src[0], 0, src[1], 0, src[2], 0);
+   break;
case nir_op_ineg:
dst[0] = ir3_ABSNEG_S(b, src[0], IR3_REG_SNEG);
break;
@@ -1172,44 +1175,19 @@ get_image_type(const nir_variable *var)
 
 static struct ir3_instruction *
 get_image_offset(struct ir3_context *ctx, const nir_variable *var,
-   struct ir3_instruction * const *coords, bool byteoff)
+   struct ir3_instruction * const *coords)
 {
struct ir3_block *b = ctx->block;
-   struct ir3_instruction *offset;
-   unsigned ncoords = get_image_coords(var, NULL);
-
-   /* to calculate the byte offset (yes, uggg) we need (up to) three
-* const values to know the bytes per pixel, and y and z stride:
-*/
-   unsigned cb = regid(ctx->so->constbase.image_dims, 0) +
-   ctx->so->const_layout.image_dims.off[var->data.driver_location];
 
debug_assert(ctx->so->const_layout.image_dims.mask &
(1 << var->data.driver_location));
 
-   /* offset = coords.x * bytes_per_pixel: */
-   offset = ir3_MUL_S(b, coords[0], 0, create_uniform(b, cb + 0), 0);
-   if (ncoords > 1) {
-   /* offset += coords.y * y_pitch: */
-   offset = ir3_MAD_S24(b, create_uniform(b, cb + 1), 0,
-   coords[1], 0, offset, 0);
-   }
-   if (ncoords > 2) {
-   /* offset += coords.z * z_pitch: */
-   offset = ir3_MAD_S24(b, create_uniform(b, cb + 2), 0,
-   coords[2], 0, offset, 0);
-   }
-
-   if (!byteoff) {
-   /* Some cases, like atomics, seem to use dword offset instead
-* of byte offsets.. blob just puts an extra shr.b in there
-* in those cases:
-*/
-   offset = ir3_SHR_B(b, offset, 0, create_immed(b, 2), 0);
-   }
-
+   /* ir3_nir_lower_sampler_io pass should have placed the final
+* byte-offset (or dword offset for atomics) at the 4th component
+* of the coordinate vector.
+*/
return ir3_create_collect(ctx, (struct ir3_instruction*[]){
-   offset,
+   coords[3],
create_immed(b, 0),
}, 2);
 }
@@ -1341,7 +1319,7 @@ emit_intrinsic_store_image(struct ir3_context *ctx, 
nir_intrinsic_instr *intr)
 * src2 is 64b byte offset
 */
 
-   offset = get_image_offset(ctx, var, coords, true);
+   offset = get_image_offset(ctx, var, coords);
 
/* NOTE: stib seems to take byte offset, but stgb.typed can be used
 * too and takes a dword offset.. not quite sure yet why blob uses
@@ -1443,7 +1421,7 @@ emit_intrinsic_atomic_image(struct ir3_context *ctx, 
nir_intrinsic_instr *intr)
 */
src0 = ir3_get_src(ctx, >src[3])[0];
src1 = ir3_create_collect(ctx, coords, ncoords);
-   src2 = get_image_offset(ctx, var, coords, false);
+   src2 = get_image_offset(ctx, var, coords);
 
switch (intr->intrinsic) {
case nir_intrinsic_image_deref_atomic_add:
@@ -1612,6 +1590,23 @@ emit_intrinsic(struct ir3_context *ctx, 
nir_intrinsic_instr *intr)
}
 
switch (intr->intrinsic) {
+   case nir_intrinsic_load_image_stride: {
+   idx = intr->const_index[0];
+
+   /* this is the index into image_dims offsets, which can take
+* values 0, 1 or 2 (bpp, y-stride, z-stride respectively).
+*/
+   uint8_t off = intr->const_index[1];
+   debug_assert(off <= 2);
+
+   unsigned cb = regid(ctx->so->constbase.image_dims, 0) +
+  

[Freedreno] [RFC 3/4] ir3/nir: Add a new pass 'ir3_nir_lower_sampler_io'

2019-01-25 Thread Eduardo Lima Mitev
This pass moves to NIR some offset calculations that are currently
implemented on the backend compiler, to allow NIR to possibly
optimize them.

For now, only coordinate byte-offset calculations for imageStore
and image atomic operations are implemented.
---
 src/freedreno/Makefile.sources   |   1 +
 src/freedreno/ir3/ir3_nir.h  |   1 +
 src/freedreno/ir3/ir3_nir_lower_sampler_io.c | 349 +++
 3 files changed, 351 insertions(+)
 create mode 100644 src/freedreno/ir3/ir3_nir_lower_sampler_io.c

diff --git a/src/freedreno/Makefile.sources b/src/freedreno/Makefile.sources
index 7fea9de39ef..fd4f7f294cd 100644
--- a/src/freedreno/Makefile.sources
+++ b/src/freedreno/Makefile.sources
@@ -31,6 +31,7 @@ ir3_SOURCES := \
ir3/ir3_legalize.c \
ir3/ir3_nir.c \
ir3/ir3_nir.h \
+   ir3/ir3_nir_lower_sampler_io.c \
ir3/ir3_nir_lower_tg4_to_tex.c \
ir3/ir3_print.c \
ir3/ir3_ra.c \
diff --git a/src/freedreno/ir3/ir3_nir.h b/src/freedreno/ir3/ir3_nir.h
index 74201d34160..52809ba099e 100644
--- a/src/freedreno/ir3/ir3_nir.h
+++ b/src/freedreno/ir3/ir3_nir.h
@@ -36,6 +36,7 @@ void ir3_nir_scan_driver_consts(nir_shader *shader, struct 
ir3_driver_const_layo
 
 bool ir3_nir_apply_trig_workarounds(nir_shader *shader);
 bool ir3_nir_lower_tg4_to_tex(nir_shader *shader);
+bool ir3_nir_lower_sampler_io(nir_shader *shader);
 
 const nir_shader_compiler_options * ir3_get_compiler_options(struct 
ir3_compiler *compiler);
 bool ir3_key_lowers_nir(const struct ir3_shader_key *key);
diff --git a/src/freedreno/ir3/ir3_nir_lower_sampler_io.c 
b/src/freedreno/ir3/ir3_nir_lower_sampler_io.c
new file mode 100644
index 000..e2910d8906d
--- /dev/null
+++ b/src/freedreno/ir3/ir3_nir_lower_sampler_io.c
@@ -0,0 +1,349 @@
+/*
+ * Copyright © 2018 Igalia S.L.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "ir3_nir.h"
+#include "compiler/nir/nir_builder.h"
+
+/**
+ * The goal of this pass is to move to NIR some offset calculations for
+ * different I/O that are currently implemented on the backend compiler,
+ * to allow NIR to possibly optimize them.
+ *
+ * Currently, only offset calculations for image store and image
+ * atomic operations are implemented.
+ */
+
+
+/* This flag enables/disables a code-path where the bytes-per-pixel of
+ * an image is obtained directly from the format, which is known
+ * at shader compile time; as opposed to using image_dims[0] constant
+ * available only at shader runtime.
+ *
+ * Inlining the bytes-per-pixel here as an immediate has the advantage
+ * that it gets converted to a single (SHL) instruction (because all
+ * possible values are powers of two); while loading it as a uniform and
+ * emitting an IMUL will cause the backend to expand it to quite a few
+ * instructions (see ir3_compiler_nir for imul), thus ultimately hurting
+ * instruction count.
+ */
+#define INLINE_BPP 1
+
+
+static bool
+intrinsic_is_image_atomic(unsigned intrinsic)
+{
+   switch (intrinsic) {
+   case nir_intrinsic_image_deref_atomic_add:
+   case nir_intrinsic_image_deref_atomic_min:
+   case nir_intrinsic_image_deref_atomic_max:
+   case nir_intrinsic_image_deref_atomic_and:
+   case nir_intrinsic_image_deref_atomic_or:
+   case nir_intrinsic_image_deref_atomic_xor:
+   case nir_intrinsic_image_deref_atomic_exchange:
+   case nir_intrinsic_image_deref_atomic_comp_swap:
+   return true;
+   default:
+   break;
+   }
+
+   return false;
+}
+
+static bool
+intrinsic_is_image_store_or_atomic(unsigned intrinsic)
+{
+   if (intrinsic == nir_intrinsic_image_deref_store)
+   return true;
+   else
+   return intrinsic_is_image_atomic(intrinsic);
+}
+
+/*
+ * FIXME: shamelessly copied from ir3_compiler_nir until it gets factorized
+ * out at some point.
+ */

[Freedreno] [RFC 1/4] nir: Add a new intrinsic 'load_image_stride'

2019-01-25 Thread Eduardo Lima Mitev
This is an internal intrinsic intended to be injected by a
(freedreno-specific) 'lower_sampler_io' pass that will be introduced
later in this series; and consumed by ir3_compiler_nir.

The intrinsic will load in SSA values for various constants
for images (image_dims), namely the format's bytes-per-pixel,
y-stride and z-stride; for which the backend compiler will emit
the corresponding uniforms.

const_index[0] is the image index, and const_index[1] is the index
into image_dims' register file for a given image: 0 for bpp, 1 to
y-stride and 2 for z-stride.
---
 src/compiler/nir/nir_intrinsics.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir_intrinsics.py 
b/src/compiler/nir/nir_intrinsics.py
index a5cc3f7401c..169c835d746 100644
--- a/src/compiler/nir/nir_intrinsics.py
+++ b/src/compiler/nir/nir_intrinsics.py
@@ -590,6 +590,8 @@ load("shared", 1, [BASE, ALIGN_MUL, ALIGN_OFFSET], 
[CAN_ELIMINATE])
 load("push_constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER])
 # src[] = { offset }. const_index[] = { base, range }
 load("constant", 1, [BASE, RANGE], [CAN_ELIMINATE, CAN_REORDER])
+# src[] = { offset }. const_index[] = { image_idx, dim_idx }
+load("image_stride", 1, [], [CAN_REORDER])
 
 # Stores work the same way as loads, except now the first source is the value
 # to store and the second (and possibly third) source specify where to store
-- 
2.20.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [PATCH v3] ir3_compiler/nir: fix imageSize() for buffer-backed images

2018-10-23 Thread Eduardo Lima Mitev
GL_EXT_texture_buffer introduced texture buffers, which can be used
in shaders through a new type imageBuffer.

Because how image access is implemented in freedreno, calling
imageSize on an imageBuffer returns the size in bytes instead of texels,
which is incorrect.

This patch adds a division of imageSize result by the bytes-per-pixel
of the image format, when image is buffer-backed.

Fixes all tests under
dEQP-GLES31.functional.image_load_store.buffer.image_size.*

v2: Pre-compute and submit the log2 of the image format's bpp as shader
constant instead of emitting the LOG2 instruction in code. (Rob Clark)

v3: Use ffs (find-first-bit) helper for computing log2 (Ilia Mirkin)
---
 .../drivers/freedreno/ir3/ir3_compiler_nir.c  | 23 +++
 .../drivers/freedreno/ir3/ir3_shader.c| 10 
 2 files changed, 33 insertions(+)

diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
index 197196383b0..7a3c8a8579c 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
@@ -2035,6 +2035,29 @@ emit_intrinsic_image_size(struct ir3_context *ctx, 
nir_intrinsic_instr *intr,
 
split_dest(b, tmp, sam, 0, 4);
 
+   /* get_size instruction returns size in bytes instead of texels
+* for imageBuffer, so we need to divide it by the pixel size
+* of the image format.
+*
+* TODO: This is at least true on a5xx. Check other gens.
+*/
+   enum glsl_sampler_dim dim =
+   glsl_get_sampler_dim(glsl_without_array(var->type));
+   if (dim == GLSL_SAMPLER_DIM_BUF) {
+   /* Since all the possible values the divisor can take are
+* power-of-two (4, 8, or 16), the division is implemented
+* as a shift-right.
+* During shader setup, the log2 of the image format's
+* bytes-per-pixel should have been emitted in 2nd slot of
+* image_dims. See ir3_shader::emit_image_dims().
+*/
+   unsigned cb = regid(ctx->so->constbase.image_dims, 0) +
+   
ctx->so->const_layout.image_dims.off[var->data.driver_location];
+   struct ir3_instruction *aux = create_uniform(ctx, cb + 1);
+
+   tmp[0] = ir3_SHR_B(b, tmp[0], 0, aux, 0);
+   }
+
for (unsigned i = 0; i < ncoords; i++)
dst[i] = tmp[i];
 
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_shader.c 
b/src/gallium/drivers/freedreno/ir3/ir3_shader.c
index 9bf0a7f999c..b3127ff8c38 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_shader.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_shader.c
@@ -699,6 +699,16 @@ emit_image_dims(struct fd_context *ctx, const struct 
ir3_shader_variant *v,
} else {
dims[off + 2] = rsc->slices[lvl].size0;
}
+   } else {
+   /* For buffer-backed images, the log2 of the 
format's
+* bytes-per-pixel is placed on the 2nd slot. 
This is useful
+* when emitting image_size instructions, for 
which we need
+* to divide by bpp for image buffers. Since 
the bpp
+* can only be power-of-two, the division is 
implemented
+* as a SHR, and for that it is handy to have 
the log2 of
+* bpp as a constant. (log2 = first-set-bit - 1)
+*/
+   dims[off + 1] = ffs(dims[off + 0]) - 1;
}
}
 
-- 
2.19.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [PATCH] ir3_compiler/nir: fix imageSize() for buffer-backed images

2018-10-23 Thread Eduardo Lima Mitev
GL_EXT_texture_buffer introduced texture buffers, which can be used
in shaders through a new type imageBuffer.

Because how image access is implemented in freedreno, calling
imageSize on an imageBuffer returns the size in bytes instead of texels,
which is incorrect.

This patch adds a division of imageSize result by the bytes-per-pixel
of the image format, when image is buffer-backed.

Fixes all tests under
dEQP-GLES31.functional.image_load_store.buffer.image_size.*

v2: Pre-compute and submit the log2 of the image format's bpp as shader
constant instead of emitting the LOG2 instruction in code. (Rob Clark)
---
 .../drivers/freedreno/ir3/ir3_compiler_nir.c  | 23 +++
 .../drivers/freedreno/ir3/ir3_shader.c| 10 
 2 files changed, 33 insertions(+)

diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
index 197196383b0..7a3c8a8579c 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
@@ -2035,6 +2035,29 @@ emit_intrinsic_image_size(struct ir3_context *ctx, 
nir_intrinsic_instr *intr,
 
split_dest(b, tmp, sam, 0, 4);
 
+   /* get_size instruction returns size in bytes instead of texels
+* for imageBuffer, so we need to divide it by the pixel size
+* of the image format.
+*
+* TODO: This is at least true on a5xx. Check other gens.
+*/
+   enum glsl_sampler_dim dim =
+   glsl_get_sampler_dim(glsl_without_array(var->type));
+   if (dim == GLSL_SAMPLER_DIM_BUF) {
+   /* Since all the possible values the divisor can take are
+* power-of-two (4, 8, or 16), the division is implemented
+* as a shift-right.
+* During shader setup, the log2 of the image format's
+* bytes-per-pixel should have been emitted in 2nd slot of
+* image_dims. See ir3_shader::emit_image_dims().
+*/
+   unsigned cb = regid(ctx->so->constbase.image_dims, 0) +
+   
ctx->so->const_layout.image_dims.off[var->data.driver_location];
+   struct ir3_instruction *aux = create_uniform(ctx, cb + 1);
+
+   tmp[0] = ir3_SHR_B(b, tmp[0], 0, aux, 0);
+   }
+
for (unsigned i = 0; i < ncoords; i++)
dst[i] = tmp[i];
 
diff --git a/src/gallium/drivers/freedreno/ir3/ir3_shader.c 
b/src/gallium/drivers/freedreno/ir3/ir3_shader.c
index 9bf0a7f999c..de59e5888c6 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_shader.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_shader.c
@@ -699,6 +699,16 @@ emit_image_dims(struct fd_context *ctx, const struct 
ir3_shader_variant *v,
} else {
dims[off + 2] = rsc->slices[lvl].size0;
}
+   } else {
+   /* For buffer-backed images, the log2 of the 
format's
+* bytes-per-pixel is placed on the 2nd slot. 
This is useful
+* when emitting image_size instructions, for 
which we need
+* to divide by bpp for image buffers. Since 
the bpp
+* can only be power-of-two, the division is 
implemented
+* as a SHR, and for that it is handy to have 
the log2 of
+* bpp as a constant.
+*/
+   dims[off + 1] = log (dims[off + 0]) / log (2);
}
}
 
-- 
2.19.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


[Freedreno] [PATCH] ir3/nir: Set up image_dims consts for image_deref_size intrinsic too

2018-10-21 Thread Eduardo Lima Mitev
`nir_intrinsic_image_deref_size` is not being considered during scan for
driver constants, so image constants are not emitted if a shader
only ever query the size of an image (no load, store, atomic op, etc).
This is unlikely, but possible.
---
 src/gallium/drivers/freedreno/ir3/ir3_nir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/freedreno/ir3/ir3_nir.c 
b/src/gallium/drivers/freedreno/ir3/ir3_nir.c
index d934acc7427..63866ae4d01 100644
--- a/src/gallium/drivers/freedreno/ir3/ir3_nir.c
+++ b/src/gallium/drivers/freedreno/ir3/ir3_nir.c
@@ -254,6 +254,7 @@ ir3_nir_scan_driver_consts(nir_shader *shader,
case nir_intrinsic_image_deref_atomic_exchange:
case nir_intrinsic_image_deref_atomic_comp_swap:
case nir_intrinsic_image_deref_store:
+   case nir_intrinsic_image_deref_size:
idx = nir_intrinsic_get_var(intr, 
0)->data.driver_location;
if (layout->image_dims.mask & (1 << 
idx))
break;
-- 
2.19.1

___
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno