Re: [Mesa-dev] [PATCH 14/22] intel/compiler: Do image load/store lowering to NIR

2018-08-29 Thread Kenneth Graunke
On Wednesday, August 29, 2018 5:31:38 AM PDT Jason Ekstrand wrote:
> On Wed, Aug 29, 2018 at 12:49 AM Kenneth Graunke wrote:
> 
> > On Friday, August 17, 2018 1:06:20 PM PDT Jason Ekstrand wrote:
> > [snip]
> > > +# Intel-specific query for loading from the brw_image_param struct passed
> > > +# into the shader as a uniform.  The variable is a deref to the image
> > > +# variable. The const index specifies which of the six parameters to load.
> >
> > These might be our first driver-specific intrinsics.  Some people make
> > big extensibility systems for that, where drivers can extend with their
> > own concepts.  But given that we all live in the same project, I think
> > this makes a lot of sense - just have everybody add their own here,
> > suffixed with a name they own (e.g. intel, amd, radv, ir3, whatever).
> >
> 
> We almost added an ir3 intrinsic some time ago.  I don't remember what it
> was or why it didn't happen.
> 
> 
> > It's certainly nice and simple.
> >
> > > +intrinsic("image_deref_load_param_intel", src_comp=[1], dest_comp=0,
> > > +  indices=[BASE], flags=[CAN_ELIMINATE, CAN_REORDER])
> > > +intrinsic("image_deref_load_raw_intel", src_comp=[1, 1], dest_comp=0,
> > > +  flags=[CAN_ELIMINATE, CAN_REORDER])
> > > +intrinsic("image_deref_store_raw_intel", src_comp=[1, 1, 0])
> >
> > I don't think you want CAN_REORDER for the new load intrinsic...at
> > least, image_deref_load only has CAN_ELIMINATE.  It probably makes
> > sense for the two to match...
> >
> 
> Right... Thanks!
> 
> 
> > [snip]
> > > +static nir_ssa_def *
> > > +image_address(nir_builder *b, const struct gen_device_info *devinfo,
> > > +  nir_deref_instr *deref, nir_ssa_def *coord)
> > > +{
> > > +   coord = sanitize_image_coord(b, deref, coord);
> > > +
> > > +   nir_ssa_def *offset = load_image_param(b, deref, OFFSET);
> > > +   nir_ssa_def *tiling = load_image_param(b, deref, TILING);
> > > +   nir_ssa_def *stride = load_image_param(b, deref, STRIDE);
> > > +
> > > +   /* Shift the coordinates by the fixed surface offset.  It may be non-zero
> > > +* if the image is a single slice of a higher-dimensional surface, or if a
> > > +* non-zero mipmap level of the surface is bound to the pipeline.  The
> > > +* offset needs to be applied here rather than at surface state set-up time
> > > +* because the desired slice-level may start mid-tile, so simply shifting
> > > +* the surface base address wouldn't give a well-formed tiled surface in
> > > +* the general case.
> > > +*/
> > > +   nir_ssa_def *xypos = (coord->num_components == 1) ?
> > > +nir_vec2(b, coord, nir_imm_int(b, 0)) :
> > > +nir_channels(b, coord, 0x3);
> > > +   xypos = nir_iadd(b, xypos, offset);
> > > +
> > > +   /* The layout of 3-D textures in memory is sort-of like a tiling
> > > +* format.  At each miplevel, the slices are arranged in rows of
> > > +* 2^level slices per row.  The slice row is stored in tmp.y and
> > > +* the slice within the row is stored in tmp.x.
> > > +*
> > > +* The layout of 2-D array textures and cubemaps is much simpler:
> > > +* Depending on whether the ARYSPC_LOD0 layout is in use it will be
> > > +* stored in memory as an array of slices, each one being a 2-D
> > > +* arrangement of miplevels, or as a 2D arrangement of miplevels,
> > > +* each one being an array of slices.  In either case the separation
> > > +* between slices of the same LOD is equal to the qpitch value
> > > +* provided as stride.w.
> > > +*
> > > +* This code can be made to handle either 2D arrays and 3D textures
> > > +* by passing in the miplevel as tile.z for 3-D textures and 0 in
> > > +* tile.z for 2-D array textures.
> > > +*
> > > +* See Volume 1 Part 1 of the Gen7 PRM, sections 6.18.4.7 "Surface
> > > +* Arrays" and 6.18.6 "3D Surfaces" for a more extensive discussion
> > > +* of the hardware 3D texture and 2D array layouts.
> > > +*/
> > > +   if (coord->num_components > 2) {
> > > +  /* Decompose z into a major (tmp.y) and a minor (tmp.x)
> > > +   * index.
> > > +   */
> > > +  nir_ssa_def *z = nir_channel(b, coord, 2);
> > > +  nir_ssa_def *z_x = nir_ubfe(b, z, nir_imm_int(b, 0),
> > > +  nir_channel(b, tiling, 2));
> > > +  nir_ssa_def *z_y = nir_ushr(b, z, nir_channel(b, tiling, 2));
> > > +
> > > +  /* Take into account the horizontal (tmp.x) and vertical (tmp.y)
> > > +   * slice offset.
> > > +   */
> > > +  xypos = nir_iadd(b, xypos, nir_imul(b, nir_vec2(b, z_x, z_y),
> > > + nir_channels(b, stride, 0xc)));
> > > +   }
> > > +
> > > +   nir_ssa_def *addr;
> > > +   if (coord->num_components > 1) {
> > > +  /* Calculate the major/minor x and y indices.  In order to
> > > +   * accommodate both X and Y 

Re: [Mesa-dev] [PATCH 14/22] intel/compiler: Do image load/store lowering to NIR

2018-08-29 Thread Jason Ekstrand
On Wed, Aug 29, 2018 at 12:49 AM Kenneth Graunke wrote:

> On Friday, August 17, 2018 1:06:20 PM PDT Jason Ekstrand wrote:
> [snip]
> > +# Intel-specific query for loading from the brw_image_param struct passed
> > +# into the shader as a uniform.  The variable is a deref to the image
> > +# variable. The const index specifies which of the six parameters to load.
>
> These might be our first driver-specific intrinsics.  Some people make
> big extensibility systems for that, where drivers can extend with their
> own concepts.  But given that we all live in the same project, I think
> this makes a lot of sense - just have everybody add their own here,
> suffixed with a name they own (e.g. intel, amd, radv, ir3, whatever).
>

We almost added an ir3 intrinsic some time ago.  I don't remember what it
was or why it didn't happen.


> It's certainly nice and simple.
>
> > +intrinsic("image_deref_load_param_intel", src_comp=[1], dest_comp=0,
> > +  indices=[BASE], flags=[CAN_ELIMINATE, CAN_REORDER])
> > +intrinsic("image_deref_load_raw_intel", src_comp=[1, 1], dest_comp=0,
> > +  flags=[CAN_ELIMINATE, CAN_REORDER])
> > +intrinsic("image_deref_store_raw_intel", src_comp=[1, 1, 0])
>
> I don't think you want CAN_REORDER for the new load intrinsic...at
> least, image_deref_load only has CAN_ELIMINATE.  It probably makes
> sense for the two to match...
>

Right... Thanks!


> [snip]
> > +static nir_ssa_def *
> > +image_address(nir_builder *b, const struct gen_device_info *devinfo,
> > +  nir_deref_instr *deref, nir_ssa_def *coord)
> > +{
> > +   coord = sanitize_image_coord(b, deref, coord);
> > +
> > +   nir_ssa_def *offset = load_image_param(b, deref, OFFSET);
> > +   nir_ssa_def *tiling = load_image_param(b, deref, TILING);
> > +   nir_ssa_def *stride = load_image_param(b, deref, STRIDE);
> > +
> > +   /* Shift the coordinates by the fixed surface offset.  It may be non-zero
> > +* if the image is a single slice of a higher-dimensional surface, or if a
> > +* non-zero mipmap level of the surface is bound to the pipeline.  The
> > +* offset needs to be applied here rather than at surface state set-up time
> > +* because the desired slice-level may start mid-tile, so simply shifting
> > +* the surface base address wouldn't give a well-formed tiled surface in
> > +* the general case.
> > +*/
> > +   nir_ssa_def *xypos = (coord->num_components == 1) ?
> > +nir_vec2(b, coord, nir_imm_int(b, 0)) :
> > +nir_channels(b, coord, 0x3);
> > +   xypos = nir_iadd(b, xypos, offset);
> > +
> > +   /* The layout of 3-D textures in memory is sort-of like a tiling
> > +* format.  At each miplevel, the slices are arranged in rows of
> > +* 2^level slices per row.  The slice row is stored in tmp.y and
> > +* the slice within the row is stored in tmp.x.
> > +*
> > +* The layout of 2-D array textures and cubemaps is much simpler:
> > +* Depending on whether the ARYSPC_LOD0 layout is in use it will be
> > +* stored in memory as an array of slices, each one being a 2-D
> > +* arrangement of miplevels, or as a 2D arrangement of miplevels,
> > +* each one being an array of slices.  In either case the separation
> > +* between slices of the same LOD is equal to the qpitch value
> > +* provided as stride.w.
> > +*
> > +* This code can be made to handle either 2D arrays and 3D textures
> > +* by passing in the miplevel as tile.z for 3-D textures and 0 in
> > +* tile.z for 2-D array textures.
> > +*
> > +* See Volume 1 Part 1 of the Gen7 PRM, sections 6.18.4.7 "Surface
> > +* Arrays" and 6.18.6 "3D Surfaces" for a more extensive discussion
> > +* of the hardware 3D texture and 2D array layouts.
> > +*/
> > +   if (coord->num_components > 2) {
> > +  /* Decompose z into a major (tmp.y) and a minor (tmp.x)
> > +   * index.
> > +   */
> > +  nir_ssa_def *z = nir_channel(b, coord, 2);
> > +  nir_ssa_def *z_x = nir_ubfe(b, z, nir_imm_int(b, 0),
> > +  nir_channel(b, tiling, 2));
> > +  nir_ssa_def *z_y = nir_ushr(b, z, nir_channel(b, tiling, 2));
> > +
> > +  /* Take into account the horizontal (tmp.x) and vertical (tmp.y)
> > +   * slice offset.
> > +   */
> > +  xypos = nir_iadd(b, xypos, nir_imul(b, nir_vec2(b, z_x, z_y),
> > + nir_channels(b, stride, 0xc)));
> > +   }
> > +
> > +   nir_ssa_def *addr;
> > +   if (coord->num_components > 1) {
> > +  /* Calculate the major/minor x and y indices.  In order to
> > +   * accommodate both X and Y tiling, the Y-major tiling format is
> > +   * treated as being a bunch of narrow X-tiles placed next to each
> > +   * other.  This means that the tile width for Y-tiling is actually
> > +   * the width of one sub-column of the Y-major tile where each 4K
> > +   * tile has 8 

Re: [Mesa-dev] [PATCH 14/22] intel/compiler: Do image load/store lowering to NIR

2018-08-28 Thread Kenneth Graunke
On Friday, August 17, 2018 1:06:20 PM PDT Jason Ekstrand wrote:
[snip]
> +# Intel-specific query for loading from the brw_image_param struct passed
> +# into the shader as a uniform.  The variable is a deref to the image
> +# variable. The const index specifies which of the six parameters to load.

These might be our first driver-specific intrinsics.  Some people make
big extensibility systems for that, where drivers can extend with their
own concepts.  But given that we all live in the same project, I think
this makes a lot of sense - just have everybody add their own here,
suffixed with a name they own (e.g. intel, amd, radv, ir3, whatever).

It's certainly nice and simple.
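
For concreteness, a minimal sketch (not from this patch; the helper name and
placement are assumptions) of the pattern being described: only the owning
driver ever emits or consumes its suffixed intrinsics, typically by matching
on the generated enum value in its own lowering pass or backend:

   #include "nir.h"

   /* Sketch (hypothetical helper): dispatch on a driver-owned intrinsic.
    * Other drivers simply never see this enum value in their shaders. */
   static bool
   handle_intel_image_intrinsic(nir_intrinsic_instr *instr)
   {
      switch (instr->intrinsic) {
      case nir_intrinsic_image_deref_load_param_intel:
         /* Intel-specific handling would go here. */
         return true;
      default:
         return false;
      }
   }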

> +intrinsic("image_deref_load_param_intel", src_comp=[1], dest_comp=0,
> +  indices=[BASE], flags=[CAN_ELIMINATE, CAN_REORDER])
> +intrinsic("image_deref_load_raw_intel", src_comp=[1, 1], dest_comp=0,
> +  flags=[CAN_ELIMINATE, CAN_REORDER])
> +intrinsic("image_deref_store_raw_intel", src_comp=[1, 1, 0])

I don't think you want CAN_REORDER for the new load intrinsic...at
least, image_deref_load only has CAN_ELIMINATE.  It probably makes
sense for the two to match...
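
For reference, the matching fix would just drop the flag from the raw load's
declaration in nir_intrinsics.py, roughly like this (a sketch; the param load
can arguably keep CAN_REORDER since brw_image_param is a plain uniform):

   intrinsic("image_deref_load_raw_intel", src_comp=[1, 1], dest_comp=0,
             flags=[CAN_ELIMINATE])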

[snip]
> +static nir_ssa_def *
> +image_address(nir_builder *b, const struct gen_device_info *devinfo,
> +  nir_deref_instr *deref, nir_ssa_def *coord)
> +{
> +   coord = sanitize_image_coord(b, deref, coord);
> +
> +   nir_ssa_def *offset = load_image_param(b, deref, OFFSET);
> +   nir_ssa_def *tiling = load_image_param(b, deref, TILING);
> +   nir_ssa_def *stride = load_image_param(b, deref, STRIDE);
> +
> +   /* Shift the coordinates by the fixed surface offset.  It may be non-zero
> +* if the image is a single slice of a higher-dimensional surface, or if a
> +* non-zero mipmap level of the surface is bound to the pipeline.  The
> +* offset needs to be applied here rather than at surface state set-up time
> +* because the desired slice-level may start mid-tile, so simply shifting
> +* the surface base address wouldn't give a well-formed tiled surface in
> +* the general case.
> +*/
> +   nir_ssa_def *xypos = (coord->num_components == 1) ?
> +nir_vec2(b, coord, nir_imm_int(b, 0)) :
> +nir_channels(b, coord, 0x3);
> +   xypos = nir_iadd(b, xypos, offset);
> +
> +   /* The layout of 3-D textures in memory is sort-of like a tiling
> +* format.  At each miplevel, the slices are arranged in rows of
> +* 2^level slices per row.  The slice row is stored in tmp.y and
> +* the slice within the row is stored in tmp.x.
> +*
> +* The layout of 2-D array textures and cubemaps is much simpler:
> +* Depending on whether the ARYSPC_LOD0 layout is in use it will be
> +* stored in memory as an array of slices, each one being a 2-D
> +* arrangement of miplevels, or as a 2D arrangement of miplevels,
> +* each one being an array of slices.  In either case the separation
> +* between slices of the same LOD is equal to the qpitch value
> +* provided as stride.w.
> +*
> +* This code can be made to handle either 2D arrays and 3D textures
> +* by passing in the miplevel as tile.z for 3-D textures and 0 in
> +* tile.z for 2-D array textures.
> +*
> +* See Volume 1 Part 1 of the Gen7 PRM, sections 6.18.4.7 "Surface
> +* Arrays" and 6.18.6 "3D Surfaces" for a more extensive discussion
> +* of the hardware 3D texture and 2D array layouts.
> +*/
> +   if (coord->num_components > 2) {
> +  /* Decompose z into a major (tmp.y) and a minor (tmp.x)
> +   * index.
> +   */
> +  nir_ssa_def *z = nir_channel(b, coord, 2);
> +  nir_ssa_def *z_x = nir_ubfe(b, z, nir_imm_int(b, 0),
> +  nir_channel(b, tiling, 2));
> +  nir_ssa_def *z_y = nir_ushr(b, z, nir_channel(b, tiling, 2));
> +
> +  /* Take into account the horizontal (tmp.x) and vertical (tmp.y)
> +   * slice offset.
> +   */
> +  xypos = nir_iadd(b, xypos, nir_imul(b, nir_vec2(b, z_x, z_y),
> + nir_channels(b, stride, 0xc)));
> +   }
> +
> +   nir_ssa_def *addr;
> +   if (coord->num_components > 1) {
> +  /* Calculate the major/minor x and y indices.  In order to
> +   * accommodate both X and Y tiling, the Y-major tiling format is
> +   * treated as being a bunch of narrow X-tiles placed next to each
> +   * other.  This means that the tile width for Y-tiling is actually
> +   * the width of one sub-column of the Y-major tile where each 4K
> +   * tile has 8 512B sub-columns.
> +   *
> +   * The major Y value is the row of tiles in which the pixel lives.
> +   * The major X value is the tile sub-column in which the pixel
> +   * lives; for X tiling, this is the same as the tile column, for Y
> +   * tiling, each tile has 8 sub-columns.  The minor X and Y indices
> +   * are the position within the 

[Mesa-dev] [PATCH 14/22] intel/compiler: Do image load/store lowering to NIR

2018-08-17 Thread Jason Ekstrand
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end.  This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down.  In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end.  On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.
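
For illustration, the IF-based pattern amounts to roughly the following in
nir_builder terms (a sketch under assumed names; do_raw_load() is a
hypothetical stand-in for the pass's real helpers, not code from this patch):

   static nir_ssa_def *
   load_with_bounds_check(nir_builder *b, nir_ssa_def *is_in_bounds,
                          nir_deref_instr *deref, nir_ssa_def *addr)
   {
      /* do_raw_load() is a hypothetical helper that would emit the raw
       * load.  Create the out-of-bounds value up front, perform the load
       * only inside the IF, and merge the two paths with a phi. */
      nir_ssa_def *zero = nir_imm_int(b, 0);
      nir_push_if(b, is_in_bounds);
      nir_ssa_def *loaded = do_raw_load(b, deref, addr);
      nir_pop_if(b, NULL);
      return nir_if_phi(b, loaded, zero);
   }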

Shader-db results on Kaby Lake:

total instructions in shared programs: 15166910 -> 15166872 (<.01%)
instructions in affected programs: 5895 -> 5857 (-0.64%)
helped: 15
HURT: 0

Clearly, we don't have that much image_load_store happening in the
shaders in shader-db.
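
The two-line hunks in anv_pipeline.c and brw_program.c (not shown in this
excerpt) presumably just hook the new pass into each driver's NIR pipeline;
a hypothetical call site would look something like this (exact signature and
placement are assumptions):

   NIR_PASS_V(nir, brw_nir_lower_image_load_store, compiler->devinfo);
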
---
 src/compiler/nir/nir_intrinsics.py|9 +
 src/intel/Makefile.sources|1 +
 src/intel/compiler/brw_fs_nir.cpp |  128 +-
 src/intel/compiler/brw_fs_surface_builder.cpp | 1030 -
 src/intel/compiler/brw_fs_surface_builder.h   |   20 -
 src/intel/compiler/brw_nir.h  |3 +
 .../compiler/brw_nir_lower_image_load_store.c |  824 +
 src/intel/compiler/meson.build|1 +
 src/intel/vulkan/anv_pipeline.c   |2 +
 src/mesa/drivers/dri/i965/brw_program.c   |2 +
 10 files changed, 899 insertions(+), 1121 deletions(-)
 create mode 100644 src/intel/compiler/brw_nir_lower_image_load_store.c

diff --git a/src/compiler/nir/nir_intrinsics.py b/src/compiler/nir/nir_intrinsics.py
index 63c602c8874..45872e00c55 100644
--- a/src/compiler/nir/nir_intrinsics.py
+++ b/src/compiler/nir/nir_intrinsics.py
@@ -310,6 +310,15 @@ intrinsic("image_deref_atomic_comp_swap", src_comp=[1, 4, 1, 1, 1], dest_comp=1)
 intrinsic("image_deref_size",src_comp=[1], dest_comp=0, 
flags=[CAN_ELIMINATE, CAN_REORDER])
 intrinsic("image_deref_samples", src_comp=[1], dest_comp=1, 
flags=[CAN_ELIMINATE, CAN_REORDER])
 
+# Intel-specific query for loading from the brw_image_param struct passed
+# into the shader as a uniform.  The variable is a deref to the image
+# variable. The const index specifies which of the six parameters to load.
+intrinsic("image_deref_load_param_intel", src_comp=[1], dest_comp=0,
+  indices=[BASE], flags=[CAN_ELIMINATE, CAN_REORDER])
+intrinsic("image_deref_load_raw_intel", src_comp=[1, 1], dest_comp=0,
+  flags=[CAN_ELIMINATE, CAN_REORDER])
+intrinsic("image_deref_store_raw_intel", src_comp=[1, 1, 0])
+
 # Vulkan descriptor set intrinsics
 #
 # The Vulkan API uses a different binding model from GL.  In the Vulkan
diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
index 5f6cd96825b..d10c4511734 100644
--- a/src/intel/Makefile.sources
+++ b/src/intel/Makefile.sources
@@ -84,6 +84,7 @@ COMPILER_FILES = \
compiler/brw_nir_analyze_ubo_ranges.c \
compiler/brw_nir_attribute_workarounds.c \
compiler/brw_nir_lower_cs_intrinsics.c \
+   compiler/brw_nir_lower_image_load_store.c \
compiler/brw_nir_opt_peephole_ffma.c \
compiler/brw_nir_tcs_workarounds.c \
compiler/brw_packed_float.c \
diff --git a/src/intel/compiler/brw_fs_nir.cpp b/src/intel/compiler/brw_fs_nir.cpp
index 6e9a5829d3b..021a31d069c 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3871,58 +3871,89 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
case nir_intrinsic_image_deref_atomic_xor:
case nir_intrinsic_image_deref_atomic_exchange:
case nir_intrinsic_image_deref_atomic_comp_swap: {
-  using namespace image_access;
-
   if (stage == MESA_SHADER_FRAGMENT &&
   instr->intrinsic != nir_intrinsic_image_deref_load)
  brw_wm_prog_data(prog_data)->has_side_effects = true;
 
   /* Get the referenced image variable and type. */
   nir_deref_instr *deref = nir_src_as_deref(instr->src[0]);
-  const nir_variable *var = nir_deref_instr_get_variable(deref);
-  const glsl_type *type = var->type->without_array();
-  const brw_reg_type base_type = get_image_base_type(type);
+  const glsl_type *type = deref->type;
 
   /* Get some metadata from the image intrinsic. */
   const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
-  const unsigned arr_dims = type->sampler_array ? 1 : 0;
-  const unsigned surf_dims = type->coordinate_components() - arr_dims;
-  const unsigned format = var->data.image.format;
+  const unsigned dims = type->coordinate_components();
   const unsigned dest_components = nir_intrinsic_dest_components(instr);
 
   /* Get the arguments of the image intrinsic. */
   const fs_reg image = get_nir_image_deref(deref);
-  const fs_reg addr = retype(get_nir_src(instr->src[1]),
- BRW_REGISTER_TYPE_UD);
+  const fs_reg coords = retype(get_nir_src(instr->src[1]),