[Mesa-dev] [Bug 102891] [radv] glitches on rpcs3 emulator (green zones)

2017-11-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102891

Dave Airlie  changed:

   What|Removed |Added

 Status|NEW |NEEDINFO

--- Comment #8 from Dave Airlie  ---
jdruel can you try the env var on the actual game?

The problem is that the damage is captured in the trace, so Samuel's check
didn't help.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] etnaviv: Add sampler TS support

2017-11-09 Thread Wladimir J. van der Laan
Hello Lucas,

On Thu, Nov 09, 2017 at 06:15:51PM +0100, Lucas Stach wrote:
> Hi Wladimir!

> > etna_resource_needs_flush is only called from two places - here, and
> > in resource_flush, where it also determines whether to do a
> > resolve-to-self, but before presenting the image. There it also only
> > makes sense to do if the resource has at least a valid TS.
> 
> Yes, this makes sense.
> 
> Also I've just tested this and I've seen some intermittent missing
> shadow tiles in the glmark2 shadow demo. Probably you are now missing
> the TS cache flush we would normally do before blitting the shadow
> image with the RS.

Thanks for testing!

I'll see if I can reproduce it. 

Good point about the TS, I expect that needs to be flushed before the texture
can be rendered from, so that sampler TS sees a consistent TS. 

Regards,
Wladimir


Re: [Mesa-dev] [PATCH] anv: don't crash when creating a huge image

2017-11-09 Thread Samuel Iglesias Gonsálvez
On Thu, 2017-11-09 at 16:34 -0800, Jason Ekstrand wrote:
> On Thu, Nov 9, 2017 at 4:23 PM, Chad Versace wrote:
> 
> > On Wed 08 Nov 2017, Jason Ekstrand wrote:
> > > > On Wed, Nov 8, 2017 at 1:34 AM, Samuel Iglesias Gonsálvez
> > > > <sigles...@igalia.com> wrote:
> > > 
> > > > The HW has some limits but, according to the spec, we can create
> > > > the image as it does not yet have any memory backing it. When we
> > > > allocate that memory, we then fail, following what the Vulkan spec,
> > > > "10.2. Device Memory", says about vkAllocateMemory():
> > > >
> > > > "Some platforms may have a limit on the maximum size of a single
> > > >  allocation. For example, certain systems may fail to create
> > > >  allocations with a size greater than or equal to 4GB. Such a limit is
> > > >  implementation-dependent, and if such a failure occurs then the error
> > > >  VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned."
> > > 
> > > Fixes the crashes on BDW for the following tests:
> > > 
> > > dEQP-VK.pipeline.render_to_image.core.2d_array.huge.*
> > > dEQP-VK.pipeline.render_to_image.core.cube_array.huge.*
> > > 
> > > > Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
> > > ---
> > > 
> > > > Jason, I was tempted to move this piece of code to anv_AllocateMemory()
> > > > but then I found the kernel relocation limitation of 32-bit... Is that
> > > > limitation still applicable? Or was it from the BDW age and we forgot
> > > > to update that limitation for gen9+?
> > > >
> > > >
> > > > We're still using relocations on all hardware so it applies to everything
> > > > today.  One of my 2018 projects is to fix that and get rid of relocations on
> > > > gen8+.
> > > 
> > > 
> > > Sam
> > > 
> > >  src/intel/isl/isl.c | 22 --
> > >  1 file changed, 22 deletions(-)
> > > 
> > > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > > index 59f512fc050..aaadcbaf991 100644
> > > --- a/src/intel/isl/isl.c
> > > +++ b/src/intel/isl/isl.c
> > > @@ -1472,28 +1472,6 @@ isl_surf_init_s(const struct
> > > isl_device *dev,
> > >base_alignment = MAX(info->min_alignment, tile_size);
> > > }
> > > 
> > > -   if (ISL_DEV_GEN(dev) < 9) {
> > > -  /* From the Broadwell PRM Vol 5, Surface Layout:
> > > -   *
> > > -   *"In addition to restrictions on maximum height,
> > > width,
> > 
> > and
> > > depth,
> > > -   * surfaces are also restricted to a maximum size
> > > in
> > 
> > bytes. This
> > > -   * maximum is 2 GB for all products and all
> > > surface
> > 
> > types."
> > > -   *
> > > -   * This comment is applicable to all Pre-gen9
> > > platforms.
> > > -   */
> > > -  if (size > (uint64_t) 1 << 31)
> > > - return false;
> > > -   } else {
> > > -  /* From the Skylake PRM Vol 5, Maximum Surface Size in
> > > Bytes:
> > > -   *"In addition to restrictions on maximum height,
> > > width,
> > 
> > and
> > > depth,
> > > -   * surfaces are also restricted to a maximum size
> > > of 2^38
> > 
> > bytes.
> > > -   * All pixels within the surface must be contained
> > > within
> > 
> > 2^38
> > > bytes
> > > -   * of the base address."
> > > -   */
> > > -  if (size > (uint64_t) 1 << 38)
> > > - return false;
> > > -   }
> > 
> > I think it very unwise to delete code that enforces requirements
> > defined
> > by the hardware spec. Deleting the code doesn't make the hardware
> > requirements go away :)
> > 

The idea was to move that code to another place, hence my question out
of the commit log message :-)
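
For illustration, the two hardware limits quoted in the removed hunk can be written as one standalone check that could live wherever the enforcement ends up. This is only a minimal sketch: the helper name `surf_size_within_hw_limit` and its plain `int gen` parameter are assumptions for the example and are not part of isl or anv.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of isl or anv: returns true when a surface
 * of `size` bytes fits the per-generation limit quoted in the removed code
 * (2 GB before gen9, 2^38 bytes from gen9 on).
 */
static bool
surf_size_within_hw_limit(int gen, uint64_t size)
{
   assert(gen > 0);

   const uint64_t limit = (gen < 9) ? (UINT64_C(1) << 31)
                                    : (UINT64_C(1) << 38);

   /* Same comparison as the original code: only sizes strictly above the
    * limit are rejected.
    */
   return size <= limit;
}
```

A caller would reject the surface (or, at allocation time, return VK_ERROR_OUT_OF_DEVICE_MEMORY) when this returns false. Note that this covers only the surface size limit, not the separate relocation limit discussed in the thread.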

> > > > I'm not sure how I feel about removing this from ISL.  There are really two
> > > > limitations going on here.  One is a limitation imposed by relocations, and the
> > > > other is some sort of fundamental hardware surface size limitation.  Most
> > > > likely, the surface size limitation has to do with how many bits they use for
> > > > image address computations in the hardware.  Most likely, on gen8, they do all
> > > > of the internal calculations in 32 bits and only convert to 48 at the end when
> > > > they need to add it to Surface Base Address.
> > > >
> > > > If my understanding is correct then we will still have this limitation on gen8
> > > > even after we get rid of relocations and remove the BO size limitation.  I see
> > > > a couple of options, neither of which I like very much:
> > > >
> > > >  1) Take something like this patch and then keep the BO size limitation on BDW
> > > > to 2GiB when we get rid of relocations even though it's
> > > >

[Mesa-dev] [Bug 103658] addrlib/gfx9/gfx9addrlib.cpp:727:50: error: expected expression

2017-11-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103658

Bug ID: 103658
   Summary: addrlib/gfx9/gfx9addrlib.cpp:727:50: error: expected
expression
   Product: Mesa
   Version: git
  Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: v...@freedesktop.org
QA Contact: mesa-dev@lists.freedesktop.org

mesa: cd6f79a71d75d5d756176a03f04c4442c0ef9e7f (master 17.4.0-devel)

clang build error

  CXX  addrlib/gfx9/addrlib_libamdgpu_addrlib_la-gfx9addrlib.lo
addrlib/gfx9/gfx9addrlib.cpp:727:50: error: expected expression
const CoordEq* pMetaEq = GetMetaEquation({0, fmaskElementBytesLog2, 0,
pIn->cMaskFlags,
 ^



Re: [Mesa-dev] [PATCH 05/17] main, glsl: Add UniformDataDefaults which stores uniform defaults

2017-11-09 Thread Timothy Arceri

On 09/11/17 17:42, Jordan Justen wrote:

The ARB_get_program_binary extension requires that uniform values in a
program be restored to their initial value just after linking.

This patch saves off the initial values just after linking. When the
program is restored by glProgramBinary, we can use this to copy the
initial value of uniforms into UniformDataSlots.
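
The save-and-restore idea described above can be sketched outside of Mesa as follows. This is a simplified illustration, not the patch itself: `union const_value` and `struct program_data` are hypothetical stand-ins for `union gl_constant_value` and `gl_shader_program_data`, and both function names are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for Mesa's union gl_constant_value; illustrative only. */
union const_value {
   float f;
   int i;
   unsigned u;
};

/* Hypothetical, simplified stand-in for gl_shader_program_data. */
struct program_data {
   unsigned num_slots;
   union const_value *slots;     /* live uniform values */
   union const_value *defaults;  /* snapshot taken right after linking */
};

/* Snapshot the live values once linking has applied the initializers. */
static void
save_uniform_defaults(struct program_data *d)
{
   memcpy(d->defaults, d->slots, sizeof(union const_value) * d->num_slots);
}

/* Put the live values back to their post-link state, as a program binary
 * restore would need to.
 */
static void
restore_uniform_defaults(struct program_data *d)
{
   memcpy(d->slots, d->defaults, sizeof(union const_value) * d->num_slots);
}
```

The cost is a second array of the same size as the live one, which is the memory overhead the review comment further down objects to.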

Signed-off-by: Jordan Justen 
---
  src/compiler/glsl/link_uniform_initializers.cpp |  2 ++
  src/compiler/glsl/link_uniforms.cpp |  3 +++
  src/compiler/glsl/serialize.cpp | 18 --
  src/mesa/main/mtypes.h  |  1 +
  4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/compiler/glsl/link_uniform_initializers.cpp 
b/src/compiler/glsl/link_uniform_initializers.cpp
index f70d9100e12..2395f5cf695 100644
--- a/src/compiler/glsl/link_uniform_initializers.cpp
+++ b/src/compiler/glsl/link_uniform_initializers.cpp
@@ -354,5 +354,7 @@ link_set_uniform_initializers(struct gl_shader_program 
*prog,
}
 }
  
+   memcpy(prog->data->UniformDataDefaults, prog->data->UniformDataSlots,
+  sizeof(union gl_constant_value) * prog->data->NumUniformDataSlots);
 ralloc_free(mem_ctx);
  }
diff --git a/src/compiler/glsl/link_uniforms.cpp 
b/src/compiler/glsl/link_uniforms.cpp
index 7d141549f55..51e02bcf840 100644
--- a/src/compiler/glsl/link_uniforms.cpp
+++ b/src/compiler/glsl/link_uniforms.cpp
@@ -1338,6 +1338,9 @@ link_assign_uniform_storage(struct gl_context *ctx,
   
prog->data->NumUniformStorage);
data = rzalloc_array(prog->data->UniformStorage,
 union gl_constant_value, num_data_slots);
+  prog->data->UniformDataDefaults =
+ rzalloc_array(prog->data->UniformStorage,
+   union gl_constant_value, num_data_slots);
 } else {
data = prog->data->UniformDataSlots;
 }
diff --git a/src/compiler/glsl/serialize.cpp b/src/compiler/glsl/serialize.cpp
index b4c9545702e..e55f1680ffc 100644
--- a/src/compiler/glsl/serialize.cpp
+++ b/src/compiler/glsl/serialize.cpp
@@ -449,7 +449,12 @@ write_uniforms(struct blob *metadata, struct 
gl_shader_program *prog)
   unsigned vec_size =
  prog->data->UniformStorage[i].type->component_slots() *
  MAX2(prog->data->UniformStorage[i].array_elements, 1);
- blob_write_bytes(metadata, prog->data->UniformStorage[i].storage,
+ unsigned slot =
+prog->data->UniformStorage[i].storage -
+prog->data->UniformDataSlots;
+ blob_write_bytes(metadata, &prog->data->UniformDataSlots[slot],
+  sizeof(union gl_constant_value) * vec_size);
+ blob_write_bytes(metadata, &prog->data->UniformDataDefaults[slot],
sizeof(union gl_constant_value) * vec_size);
}
 }
@@ -472,6 +477,9 @@ read_uniforms(struct blob_reader *metadata, struct 
gl_shader_program *prog)
 data = rzalloc_array(uniforms, union gl_constant_value,
  prog->data->NumUniformDataSlots);
 prog->data->UniformDataSlots = data;
+   prog->data->UniformDataDefaults =
+  rzalloc_array(uniforms, union gl_constant_value,
+prog->data->NumUniformDataSlots);
  
 prog->UniformHash = new string_to_uint_map;
  
@@ -512,8 +520,14 @@ read_uniforms(struct blob_reader *metadata, struct gl_shader_program *prog)

   unsigned vec_size =
  prog->data->UniformStorage[i].type->component_slots() *
  MAX2(prog->data->UniformStorage[i].array_elements, 1);
+ unsigned slot =
+prog->data->UniformStorage[i].storage -
+prog->data->UniformDataSlots;
+ blob_copy_bytes(metadata,
+ (uint8_t *) &prog->data->UniformDataSlots[slot],
+ sizeof(union gl_constant_value) * vec_size);
   blob_copy_bytes(metadata,
- (uint8_t *) prog->data->UniformStorage[i].storage,
+ (uint8_t *) &prog->data->UniformDataDefaults[slot],
   sizeof(union gl_constant_value) * vec_size);
  
  assert(vec_size + prog->data->UniformStorage[i].storage <=

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 2acf64eb56d..023692cc0e1 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2871,6 +2871,7 @@ struct gl_shader_program_data
 /* Shader cache variables used during restore */
 unsigned NumUniformDataSlots;
 union gl_constant_value *UniformDataSlots;
+   union gl_constant_value *UniformDataDefaults;


It really sucks that we need to carry this around in memory just for 
this extension.


Can we separate this from the shader cache vars and add a comment. 
Something like:


/* Used to hold initial uniform values for program binary restores.
 *
 * From the ARB_get_program_binary spec:
 *
 *  

Re: [Mesa-dev] [PATCH 09/17] i965: Fix memory leak when serializing nir

2017-11-09 Thread Timothy Arceri

6-9:

Reviewed-by: Timothy Arceri 

On 09/11/17 17:42, Jordan Justen wrote:

Signed-off-by: Jordan Justen 
---
  src/mesa/drivers/dri/i965/brw_program.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
b/src/mesa/drivers/dri/i965/brw_program.c
index 798b7d24dd6..f795fc1dbc3 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -791,6 +791,7 @@ brw_program_serialize_nir(struct gl_context *ctx, struct 
gl_program *prog,
 prog->driver_cache_blob = ralloc_size(NULL, writer.size);
 memcpy(prog->driver_cache_blob, writer.data, writer.size);
 prog->driver_cache_blob_size = writer.size;
+   blob_finish(&writer);
  }
  
  void





Re: [Mesa-dev] [PATCH 04/17] glsl: Split out shader program serialization

2017-11-09 Thread Timothy Arceri

Reviewed-by: Timothy Arceri 


[Mesa-dev] [PATCH] r600: don't emit atomic save if we have no atomic counters.

2017-11-09 Thread Dave Airlie
From: Dave Airlie 

Otherwise we end up emitting the fence.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/evergreen_state.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index c1d13fd..30819ae 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -4634,6 +4634,9 @@ void evergreen_emit_atomic_buffer_save(struct 
r600_context *rctx,
unsigned reloc;
 
mask = *atomic_used_mask_p;
+   if (!mask)
+   return;
+
while (mask) {
 		unsigned atomic_index = u_bit_scan(&mask);
 		struct r600_shader_atomic *atomic = &combined_atomics[atomic_index];
-- 
2.9.5
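
The effect of the early return can be illustrated with a small standalone model. This is a sketch under stated assumptions: `bit_scan` is a portable stand-in for Mesa's `u_bit_scan()`, `emit_atomic_save` is a hypothetical function that only counts what it would emit, and the placement of the fence after the loop is inferred from the commit message rather than shown in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for Mesa's u_bit_scan(): returns the index of the
 * lowest set bit in *mask and clears it. *mask must be non-zero.
 */
static int
bit_scan(uint32_t *mask)
{
   assert(*mask != 0);

   int i = 0;
   while (!((*mask >> i) & 1u))
      i++;
   *mask &= ~(UINT32_C(1) << i);
   return i;
}

/* Hypothetical emit routine: returns the number of "commands" it would
 * emit, one per used atomic counter plus one trailing fence.
 */
static unsigned
emit_atomic_save(uint32_t used_mask)
{
   unsigned emitted = 0;

   if (!used_mask)
      return 0;   /* the early return: no counters, so no fence either */

   while (used_mask) {
      int index = bit_scan(&used_mask);
      (void)index;   /* a real driver would emit the save for this counter */
      emitted++;
   }

   emitted++;        /* the fence assumed to follow the per-counter saves */
   return emitted;
}
```

Without the `if (!used_mask)` check, an empty mask would skip the loop but still reach the fence, which is the behaviour the patch removes.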



Re: [Mesa-dev] [PATCH 16/17] util: Add Mesa ARB_get_program_binary helper functions

2017-11-09 Thread Jordan Justen
On 2017-11-09 08:07:57, Jose Fonseca wrote:
> On 09/11/17 13:19, Emil Velikov wrote:
> > Hi Jordan,
> > 
> > On 9 November 2017 at 06:42, Jordan Justen wrote:
> >> Signed-off-by: Jordan Justen 
> >> ---
> >>   src/util/Makefile.sources |   2 +
> >>   src/util/meson.build  |   2 +
> >>   src/util/program_binary.c | 322 
> >> ++
> >>   src/util/program_binary.h |  91 +
> >>   4 files changed, 417 insertions(+)
> >>   create mode 100644 src/util/program_binary.c
> >>   create mode 100644 src/util/program_binary.h
> >>
> > 
> >> +#include "zlib.h"
> >> +
> > Currently zlib is a dependency for !Windows platforms.
> > With this patch we add it to the Windows builds.
> > 
> > Brian, Jose any ideas how we can get zlib on Windows?
> > 
> > Thanks
> > Emil
> > 
> 
> Thanks for the heads up Emil!
> 
> The most effective way to get zlib on Windows would be to bundle the 
> source code in mesa/src/zlib.
> 
> But it would be much simpler if we did not add zlib as a required dependency.
> 
> Could we use some other checksum/hash?
> 
> Or instead of bundling the whole zlib, we could just bundle a crc 
> implementation.  I'm sure there are many BSD/MIT licensed ones out there. 
> I've used some in apitrace -- 
> https://github.com/apitrace/apitrace/tree/master/thirdparty/crc32c

We have bundled crc32 already.

I compress the program, but it is optional, and compression is not
used if it actually doesn't produce a smaller result. I'll update the
series such that if zlib is not available, then we'll always produce
an uncompressed result.

-Jordan
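
The fallback described above (compress only when it helps, and always store uncompressed when no compressor is available) could look roughly like this. It is a minimal sketch, not the code from the series: the `compress_fn` callback type and `pack_payload` are hypothetical names, and a NULL callback models a build without zlib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compressor callback: writes at most dst_cap bytes to dst and
 * returns the compressed size, or 0 on failure. A NULL callback means "no
 * compressor available" (for example a build without zlib).
 */
typedef size_t (*compress_fn)(const void *src, size_t src_size,
                              void *dst, size_t dst_cap);

/* Store `src` into `dst` (capacity of at least src_size bytes), compressed
 * only when that is strictly smaller than the input. Returns the number of
 * bytes written; *compressed reports which form was stored.
 */
static size_t
pack_payload(const void *src, size_t src_size, void *dst,
             compress_fn compress, bool *compressed)
{
   if (compress) {
      size_t n = compress(src, src_size, dst, src_size);
      if (n > 0 && n < src_size) {
         *compressed = true;
         return n;
      }
   }

   /* No compressor, compression failed, or it did not shrink the data. */
   memcpy(dst, src, src_size);
   *compressed = false;
   return src_size;
}
```

A reader of the stored blob needs the `compressed` flag (or an equivalent header field) to know whether to decompress, so a build without zlib could only load binaries that were stored uncompressed.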


[Mesa-dev] [PATCH v2] radeonsi: copy some nir gs info

2017-11-09 Thread Timothy Arceri
v2: copy input primitive
---
 src/gallium/drivers/radeonsi/si_shader_nir.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 5b68ff2a07..c7880b7f87 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -154,20 +154,27 @@ void si_nir_scan_shader(const struct nir_shader *nir,
assert(nir->info.stage == MESA_SHADER_VERTEX ||
   nir->info.stage == MESA_SHADER_FRAGMENT);
 
info->processor = pipe_shader_type_from_mesa(nir->info.stage);
info->num_tokens = 2; /* indicate that the shader is non-empty */
info->num_instructions = 2;
 
info->num_inputs = nir->num_inputs;
info->num_outputs = nir->num_outputs;
 
+   if (nir->info.stage == MESA_SHADER_GEOMETRY) {
+   info->properties[TGSI_PROPERTY_GS_INPUT_PRIM] = 
nir->info.gs.input_primitive;
+   info->properties[TGSI_PROPERTY_GS_OUTPUT_PRIM] = 
nir->info.gs.output_primitive;
+   info->properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES] = 
nir->info.gs.vertices_out;
+   info->properties[TGSI_PROPERTY_GS_INVOCATIONS] = 
nir->info.gs.invocations;
+   }
+
i = 0;
 	nir_foreach_variable(variable, &nir->inputs) {
unsigned semantic_name, semantic_index;
unsigned attrib_count = 
glsl_count_attribute_slots(variable->type,
   
nir->info.stage == MESA_SHADER_VERTEX);
 
assert(attrib_count == 1 && "not implemented");
 
/* Vertex shader inputs don't have semantics. The state
 * tracker has already mapped them to attributes via
-- 
2.14.3



[Mesa-dev] [PATCH 20/20] radeonsi: enable gs support for nir backend

2017-11-09 Thread Timothy Arceri
---
 src/gallium/drivers/radeonsi/si_pipe.c   |  3 +-
 src/gallium/drivers/radeonsi/si_shader_nir.c | 43 
 src/mesa/state_tracker/st_glsl_to_nir.cpp|  3 +-
 3 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 391997db84..9548f10766 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -529,21 +529,21 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_CONSTANT_BUFFER_OFFSET_ALIGNMENT:
case PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT:
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
case PIPE_CAP_MAX_VERTEX_STREAMS:
case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT:
return 4;
 
case PIPE_CAP_GLSL_FEATURE_LEVEL:
if (sscreen->b.debug_flags & DBG(NIR))
-   return 140; /* no geometry and tessellation shaders yet 
*/
+   return 150; /* no tessellation shaders yet */
if (si_have_tgsi_compute(sscreen))
return 450;
return 420;
 
case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
return MIN2(sscreen->b.info.max_alloc_size, INT_MAX);
 
case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
@@ -733,20 +733,21 @@ static int si_get_shader_param(struct pipe_screen* 
pscreen,
return SI_NUM_SAMPLERS;
case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS:
return SI_NUM_SHADER_BUFFERS;
case PIPE_SHADER_CAP_MAX_SHADER_IMAGES:
return SI_NUM_IMAGES;
case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
return 32;
case PIPE_SHADER_CAP_PREFERRED_IR:
if (sscreen->b.debug_flags & DBG(NIR) &&
(shader == PIPE_SHADER_VERTEX ||
+shader == PIPE_SHADER_GEOMETRY ||
 shader == PIPE_SHADER_FRAGMENT))
return PIPE_SHADER_IR_NIR;
return PIPE_SHADER_IR_TGSI;
case PIPE_SHADER_CAP_LOWER_IF_THRESHOLD:
return 4;
 
/* Supported boolean features. */
case PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED:
case PIPE_SHADER_CAP_TGSI_SQRT_SUPPORTED:
case PIPE_SHADER_CAP_INDIRECT_TEMP_ADDR:
diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 1933c8c770..ca67967083 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -145,20 +145,21 @@ static void scan_instruction(struct tgsi_shader_info 
*info,
}
 }
 
 void si_nir_scan_shader(const struct nir_shader *nir,
struct tgsi_shader_info *info)
 {
nir_function *func;
unsigned i;
 
assert(nir->info.stage == MESA_SHADER_VERTEX ||
+  nir->info.stage == MESA_SHADER_GEOMETRY ||
   nir->info.stage == MESA_SHADER_FRAGMENT);
 
info->processor = pipe_shader_type_from_mesa(nir->info.stage);
info->num_tokens = 2; /* indicate that the shader is non-empty */
info->num_instructions = 2;
 
info->num_inputs = nir->num_inputs;
info->num_outputs = nir->num_outputs;
 
if (nir->info.stage == MESA_SHADER_GEOMETRY) {
@@ -166,29 +167,30 @@ void si_nir_scan_shader(const struct nir_shader *nir,
info->properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES] = 
nir->info.gs.vertices_out;
info->properties[TGSI_PROPERTY_GS_INVOCATIONS] = 
nir->info.gs.invocations;
}
 
i = 0;
 	nir_foreach_variable(variable, &nir->inputs) {
unsigned semantic_name, semantic_index;
unsigned attrib_count = 
glsl_count_attribute_slots(variable->type,
   
nir->info.stage == MESA_SHADER_VERTEX);
 
-   assert(attrib_count == 1 && "not implemented");
-
/* Vertex shader inputs don't have semantics. The state
 * tracker has already mapped them to attributes via
 * variable->data.driver_location.
 */
if (nir->info.stage == MESA_SHADER_VERTEX)
continue;
 
+   assert(nir->info.stage != MESA_SHADER_FRAGMENT ||
+  (attrib_count == 1 && "not implemented"));
+
/* Fragment shader position is a system value. */
if (nir->info.stage == MESA_SHADER_FRAGMENT &&
variable->data.location == VARYING_SLOT_POS) {
if (variable->data.pixel_center_integer)


[Mesa-dev] [PATCH 17/20] ac: add si_nir_load_input_gs() to the abi

2017-11-09 Thread Timothy Arceri
---
 src/amd/common/ac_nir_to_llvm.c   | 24 -
 src/amd/common/ac_shader_abi.h|  7 ++
 src/gallium/drivers/radeonsi/si_shader.c  |  1 +
 src/gallium/drivers/radeonsi/si_shader_internal.h |  5 +
 src/gallium/drivers/radeonsi/si_shader_nir.c  | 26 +++
 5 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 158e954fa8..483dd52b36 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2854,32 +2854,31 @@ load_tes_input(struct nir_to_llvm_context *ctx,
buf_addr = LLVMBuildAdd(ctx->builder, buf_addr, comp_offset, "");
 
 	result = ac_build_buffer_load(&ctx->ac, ctx->hs_ring_tess_offchip, instr->num_components, NULL,
 				      buf_addr, ctx->oc_lds, is_compact ? (4 * const_index) : 0, 1, 0, true, false);
 	result = trim_vector(&ctx->ac, result, instr->num_components);
 	result = LLVMBuildBitCast(ctx->builder, result, get_def_type(ctx->nir, &instr->dest.ssa), "");
return result;
 }
 
 static LLVMValueRef
-load_gs_input(struct nir_to_llvm_context *ctx,
- nir_intrinsic_instr *instr)
+load_gs_input(struct ac_shader_abi *abi,
+ nir_intrinsic_instr *instr,
+ unsigned vertex_index,
+ unsigned const_index)
 {
-   LLVMValueRef indir_index, vtx_offset;
-   unsigned const_index;
+   struct nir_to_llvm_context *ctx = nir_to_llvm_context_from_abi(abi);
+   LLVMValueRef vtx_offset;
LLVMValueRef args[9];
unsigned param, vtx_offset_param;
LLVMValueRef value[4], result;
-   unsigned vertex_index;
-   get_deref_offset(ctx->nir, instr->variables[0],
-			 false, &vertex_index, NULL,
-			 &const_index, &indir_index);
+
vtx_offset_param = vertex_index;
assert(vtx_offset_param < 6);
vtx_offset = LLVMBuildMul(ctx->builder, 
ctx->gs_vtx_offset[vtx_offset_param],
  LLVMConstInt(ctx->ac.i32, 4, false), "");
 
param = 
shader_io_get_unique_index(instr->variables[0]->var->data.location);
 
unsigned comp = instr->variables[0]->var->data.location_frac;
for (unsigned i = comp; i < instr->num_components + comp; i++) {
if (ctx->ac.chip_class >= GFX9) {
@@ -2966,21 +2965,26 @@ static LLVMValueRef visit_load_var(struct 
ac_nir_context *ctx,
if (instr->dest.ssa.bit_size == 64)
ve *= 2;
 
switch (instr->variables[0]->var->data.mode) {
case nir_var_shader_in:
if (ctx->stage == MESA_SHADER_TESS_CTRL)
return load_tcs_input(ctx->nctx, instr);
if (ctx->stage == MESA_SHADER_TESS_EVAL)
return load_tes_input(ctx->nctx, instr);
if (ctx->stage == MESA_SHADER_GEOMETRY) {
-   return load_gs_input(ctx->nctx, instr);
+   LLVMValueRef indir_index;
+   unsigned const_index, vertex_index;
+   get_deref_offset(ctx, instr->variables[0],
+					 false, &vertex_index, NULL,
+					 &const_index, &indir_index);
+   return ctx->abi->load_inputs(ctx->abi, instr, 
vertex_index, const_index);
}
 
for (unsigned chan = comp; chan < ve + comp; chan++) {
if (indir_index) {
unsigned count = glsl_count_attribute_slots(
instr->variables[0]->var->type,
ctx->stage == 
MESA_SHADER_VERTEX);
count -= chan / 4;
LLVMValueRef tmp_vec = 
ac_build_gather_values_extended(
>ac, ctx->abi->inputs + 
idx + chan, count,
@@ -6489,22 +6493,22 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
 
for(int i = 0; i < shader_count; ++i) {
ctx.stage = shaders[i]->info.stage;
ctx.output_mask = 0;
ctx.tess_outputs_written = 0;
ctx.num_output_clips = 
shaders[i]->info.clip_distance_array_size;
ctx.num_output_culls = 
shaders[i]->info.cull_distance_array_size;
 
if (shaders[i]->info.stage == MESA_SHADER_GEOMETRY) {
ctx.gs_next_vertex = ac_build_alloca(, 
ctx.ac.i32, "gs_next_vertex");
-
ctx.gs_max_out_vertices = 
shaders[i]->info.gs.vertices_out;
+   ctx.abi.load_inputs = load_gs_input;
} else if (shaders[i]->info.stage == MESA_SHADER_TESS_EVAL) {
ctx.tes_primitive_mode = 
shaders[i]->info.tess.primitive_mode;
   

[Mesa-dev] [PATCH 19/20] radeonsi: copy some nir gs info

2017-11-09 Thread Timothy Arceri
---
 src/gallium/drivers/radeonsi/si_shader_nir.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 5b68ff2a07..1933c8c770 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -154,20 +154,26 @@ void si_nir_scan_shader(const struct nir_shader *nir,
assert(nir->info.stage == MESA_SHADER_VERTEX ||
   nir->info.stage == MESA_SHADER_FRAGMENT);
 
info->processor = pipe_shader_type_from_mesa(nir->info.stage);
info->num_tokens = 2; /* indicate that the shader is non-empty */
info->num_instructions = 2;
 
info->num_inputs = nir->num_inputs;
info->num_outputs = nir->num_outputs;
 
+   if (nir->info.stage == MESA_SHADER_GEOMETRY) {
+   info->properties[TGSI_PROPERTY_GS_OUTPUT_PRIM] = 
nir->info.gs.output_primitive;
+   info->properties[TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES] = 
nir->info.gs.vertices_out;
+   info->properties[TGSI_PROPERTY_GS_INVOCATIONS] = 
nir->info.gs.invocations;
+   }
+
i = 0;
 	nir_foreach_variable(variable, &nir->inputs) {
unsigned semantic_name, semantic_index;
unsigned attrib_count = 
glsl_count_attribute_slots(variable->type,
   
nir->info.stage == MESA_SHADER_VERTEX);
 
assert(attrib_count == 1 && "not implemented");
 
/* Vertex shader inputs don't have semantics. The state
 * tracker has already mapped them to attributes via
-- 
2.14.3



[Mesa-dev] [PATCH 18/20] ac: add gs_{prim, invocation}_id to the abi

2017-11-09 Thread Timothy Arceri
---
 src/amd/common/ac_nir_to_llvm.c   | 16 
 src/amd/common/ac_shader_abi.h|  2 ++
 src/gallium/drivers/radeonsi/si_shader.c  | 14 ++
 src/gallium/drivers/radeonsi/si_shader_internal.h |  2 --
 4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 483dd52b36..a82730f9f6 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -115,21 +115,20 @@ struct nir_to_llvm_context {
LLVMValueRef tes_rel_patch_id;
LLVMValueRef tes_patch_id;
LLVMValueRef tes_u;
LLVMValueRef tes_v;
 
LLVMValueRef gsvs_ring_stride;
LLVMValueRef gsvs_num_entries;
LLVMValueRef gs2vs_offset;
LLVMValueRef gs_wave_id;
LLVMValueRef gs_vtx_offset[6];
-   LLVMValueRef gs_prim_id, gs_invocation_id;
 
LLVMValueRef esgs_ring;
LLVMValueRef gsvs_ring;
LLVMValueRef hs_ring_tess_offchip;
LLVMValueRef hs_ring_tess_factor;
 
LLVMValueRef prim_mask;
LLVMValueRef sample_pos_offset;
LLVMValueRef persp_sample, persp_center, persp_centroid;
LLVMValueRef linear_sample, linear_center, linear_centroid;
@@ -819,22 +818,22 @@ static void create_function(struct nir_to_llvm_context 
*ctx,
add_user_sgpr_argument(, ctx->ac.i32, 
>tcs_offchip_layout); // tcs offchip layout
else
radv_define_vs_user_sgprs_phase1(ctx, stage, 
has_previous_stage, previous_stage, );
add_user_sgpr_argument(, ctx->ac.i32, 
>gsvs_ring_stride); // gsvs stride
add_user_sgpr_argument(, ctx->ac.i32, 
>gsvs_num_entries); // gsvs num entires
if (ctx->shader_info->info.needs_multiview_view_index)
add_user_sgpr_argument(, ctx->ac.i32, 
>view_index);
 
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[0]); // vtx01
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[2]); // vtx23
-   add_vgpr_argument(, ctx->ac.i32, 
>gs_prim_id); // prim id
-   add_vgpr_argument(, ctx->ac.i32, 
>gs_invocation_id);
+   add_vgpr_argument(, ctx->ac.i32, 
>abi.gs_prim_id); // prim id
+   add_vgpr_argument(, ctx->ac.i32, 
>abi.gs_invocation_id);
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[4]);
 
if (previous_stage == MESA_SHADER_VERTEX) {
add_vgpr_argument(, ctx->ac.i32, 
>abi.vertex_id); // vertex id
add_vgpr_argument(, ctx->ac.i32, 
>rel_auto_id); // rel auto id
add_vgpr_argument(, ctx->ac.i32, 
>vs_prim_id); // vs prim id
add_vgpr_argument(, ctx->ac.i32, 
>abi.instance_id); // instance id
} else {
add_vgpr_argument(, ctx->ac.f32, 
>tes_u); // tes_u
add_vgpr_argument(, ctx->ac.f32, 
>tes_v); // tes_v
@@ -845,26 +844,26 @@ static void create_function(struct nir_to_llvm_context 
*ctx,
radv_define_common_user_sgprs_phase1(ctx, stage, 
has_previous_stage, previous_stage, _sgpr_info, , _sets);
radv_define_vs_user_sgprs_phase1(ctx, stage, 
has_previous_stage, previous_stage, );
add_user_sgpr_argument(, ctx->ac.i32, 
>gsvs_ring_stride); // gsvs stride
add_user_sgpr_argument(, ctx->ac.i32, 
>gsvs_num_entries); // gsvs num entires
if (ctx->shader_info->info.needs_multiview_view_index)
add_user_sgpr_argument(, ctx->ac.i32, 
>view_index);
add_sgpr_argument(, ctx->ac.i32, 
>gs2vs_offset); // gs2vs offset
add_sgpr_argument(, ctx->ac.i32, 
>gs_wave_id); // wave id
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[0]); // vtx0
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[1]); // vtx1
-   add_vgpr_argument(, ctx->ac.i32, 
>gs_prim_id); // prim id
+   add_vgpr_argument(, ctx->ac.i32, 
>abi.gs_prim_id); // prim id
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[2]);
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[3]);
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[4]);
add_vgpr_argument(, ctx->ac.i32, 
>gs_vtx_offset[5]);
-   add_vgpr_argument(, ctx->ac.i32, 
>gs_invocation_id);
+   add_vgpr_argument(, ctx->ac.i32, 
>abi.gs_invocation_id);
}
break;
case MESA_SHADER_FRAGMENT:

[Mesa-dev] [PATCH 15/20] radeonsi: add basic nir -> llvm type helper

2017-11-09 Thread Timothy Arceri
---
 src/gallium/drivers/radeonsi/si_shader_nir.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 847d75ba14..fca16f46cf 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -25,20 +25,42 @@
 #include "si_shader_internal.h"
 
 #include "ac_nir_to_llvm.h"
 
 #include "tgsi/tgsi_from_mesa.h"
 
 #include "compiler/nir/nir.h"
 #include "compiler/nir_types.h"
 
 
+static LLVMTypeRef
+nir2llvmtype(struct si_shader_context *ctx,
+const struct glsl_type *type)
+{
+   switch (glsl_get_base_type(glsl_without_array(type))) {
+   case GLSL_TYPE_UINT:
+   case GLSL_TYPE_INT:
+   return ctx->ac.i32;
+   case GLSL_TYPE_UINT64:
+   case GLSL_TYPE_INT64:
+   return ctx->ac.i64;
+   case GLSL_TYPE_DOUBLE:
+   return ctx->ac.f64;
+   case GLSL_TYPE_FLOAT:
+   return ctx->ac.f32;
+   default:
+   assert(!"Unsupported type in nir2llvmtype()");
+   break;
+   }
+   return 0;
+}
+
 static int
 type_size(const struct glsl_type *type)
 {
return glsl_count_attribute_slots(type, false);
 }
 
 static void scan_instruction(struct tgsi_shader_info *info,
 nir_instr *instr)
 {
if (instr->type == nir_instr_type_alu) {
-- 
2.14.3



[Mesa-dev] [PATCH 16/20] ac: move build_varying_gather_values() to ac_llvm_build.h and expose

2017-11-09 Thread Timothy Arceri
---
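Note, not part of the patch: a rough illustration of the indexing used by
ac_build_varying_gather_values(), with plain ints standing in for
LLVMValueRef. The loop copies value_count entries starting at
values[component] into slots 0..value_count-1, which is the same
i - component mapping the function uses for the vector index.
gather_window() and its out parameter are hypothetical names for this note
only, and the sketch does not model the value_count == 1 shortcut.

```c
/* Hypothetical plain-C stand-in for the gather loop: ints replace
 * LLVMValueRef and an output array replaces the LLVM vector. */
static unsigned
gather_window(const int *values, unsigned value_count, unsigned component,
	      int *out)
{
	/* Source index i walks [component, component + value_count);
	 * destination index is i - component, as in the patch. */
	for (unsigned i = component; i < value_count + component; i++)
		out[i - component] = values[i];

	return value_count;
}
```
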
 src/amd/common/ac_llvm_build.c  | 22 ++
 src/amd/common/ac_llvm_build.h  |  4 
 src/amd/common/ac_nir_to_llvm.c | 34 ++
 3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 5640a23b8a..b2bf1bf7b5 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -363,20 +363,42 @@ ac_build_vote_eq(struct ac_llvm_context *ctx, 
LLVMValueRef value)
LLVMValueRef vote_set = ac_build_ballot(ctx, value);
 
LLVMValueRef all = LLVMBuildICmp(ctx->builder, LLVMIntEQ,
 vote_set, active_set, "");
LLVMValueRef none = LLVMBuildICmp(ctx->builder, LLVMIntEQ,
  vote_set,
  LLVMConstInt(ctx->i64, 0, 0), "");
return LLVMBuildOr(ctx->builder, all, none, "");
 }
 
+LLVMValueRef
+ac_build_varying_gather_values(struct ac_llvm_context *ctx, LLVMValueRef 
*values,
+  unsigned value_count, unsigned component)
+{
+   LLVMValueRef vec = NULL;
+
+   if (value_count == 1) {
+   return values[component];
+   } else if (!value_count)
+   unreachable("value_count is 0");
+
+   for (unsigned i = component; i < value_count + component; i++) {
+   LLVMValueRef value = values[i];
+
+   if (!i)
+   vec = LLVMGetUndef( LLVMVectorType(LLVMTypeOf(value), 
value_count));
+   LLVMValueRef index = LLVMConstInt(ctx->i32, i - component, 
false);
+   vec = LLVMBuildInsertElement(ctx->builder, vec, value, index, 
"");
+   }
+   return vec;
+}
+
 LLVMValueRef
 ac_build_gather_values_extended(struct ac_llvm_context *ctx,
LLVMValueRef *values,
unsigned value_count,
unsigned value_stride,
bool load,
bool always_vector)
 {
LLVMBuilderRef builder = ctx->builder;
LLVMValueRef vec = NULL;
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index 1f51937c9e..655dc1dcc8 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -105,20 +105,24 @@ void ac_build_optimization_barrier(struct ac_llvm_context 
*ctx,
   LLVMValueRef *pvgpr);
 
 LLVMValueRef ac_build_ballot(struct ac_llvm_context *ctx, LLVMValueRef value);
 
 LLVMValueRef ac_build_vote_all(struct ac_llvm_context *ctx, LLVMValueRef 
value);
 
 LLVMValueRef ac_build_vote_any(struct ac_llvm_context *ctx, LLVMValueRef 
value);
 
 LLVMValueRef ac_build_vote_eq(struct ac_llvm_context *ctx, LLVMValueRef value);
 
+LLVMValueRef
+ac_build_varying_gather_values(struct ac_llvm_context *ctx, LLVMValueRef 
*values,
+  unsigned value_count, unsigned component);
+
 LLVMValueRef
 ac_build_gather_values_extended(struct ac_llvm_context *ctx,
LLVMValueRef *values,
unsigned value_count,
unsigned value_stride,
bool load,
bool always_vector);
 LLVMValueRef
 ac_build_gather_values(struct ac_llvm_context *ctx,
   LLVMValueRef *values,
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 36f471dcc7..158e954fa8 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2673,42 +2673,20 @@ get_dw_address(struct nir_to_llvm_context *ctx,
 
dw_addr = LLVMBuildAdd(ctx->builder, dw_addr,
   LLVMConstInt(ctx->ac.i32, param * 4, false), "");
 
if (const_index && compact_const_index)
dw_addr = LLVMBuildAdd(ctx->builder, dw_addr,
   LLVMConstInt(ctx->ac.i32, const_index, 
false), "");
return dw_addr;
 }
 
-static LLVMValueRef
-build_varying_gather_values(struct ac_llvm_context *ctx, LLVMValueRef *values,
-   unsigned value_count, unsigned component)
-{
-   LLVMValueRef vec = NULL;
-
-   if (value_count == 1) {
-   return values[component];
-   } else if (!value_count)
-   unreachable("value_count is 0");
-
-   for (unsigned i = component; i < value_count + component; i++) {
-   LLVMValueRef value = values[i];
-
-   if (!i)
-   vec = LLVMGetUndef( LLVMVectorType(LLVMTypeOf(value), 
value_count));
-   LLVMValueRef index = LLVMConstInt(ctx->i32, i - component, 
false);
-   vec = LLVMBuildInsertElement(ctx->builder, vec, value, index, 
"");
-   }
-   return vec;
-}
-
 static LLVMValueRef
 load_tcs_input(struct nir_to_llvm_context *ctx,
   

[Mesa-dev] [PATCH 09/20] radeonsi: get llvm types from ac

2017-11-09 Thread Timothy Arceri
---
 src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c 
b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
index 9ec5a876f3..59d02605e9 100644
--- a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
+++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
@@ -163,29 +163,29 @@ out:
 }
 
 LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base,
  enum tgsi_opcode_type type)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
 
switch (type) {
case TGSI_TYPE_UNSIGNED:
case TGSI_TYPE_SIGNED:
-   return ctx->i32;
+   return ctx->ac.i32;
case TGSI_TYPE_UNSIGNED64:
case TGSI_TYPE_SIGNED64:
-   return ctx->i64;
+   return ctx->ac.i64;
case TGSI_TYPE_DOUBLE:
-   return LLVMDoubleTypeInContext(ctx->ac.context);
+   return ctx->ac.f64;
case TGSI_TYPE_UNTYPED:
case TGSI_TYPE_FLOAT:
-   return ctx->f32;
+   return ctx->ac.f32;
default: break;
}
return 0;
 }
 
 LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type, LLVMValueRef value)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMTypeRef dst_type = tgsi2llvmtype(bld_base, type);
-- 
2.14.3



[Mesa-dev] [PATCH 14/20] radeonsi: create si_llvm_load_input_gs()

2017-11-09 Thread Timothy Arceri
This creates a common function that can be shared by the tgsi
and nir backends.
---
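Note, not part of the patch: a minimal sketch of how the GFX9 path picks a
vertex offset from the packed parameters. It assumes unpack_param() behaves
as a shift-and-mask bitfield extract, which this patch does not show.
unpack_bits() and gfx9_vtx_offset() are hypothetical helpers working on
plain integers rather than LLVM values.

```c
#include <stdint.h>

/* Assumed behaviour of unpack_param(): take `bitwidth` bits starting at
 * bit `rshift`. Only valid for bitwidth < 32. */
static uint32_t unpack_bits(uint32_t word, unsigned rshift, unsigned bitwidth)
{
	return (word >> rshift) & ((1u << bitwidth) - 1u);
}

/* packed[0] holds vtx0/vtx1, packed[1] vtx2/vtx3, packed[2] vtx4/vtx5.
 * index / 2 selects the word, index % 2 selects the 16-bit half. */
static uint32_t gfx9_vtx_offset(const uint32_t packed[3], unsigned index)
{
	return unpack_bits(packed[index / 2], index % 2 ? 16 : 0, 16);
}
```
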
 src/gallium/drivers/radeonsi/si_shader.c  | 61 ++-
 src/gallium/drivers/radeonsi/si_shader_internal.h |  6 +++
 2 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 37d97cb341..06e3d0f9f1 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1298,47 +1298,42 @@ static void store_output_tcs(struct 
lp_build_tgsi_context *bld_base,
}
 
if (reg->Register.WriteMask == 0xF && !is_tess_factor) {
LLVMValueRef value = lp_build_gather_values(&ctx->gallivm,
values, 4);
ac_build_buffer_store_dword(&ctx->ac, buffer, value, 4, 
buf_addr,
base, 0, 1, 0, true, false);
}
 }
 
-static LLVMValueRef fetch_input_gs(
-   struct lp_build_tgsi_context *bld_base,
-   const struct tgsi_full_src_register *reg,
-   enum tgsi_opcode_type type,
-   unsigned swizzle)
+LLVMValueRef si_llvm_load_input_gs(struct ac_shader_abi *abi,
+  unsigned input_index,
+  unsigned vtx_offset_param,
+  LLVMTypeRef type,
+  unsigned swizzle)
 {
-   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct si_shader_context *ctx = si_shader_context_from_abi(abi);
+   struct lp_build_tgsi_context *bld_base = &ctx->bld_base;
struct si_shader *shader = ctx->shader;
struct lp_build_context *uint = &ctx->bld_base.uint_bld;
LLVMValueRef vtx_offset, soffset;
struct tgsi_shader_info *info = &shader->selector->info;
-   unsigned semantic_name = info->input_semantic_name[reg->Register.Index];
-   unsigned semantic_index = 
info->input_semantic_index[reg->Register.Index];
+   unsigned semantic_name = info->input_semantic_name[input_index];
+   unsigned semantic_index = info->input_semantic_index[input_index];
unsigned param;
LLVMValueRef value;
 
-   if (swizzle != ~0 && semantic_name == TGSI_SEMANTIC_PRIMID)
-   return get_primitive_id(ctx, swizzle);
-
-   if (!reg->Register.Dimension)
-   return NULL;
-
param = si_shader_io_get_unique_index(semantic_name, semantic_index);
 
/* GFX9 has the ESGS ring in LDS. */
if (ctx->screen->b.chip_class >= GFX9) {
-   unsigned index = reg->Dimension.Index;
+   unsigned index = vtx_offset_param;
 
switch (index / 2) {
case 0:
vtx_offset = unpack_param(ctx, 
ctx->param_gs_vtx01_offset,
  index % 2 ? 16 : 0, 16);
break;
case 1:
vtx_offset = unpack_param(ctx, 
ctx->param_gs_vtx23_offset,
  index % 2 ? 16 : 0, 16);
break;
@@ -1346,56 +1341,76 @@ static LLVMValueRef fetch_input_gs(
vtx_offset = unpack_param(ctx, 
ctx->param_gs_vtx45_offset,
  index % 2 ? 16 : 0, 16);
break;
default:
assert(0);
return NULL;
}
 
vtx_offset = LLVMBuildAdd(ctx->ac.builder, vtx_offset,
  LLVMConstInt(ctx->i32, param * 4, 0), 
"");
-   return lds_load(bld_base, tgsi2llvmtype(bld_base, type),
-   swizzle, vtx_offset);
+   return lds_load(bld_base, type, swizzle, vtx_offset);
}
 
/* GFX6: input load from the ESGS ring in memory. */
if (swizzle == ~0) {
LLVMValueRef values[TGSI_NUM_CHANNELS];
unsigned chan;
for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
-   values[chan] = fetch_input_gs(bld_base, reg, type, 
chan);
+   values[chan] = si_llvm_load_input_gs(abi, input_index, 
vtx_offset_param,
+type, chan);
}
return lp_build_gather_values(&ctx->gallivm, values,
  TGSI_NUM_CHANNELS);
}
 
/* Get the vertex offset parameter on GFX6. */
-   unsigned vtx_offset_param = reg->Dimension.Index;
LLVMValueRef gs_vtx_offset = ctx->gs_vtx_offset[vtx_offset_param];
 
vtx_offset = lp_build_mul_imm(uint, gs_vtx_offset, 4);
 
soffset = LLVMConstInt(ctx->i32, (param * 4 + swizzle) * 256, 0);
 
value = ac_build_buffer_load(&ctx->ac, ctx->esgs_ring, 1, ctx->i32_0,
  

[Mesa-dev] [PATCH 08/20] radeonsi: gather stream info in nir path

2017-11-09 Thread Timothy Arceri
---
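Note, not part of the patch: the stream bookkeeping added here packs one
2-bit stream id per component. The sketch below restates that packing with
plain integers. pack_streams() and stream_for_channel() are hypothetical
names; the arithmetic follows the hunk below, including the case where
bit 31 marks a value that is already packed.

```c
#include <stdint.h>

/* Build the packed per-component stream mask: 2 bits per component. */
static uint32_t pack_streams(uint32_t stream, unsigned location_frac,
			     unsigned num_components)
{
	/* Bit 31 set: the low bits already hold the packed form. */
	if (stream & (1u << 31))
		return stream & ~(1u << 31);

	uint32_t packed = 0;
	for (unsigned j = 0; j < num_components; ++j)
		packed |= stream << (2 * (location_frac + j));
	return packed;
}

/* Channel 0..3 corresponds to X, Y, Z, W. */
static unsigned stream_for_channel(uint32_t packed, unsigned chan)
{
	return (packed >> (2 * chan)) & 3;
}
```
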
 src/gallium/drivers/radeonsi/si_shader_nir.c | 37 
 1 file changed, 37 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 32f6d86647..847d75ba14 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -241,20 +241,57 @@ void si_nir_scan_shader(const struct nir_shader *nir,
&semantic_name, &semantic_index);
} else {
tgsi_get_gl_varying_semantic(variable->data.location, 
true,
 &semantic_name, 
&semantic_index);
}
 
info->output_semantic_name[i] = semantic_name;
info->output_semantic_index[i] = semantic_index;
info->output_usagemask[i] = TGSI_WRITEMASK_XYZW;
 
+   unsigned num_components = 4;
+   unsigned vector_elements = 
glsl_get_vector_elements(glsl_without_array(variable->type));
+   if (vector_elements)
+   num_components = vector_elements;
+
+   unsigned gs_out_streams;
+   if (variable->data.stream & (1u << 31)) {
+   gs_out_streams = variable->data.stream & ~(1u << 31);
+   } else {
+   assert(variable->data.stream < 4);
+   gs_out_streams = 0;
+   for (unsigned j = 0; j < num_components; ++j)
+   gs_out_streams |= variable->data.stream << (2 * 
(variable->data.location_frac + j));
+   }
+
+   unsigned streamx = gs_out_streams & 3;
+   unsigned streamy = (gs_out_streams >> 2) & 3;
+   unsigned streamz = (gs_out_streams >> 4) & 3;
+   unsigned streamw = (gs_out_streams >> 6) & 3;
+
+   if (info->output_usagemask[i] & TGSI_WRITEMASK_X) {
+   info->output_streams[i] |= streamx;
+   info->num_stream_output_components[streamx]++;
+   }
+   if (info->output_usagemask[i] & TGSI_WRITEMASK_Y) {
+   info->output_streams[i] |= streamy << 2;
+   info->num_stream_output_components[streamy]++;
+   }
+   if (info->output_usagemask[i] & TGSI_WRITEMASK_Z) {
+   info->output_streams[i] |= streamz << 4;
+   info->num_stream_output_components[streamz]++;
+   }
+   if (info->output_usagemask[i] & TGSI_WRITEMASK_W) {
+   info->output_streams[i] |= streamw << 6;
+   info->num_stream_output_components[streamw]++;
+   }
+
switch (semantic_name) {
case TGSI_SEMANTIC_PRIMID:
info->writes_primid = true;
break;
case TGSI_SEMANTIC_VIEWPORT_INDEX:
info->writes_viewport_index = true;
break;
case TGSI_SEMANTIC_LAYER:
info->writes_layer = true;
break;
-- 
2.14.3



[Mesa-dev] [PATCH 07/20] radeonsi: add nir support for gs epilogue

2017-11-09 Thread Timothy Arceri
---
 src/gallium/drivers/radeonsi/si_shader.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 10b1890b4f..efaab0a7a1 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3257,21 +3257,37 @@ static void si_tgsi_emit_es_epilogue(struct 
lp_build_tgsi_context *bld_base)
 }
 
 static LLVMValueRef si_get_gs_wave_id(struct si_shader_context *ctx)
 {
if (ctx->screen->b.chip_class >= GFX9)
return unpack_param(ctx, ctx->param_merged_wave_info, 16, 8);
else
return LLVMGetParam(ctx->main_fn, ctx->param_gs_wave_id);
 }
 
-static void si_llvm_emit_gs_epilogue(struct lp_build_tgsi_context *bld_base)
+static void si_llvm_emit_gs_epilogue(struct ac_shader_abi *abi,
+unsigned max_outputs,
+LLVMValueRef *addrs)
+{
+   struct si_shader_context *ctx = si_shader_context_from_abi(abi);
+   struct tgsi_shader_info UNUSED *info = &ctx->shader->selector->info;
+
+   assert(info->num_outputs <= max_outputs);
+
+   ac_build_sendmsg(&ctx->ac, AC_SENDMSG_GS_OP_NOP | AC_SENDMSG_GS_DONE,
+si_get_gs_wave_id(ctx));
+
+   if (ctx->screen->b.chip_class >= GFX9)
+   lp_build_endif(&ctx->merged_wrap_if_state);
+}
+
+static void si_tgsi_emit_gs_epilogue(struct lp_build_tgsi_context *bld_base)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
 
ac_build_sendmsg(&ctx->ac, AC_SENDMSG_GS_OP_NOP | AC_SENDMSG_GS_DONE,
 si_get_gs_wave_id(ctx));
 
if (ctx->screen->b.chip_class >= GFX9)
lp_build_endif(&ctx->merged_wrap_if_state);
 }
 
@@ -5772,21 +5788,22 @@ static bool si_compile_tgsi_main(struct 
si_shader_context *ctx,
ctx->abi.emit_outputs = si_llvm_emit_es_epilogue;
bld_base->emit_epilogue = si_tgsi_emit_es_epilogue;
} else {
ctx->abi.emit_outputs = si_llvm_emit_vs_epilogue;
bld_base->emit_epilogue = si_tgsi_emit_epilogue;
}
break;
case PIPE_SHADER_GEOMETRY:
bld_base->emit_fetch_funcs[TGSI_FILE_INPUT] = fetch_input_gs;
ctx->abi.emit_vertex = si_llvm_emit_vertex;
-   bld_base->emit_epilogue = si_llvm_emit_gs_epilogue;
+   ctx->abi.emit_outputs = si_llvm_emit_gs_epilogue;
+   bld_base->emit_epilogue = si_tgsi_emit_gs_epilogue;
break;
case PIPE_SHADER_FRAGMENT:
ctx->load_input = declare_input_fs;
ctx->abi.emit_outputs = si_llvm_return_fs_outputs;
bld_base->emit_epilogue = si_tgsi_emit_epilogue;
break;
case PIPE_SHADER_COMPUTE:
break;
default:
assert(!"Unsupported shader type");
-- 
2.14.3



[Mesa-dev] [PATCH 06/20] radeonsi: add nir support for es epilogue

2017-11-09 Thread Timothy Arceri
---
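Note, not part of the patch: a small sketch of the calling pattern this
change introduces. The epilogue becomes an ABI callback that receives a
flat array of output slots, indexed as addrs[4 * i + chan], and the TGSI
entry point is reduced to a wrapper that forwards to the callback. All
names below (sketch_abi, sketch_emit_es_epilogue, sketch_frontend_epilogue)
are hypothetical, and ints stand in for the LLVM values; the real code
loads and stores through LLVM builders.

```c
#define SKETCH_NUM_CHANNELS 4

/* Hypothetical stand-in for struct ac_shader_abi. */
struct sketch_abi {
	void (*emit_outputs)(struct sketch_abi *abi, unsigned max_outputs,
			     const int *addrs);
	unsigned num_outputs;
	int sum; /* stands in for the stores the real epilogue emits */
};

/* Backend-neutral epilogue: walks the flat array as 4 * i + chan. */
static void sketch_emit_es_epilogue(struct sketch_abi *abi,
				    unsigned max_outputs, const int *addrs)
{
	abi->sum = 0;
	for (unsigned i = 0; i < abi->num_outputs && i < max_outputs; i++)
		for (unsigned chan = 0; chan < SKETCH_NUM_CHANNELS; chan++)
			abi->sum += addrs[SKETCH_NUM_CHANNELS * i + chan];
}

/* Front-end wrapper: only forwards its outputs to the ABI callback. */
static void sketch_frontend_epilogue(struct sketch_abi *abi,
				     const int *outputs, unsigned max_outputs)
{
	abi->emit_outputs(abi, max_outputs, outputs);
}
```
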
 src/gallium/drivers/radeonsi/si_shader.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index cc68d0ac6f..10b1890b4f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3185,75 +3185,84 @@ static void si_llvm_emit_ls_epilogue(struct 
ac_shader_abi *abi,
 }
 
 static void si_tgsi_emit_ls_epilogue(struct lp_build_tgsi_context *bld_base)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
 
ctx->abi.emit_outputs(&ctx->abi, RADEON_LLVM_MAX_OUTPUTS,
  ctx->outputs[0]);
 }
 
-static void si_llvm_emit_es_epilogue(struct lp_build_tgsi_context *bld_base)
+static void si_llvm_emit_es_epilogue(struct ac_shader_abi *abi,
+unsigned max_outputs,
+LLVMValueRef *addrs)
 {
-   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct si_shader_context *ctx = si_shader_context_from_abi(abi);
struct si_shader *es = ctx->shader;
struct tgsi_shader_info *info = &es->selector->info;
LLVMValueRef soffset = LLVMGetParam(ctx->main_fn,
ctx->param_es2gs_offset);
LLVMValueRef lds_base = NULL;
unsigned chan;
int i;
 
if (ctx->screen->b.chip_class >= GFX9 && info->num_outputs) {
unsigned itemsize_dw = es->selector->esgs_itemsize / 4;
LLVMValueRef vertex_idx = ac_get_thread_id(&ctx->ac);
LLVMValueRef wave_idx = unpack_param(ctx, 
ctx->param_merged_wave_info, 24, 4);
vertex_idx = LLVMBuildOr(ctx->ac.builder, vertex_idx,
 LLVMBuildMul(ctx->ac.builder, wave_idx,
  LLVMConstInt(ctx->i32, 
64, false), ""), "");
lds_base = LLVMBuildMul(ctx->ac.builder, vertex_idx,
LLVMConstInt(ctx->i32, itemsize_dw, 0), 
"");
}
 
for (i = 0; i < info->num_outputs; i++) {
-   LLVMValueRef *out_ptr = ctx->outputs[i];
int param;
 
if (info->output_semantic_name[i] == 
TGSI_SEMANTIC_VIEWPORT_INDEX ||
info->output_semantic_name[i] == TGSI_SEMANTIC_LAYER)
continue;
 
param = 
si_shader_io_get_unique_index(info->output_semantic_name[i],
  
info->output_semantic_index[i]);
 
for (chan = 0; chan < 4; chan++) {
-   LLVMValueRef out_val = LLVMBuildLoad(ctx->ac.builder, 
out_ptr[chan], "");
+   LLVMValueRef out_val = LLVMBuildLoad(ctx->ac.builder, 
addrs[4 * i + chan], "");
out_val = ac_to_integer(&ctx->ac, out_val);
 
/* GFX9 has the ESGS ring in LDS. */
if (ctx->screen->b.chip_class >= GFX9) {
lds_store(ctx, param * 4 + chan, lds_base, 
out_val);
continue;
}
 
ac_build_buffer_store_dword(&ctx->ac,
ctx->esgs_ring,
out_val, 1, NULL, soffset,
(4 * param + chan) * 4,
1, 1, true, true);
}
}
 
if (ctx->screen->b.chip_class >= GFX9)
si_set_es_return_value_for_gs(ctx);
 }
 
+static void si_tgsi_emit_es_epilogue(struct lp_build_tgsi_context *bld_base)
+{
+   struct si_shader_context *ctx = si_shader_context(bld_base);
+
+   ctx->abi.emit_outputs(&ctx->abi, RADEON_LLVM_MAX_OUTPUTS,
+ ctx->outputs[0]);
+}
+
 static LLVMValueRef si_get_gs_wave_id(struct si_shader_context *ctx)
 {
if (ctx->screen->b.chip_class >= GFX9)
return unpack_param(ctx, ctx->param_merged_wave_info, 16, 8);
else
return LLVMGetParam(ctx->main_fn, ctx->param_gs_wave_id);
 }
 
 static void si_llvm_emit_gs_epilogue(struct lp_build_tgsi_context *bld_base)
 {
@@ -4430,21 +4439,20 @@ static void create_function(struct si_shader_context 
*ctx)
 
/* VGPRs */
declare_vs_input_vgprs(ctx, , _prolog_vgprs);
break;
}
 
declare_per_stage_desc_pointers(ctx, , true);
declare_vs_specific_input_sgprs(ctx, );
 
if (shader->key.as_es) {
-   assert(!shader->selector->nir);
ctx->param_es2gs_offset = add_arg(, ARG_SGPR, 
ctx->i32);
} else if (shader->key.as_ls) {

[Mesa-dev] [PATCH 12/20] radeonsi: add llvm_type_is_64bit() helper

2017-11-09 Thread Timothy Arceri
---
 src/gallium/drivers/radeonsi/si_shader.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 33c37d438b..3708696c69 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -96,20 +96,29 @@ static void si_build_ps_epilog_function(struct 
si_shader_context *ctx,
 /* Ideally pass the sample mask input to the PS epilog as v14, which
  * is its usual location, so that the shader doesn't have to add v_mov.
  */
 #define PS_EPILOG_SAMPLEMASK_MIN_LOC 14
 
 enum {
CONST_ADDR_SPACE = 2,
LOCAL_ADDR_SPACE = 3,
 };
 
+static bool llvm_type_is_64bit(struct si_shader_context *ctx,
+  LLVMTypeRef type)
+{
+   if (type == ctx->ac.i64 || type == ctx->ac.f64)
+   return true;
+
+   return false;
+}
+
 static bool is_merged_shader(struct si_shader *shader)
 {
if (shader->selector->screen->b.chip_class <= VI)
return false;
 
return shader->key.as_ls ||
   shader->key.as_es ||
   shader->selector->type == PIPE_SHADER_TESS_CTRL ||
   shader->selector->type == PIPE_SHADER_GEOMETRY;
 }
-- 
2.14.3



[Mesa-dev] [PATCH 13/20] radeonsi: pass llvm type to lds_load()

2017-11-09 Thread Timothy Arceri
---
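Note, not part of the patch: for a 64-bit type, lds_load() reads two
consecutive dwords and hands both to si_llvm_emit_fetch_64bit(). The sketch
below shows the equivalent operation on plain integers. It assumes the
first dword is the low half of the result, which this patch does not show,
and it assumes IEEE 754 doubles. combine_dwords() and dwords_to_double()
are hypothetical helpers.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: `lo` is the dword at dw_addr, `hi` the one at dw_addr + 1. */
static uint64_t combine_dwords(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Reinterpret the combined bits as a double, without converting the value. */
static double dwords_to_double(uint32_t lo, uint32_t hi)
{
	uint64_t bits = combine_dwords(lo, hi);
	double d;

	memcpy(&d, &bits, sizeof d);
	return d;
}
```
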
 src/gallium/drivers/radeonsi/si_shader.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 3708696c69..37d97cb341 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1078,50 +1078,49 @@ static LLVMValueRef buffer_load(struct 
lp_build_tgsi_context *bld_base,
 }
 
 /**
  * Load from LDS.
  *
  * \param type output value type
  * \param swizzle  offset (typically 0..3); it can be ~0, which loads a 
vec4
  * \param dw_addr  address in dwords
  */
 static LLVMValueRef lds_load(struct lp_build_tgsi_context *bld_base,
-enum tgsi_opcode_type type, unsigned swizzle,
+LLVMTypeRef type, unsigned swizzle,
 LLVMValueRef dw_addr)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef value;
 
if (swizzle == ~0) {
LLVMValueRef values[TGSI_NUM_CHANNELS];
 
for (unsigned chan = 0; chan < TGSI_NUM_CHANNELS; chan++)
values[chan] = lds_load(bld_base, type, chan, dw_addr);
 
return lp_build_gather_values(&ctx->gallivm, values,
  TGSI_NUM_CHANNELS);
}
 
dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
LLVMConstInt(ctx->i32, swizzle, 0));
 
value = ac_lds_load(&ctx->ac, dw_addr);
-   if (tgsi_type_is_64bit(type)) {
+   if (llvm_type_is_64bit(ctx, type)) {
LLVMValueRef value2;
dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
   ctx->i32_1);
value2 = ac_lds_load(&ctx->ac, dw_addr);
-   return si_llvm_emit_fetch_64bit(bld_base, 
tgsi2llvmtype(bld_base, type),
-   value, value2);
+   return si_llvm_emit_fetch_64bit(bld_base, type, value, value2);
}
 
-   return bitcast(bld_base, type, value);
+   return bitcast_llvmtype(ctx, type, value);
 }
 
 /**
  * Store to LDS.
  *
  * \param swizzle  offset (typically 0..3)
  * \param dw_addr  address in dwords
  * \param valuevalue to store
  */
 static void lds_store(struct si_shader_context *ctx,
@@ -1163,41 +1162,41 @@ static LLVMValueRef fetch_input_tcs(
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type, unsigned swizzle)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef dw_addr, stride;
 
stride = get_tcs_in_vertex_dw_stride(ctx);
dw_addr = get_tcs_in_current_patch_offset(ctx);
dw_addr = get_dw_address(ctx, NULL, reg, stride, dw_addr);
 
-   return lds_load(bld_base, type, swizzle, dw_addr);
+   return lds_load(bld_base, tgsi2llvmtype(bld_base, type), swizzle, 
dw_addr);
 }
 
 static LLVMValueRef fetch_output_tcs(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type, unsigned swizzle)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef dw_addr, stride;
 
if (reg->Register.Dimension) {
stride = get_tcs_out_vertex_dw_stride(ctx);
dw_addr = get_tcs_out_current_patch_offset(ctx);
dw_addr = get_dw_address(ctx, NULL, reg, stride, dw_addr);
} else {
dw_addr = get_tcs_out_current_patch_data_offset(ctx);
dw_addr = get_dw_address(ctx, NULL, reg, NULL, dw_addr);
}
 
-   return lds_load(bld_base, type, swizzle, dw_addr);
+   return lds_load(bld_base, tgsi2llvmtype(bld_base, type), swizzle, 
dw_addr);
 }
 
 static LLVMValueRef fetch_input_tes(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type, unsigned swizzle)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef buffer, base, addr;
 
@@ -1347,21 +1346,22 @@ static LLVMValueRef fetch_input_gs(
vtx_offset = unpack_param(ctx, 
ctx->param_gs_vtx45_offset,
  index % 2 ? 16 : 0, 16);
break;
default:
assert(0);
return NULL;
}
 
vtx_offset = LLVMBuildAdd(ctx->ac.builder, vtx_offset,
  LLVMConstInt(ctx->i32, param * 4, 0), 
"");
-   return lds_load(bld_base, type, swizzle, vtx_offset);
+   return lds_load(bld_base, tgsi2llvmtype(bld_base, type),
+   swizzle, vtx_offset);
}
 
/* GFX6: input load from the 

[Mesa-dev] [PATCH 10/20] radeonsi: introduce bitcast_llvmtype()

2017-11-09 Thread Timothy Arceri
This is like bitcast() but takes an llvm type rather than a tgsi
type.
---
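Note, not part of the patch: a bitcast reinterprets the bits of a value as
another type of the same size; it does not convert the value. The sketch
below shows the difference in plain C, assuming 32-bit IEEE 754 floats.
bitcast_f32_to_u32() and convert_f32_to_u32() are hypothetical names for
this note only.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer (bitcast). */
static uint32_t bitcast_f32_to_u32(float f)
{
	uint32_t u;

	memcpy(&u, &f, sizeof u);
	return u;
}

/* Value conversion, shown only for contrast with the bitcast above. */
static uint32_t convert_f32_to_u32(float f)
{
	return (uint32_t)f;
}
```
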
 src/gallium/drivers/radeonsi/si_shader_internal.h   | 3 +++
 src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c | 9 +
 2 files changed, 12 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader_internal.h 
b/src/gallium/drivers/radeonsi/si_shader_internal.h
index 7ff8815b92..4da39830f6 100644
--- a/src/gallium/drivers/radeonsi/si_shader_internal.h
+++ b/src/gallium/drivers/radeonsi/si_shader_internal.h
@@ -242,20 +242,23 @@ si_shader_context_from_abi(struct ac_shader_abi *abi)
 
 void si_llvm_add_attribute(LLVMValueRef F, const char *name, int value);
 
 unsigned si_llvm_compile(LLVMModuleRef M, struct ac_shader_binary *binary,
 LLVMTargetMachineRef tm,
 struct pipe_debug_callback *debug);
 
 LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context *bld_base,
  enum tgsi_opcode_type type);
 
+LLVMValueRef bitcast_llvmtype(struct si_shader_context *ctx,
+ LLVMTypeRef type, LLVMValueRef value);
+
 LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type, LLVMValueRef value);
 
 LLVMValueRef si_llvm_bound_index(struct si_shader_context *ctx,
 LLVMValueRef index,
 unsigned num);
 
 void si_llvm_context_init(struct si_shader_context *ctx,
  struct si_screen *sscreen,
  LLVMTargetMachineRef tm);
diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c 
b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
index 59d02605e9..b6a919fc8f 100644
--- a/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
+++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c
@@ -177,20 +177,29 @@ LLVMTypeRef tgsi2llvmtype(struct lp_build_tgsi_context 
*bld_base,
case TGSI_TYPE_DOUBLE:
return ctx->ac.f64;
case TGSI_TYPE_UNTYPED:
case TGSI_TYPE_FLOAT:
return ctx->ac.f32;
default: break;
}
return 0;
 }
 
+LLVMValueRef bitcast_llvmtype(struct si_shader_context *ctx,
+ LLVMTypeRef type, LLVMValueRef value)
+{
+   if (type)
+   return LLVMBuildBitCast(ctx->ac.builder, value, type, "");
+   else
+   return value;
+}
+
 LLVMValueRef bitcast(struct lp_build_tgsi_context *bld_base,
 enum tgsi_opcode_type type, LLVMValueRef value)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMTypeRef dst_type = tgsi2llvmtype(bld_base, type);
 
if (dst_type)
return LLVMBuildBitCast(ctx->ac.builder, value, dst_type, "");
else
return value;
-- 
2.14.3



[Mesa-dev] [PATCH 11/20] radeonsi: pass llvm type to si_llvm_emit_fetch_64bit()

2017-11-09 Thread Timothy Arceri
---
 src/gallium/drivers/radeonsi/si_shader.c| 11 +++
 src/gallium/drivers/radeonsi/si_shader_internal.h   |  2 +-
 src/gallium/drivers/radeonsi/si_shader_tgsi_setup.c | 17 ++---
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index efaab0a7a1..33c37d438b 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1057,21 +1057,22 @@ static LLVMValueRef buffer_load(struct 
lp_build_tgsi_context *bld_base,
return LLVMBuildExtractElement(ctx->ac.builder, value,
LLVMConstInt(ctx->i32, swizzle, 0), "");
}
 
value = ac_build_buffer_load(&ctx->ac, buffer, 1, NULL, base, offset,
  swizzle * 4, 1, 0, can_speculate, false);
 
value2 = ac_build_buffer_load(&ctx->ac, buffer, 1, NULL, base, offset,
   swizzle * 4 + 4, 1, 0, can_speculate, false);
 
-   return si_llvm_emit_fetch_64bit(bld_base, type, value, value2);
+   return si_llvm_emit_fetch_64bit(bld_base, tgsi2llvmtype(bld_base, type),
+   value, value2);
 }
 
 /**
  * Load from LDS.
  *
  * \param type output value type
  * \param swizzle  offset (typically 0..3); it can be ~0, which loads a 
vec4
  * \param dw_addr  address in dwords
  */
 static LLVMValueRef lds_load(struct lp_build_tgsi_context *bld_base,
@@ -1093,21 +1094,22 @@ static LLVMValueRef lds_load(struct 
lp_build_tgsi_context *bld_base,
 
dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
LLVMConstInt(ctx->i32, swizzle, 0));
 
value = ac_lds_load(&ctx->ac, dw_addr);
if (tgsi_type_is_64bit(type)) {
LLVMValueRef value2;
dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
   ctx->i32_1);
value2 = ac_lds_load(&ctx->ac, dw_addr);
-   return si_llvm_emit_fetch_64bit(bld_base, type, value, value2);
+   return si_llvm_emit_fetch_64bit(bld_base, 
tgsi2llvmtype(bld_base, type),
+   value, value2);
}
 
return bitcast(bld_base, type, value);
 }
 
 /**
  * Store to LDS.
  *
  * \param swizzle  offset (typically 0..3)
  * \param dw_addr  address in dwords
@@ -1367,21 +1369,21 @@ static LLVMValueRef fetch_input_gs(
 
value = ac_build_buffer_load(&ctx->ac, ctx->esgs_ring, 1, ctx->i32_0,
 vtx_offset, soffset, 0, 1, 0, true, false);
if (tgsi_type_is_64bit(type)) {
LLVMValueRef value2;
soffset = LLVMConstInt(ctx->i32, (param * 4 + swizzle + 1) * 
256, 0);
 
value2 = ac_build_buffer_load(&ctx->ac, ctx->esgs_ring, 1,
  ctx->i32_0, vtx_offset, soffset,
  0, 1, 0, true, false);
-   return si_llvm_emit_fetch_64bit(bld_base, type,
+   return si_llvm_emit_fetch_64bit(bld_base, 
tgsi2llvmtype(bld_base, type),
value, value2);
}
return bitcast(bld_base, type, value);
 }
 
 static int lookup_interp_param_index(unsigned interpolate, unsigned location)
 {
switch (interpolate) {
case TGSI_INTERPOLATE_CONSTANT:
return 0;
@@ -1971,21 +1973,22 @@ static LLVMValueRef fetch_constant(
 
return lp_build_gather_values(&ctx->gallivm, values, 4);
}
 
/* Split 64-bit loads. */
if (tgsi_type_is_64bit(type)) {
LLVMValueRef lo, hi;
 
lo = fetch_constant(bld_base, reg, TGSI_TYPE_UNSIGNED, swizzle);
hi = fetch_constant(bld_base, reg, TGSI_TYPE_UNSIGNED, swizzle 
+ 1);
-   return si_llvm_emit_fetch_64bit(bld_base, type, lo, hi);
+   return si_llvm_emit_fetch_64bit(bld_base, 
tgsi2llvmtype(bld_base, type),
+   lo, hi);
}
 
idx = reg->Register.Index * 4 + swizzle;
if (reg->Register.Indirect) {
addr = si_get_indirect_index(ctx, ireg, 16, idx * 4);
} else {
addr = LLVMConstInt(ctx->i32, idx * 4, 0);
}
 
/* Fast path when user data SGPRs point to constant buffer 0 directly. 
*/
diff --git a/src/gallium/drivers/radeonsi/si_shader_internal.h 
b/src/gallium/drivers/radeonsi/si_shader_internal.h
index 4da39830f6..42cd80216d 100644
--- a/src/gallium/drivers/radeonsi/si_shader_internal.h
+++ b/src/gallium/drivers/radeonsi/si_shader_internal.h
@@ -268,21 +268,21 @@ void si_llvm_context_set_tgsi(struct si_shader_context 
*ctx,
 void si_llvm_create_func(struct si_shader_context *ctx,
 const char *name,
 LLVMTypeRef 

[Mesa-dev] [PATCH 03/20] radeonsi: rework gs_vtx_offset handling

2017-11-09 Thread Timothy Arceri
This simplifies things a bit and will enable it to work with the
common NIR -> LLVM code.
---
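Note, not part of the patch: the code removed below had to translate a
vertex index into a function parameter number, skipping the primitive id
argument that sits between vtx1 and vtx2. The sketch restates that
translation with plain integers so it is easier to see what the
gs_vtx_offset[] array replaces. old_vtx_param() is a hypothetical name and
the parameter numbers in any call are made up.

```c
/* Old scheme: indices 0..1 count up from the vtx0 parameter, indices
 * 2..5 count up from the vtx2 parameter, because another argument
 * (the primitive id) is declared between vtx1 and vtx2. */
static unsigned old_vtx_param(unsigned idx, unsigned vtx0_param,
			      unsigned vtx2_param)
{
	if (idx < 2)
		return vtx0_param + idx;

	return vtx2_param + idx - 2;
}
```

With the array, the lookup is a direct gs_vtx_offset[idx] and the gap no
longer needs special handling.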
 src/gallium/drivers/radeonsi/si_shader.c  | 25 ---
 src/gallium/drivers/radeonsi/si_shader_internal.h |  7 +--
 2 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index c95f8d7ed7..d234e08071 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1354,30 +1354,23 @@ static LLVMValueRef fetch_input_gs(
unsigned chan;
for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
values[chan] = fetch_input_gs(bld_base, reg, type, 
chan);
}
return lp_build_gather_values(&ctx->gallivm, values,
  TGSI_NUM_CHANNELS);
}
 
/* Get the vertex offset parameter on GFX6. */
unsigned vtx_offset_param = reg->Dimension.Index;
-   if (vtx_offset_param < 2) {
-   vtx_offset_param += ctx->param_gs_vtx0_offset;
-   } else {
-   assert(vtx_offset_param < 6);
-   vtx_offset_param += ctx->param_gs_vtx2_offset - 2;
-   }
-   vtx_offset = lp_build_mul_imm(uint,
- LLVMGetParam(ctx->main_fn,
-  vtx_offset_param),
- 4);
+   LLVMValueRef gs_vtx_offset = ctx->gs_vtx_offset[vtx_offset_param];
+
+   vtx_offset = lp_build_mul_imm(uint, gs_vtx_offset, 4);
 
soffset = LLVMConstInt(ctx->i32, (param * 4 + swizzle) * 256, 0);
 
value = ac_build_buffer_load(&ctx->ac, ctx->esgs_ring, 1, ctx->i32_0,
 vtx_offset, soffset, 0, 1, 0, true, false);
if (tgsi_type_is_64bit(type)) {
LLVMValueRef value2;
soffset = LLVMConstInt(ctx->i32, (param * 4 + swizzle + 1) * 
256, 0);
 
value2 = ac_build_buffer_load(&ctx->ac, ctx->esgs_ring, 1,
@@ -4605,27 +4598,27 @@ static void create_function(struct si_shader_context 
*ctx)
declare_tes_input_vgprs(ctx, &fninfo);
break;
 
case PIPE_SHADER_GEOMETRY:
declare_global_desc_pointers(ctx, &fninfo);
declare_per_stage_desc_pointers(ctx, &fninfo, true);
ctx->param_gs2vs_offset = add_arg(&fninfo, ARG_SGPR, ctx->i32);
ctx->param_gs_wave_id = add_arg(&fninfo, ARG_SGPR, ctx->i32);
 
/* VGPRs */
-   ctx->param_gs_vtx0_offset = add_arg(&fninfo, ARG_VGPR, 
ctx->i32);
-   ctx->param_gs_vtx1_offset = add_arg(&fninfo, ARG_VGPR, 
ctx->i32);
+   add_arg_assign(&fninfo, ARG_VGPR, ctx->i32, 
&ctx->gs_vtx_offset[0]);
+   add_arg_assign(&fninfo, ARG_VGPR, ctx->i32, 
&ctx->gs_vtx_offset[1]);
ctx->param_gs_prim_id = add_arg(&fninfo, ARG_VGPR, ctx->i32);
-   ctx->param_gs_vtx2_offset = add_arg(&fninfo, ARG_VGPR, 
ctx->i32);
-   ctx->param_gs_vtx3_offset = add_arg(&fninfo, ARG_VGPR, 
ctx->i32);
-   ctx->param_gs_vtx4_offset = add_arg(&fninfo, ARG_VGPR, 
ctx->i32);
-   ctx->param_gs_vtx5_offset = add_arg(&fninfo, ARG_VGPR, 
ctx->i32);
+   add_arg_assign(&fninfo, ARG_VGPR, ctx->i32, 
&ctx->gs_vtx_offset[2]);
+   add_arg_assign(&fninfo, ARG_VGPR, ctx->i32, 
&ctx->gs_vtx_offset[3]);
+   add_arg_assign(&fninfo, ARG_VGPR, ctx->i32, 
&ctx->gs_vtx_offset[4]);
+   add_arg_assign(&fninfo, ARG_VGPR, ctx->i32, 
&ctx->gs_vtx_offset[5]);
ctx->param_gs_instance_id = add_arg(&fninfo, ARG_VGPR, 
ctx->i32);
break;
 
case PIPE_SHADER_FRAGMENT:
declare_global_desc_pointers(ctx, &fninfo);
declare_per_stage_desc_pointers(ctx, &fninfo, true);
add_arg_checked(&fninfo, ARG_SGPR, ctx->f32, 
SI_PARAM_ALPHA_REF);
add_arg_checked(&fninfo, ARG_SGPR, ctx->i32, 
SI_PARAM_PRIM_MASK);
 
add_arg_checked(&fninfo, ARG_VGPR, ctx->v2i32, 
SI_PARAM_PERSP_SAMPLE);
diff --git a/src/gallium/drivers/radeonsi/si_shader_internal.h 
b/src/gallium/drivers/radeonsi/si_shader_internal.h
index b249bf961a..7ff8815b92 100644
--- a/src/gallium/drivers/radeonsi/si_shader_internal.h
+++ b/src/gallium/drivers/radeonsi/si_shader_internal.h
@@ -175,27 +175,22 @@ struct si_shader_context {
/* API TES */
int param_tes_u;
int param_tes_v;
int param_tes_rel_patch_id;
int param_tes_patch_id;
/* HW ES */
int param_es2gs_offset;
/* API GS */
int param_gs2vs_offset;
int param_gs_wave_id; /* GFX6 */
-   int param_gs_vtx0_offset; /* in dwords (GFX6) */
-   int param_gs_vtx1_offset; /* in dwords (GFX6) */
+   LLVMValueRef gs_vtx_offset[6]; /* in dwords (GFX6) */
int param_gs_prim_id;
-   int param_gs_vtx2_offset; /* in dwords (GFX6) */
-   int param_gs_vtx3_offset; /* in dwords (GFX6) */
-   int param_gs_vtx4_offset; /* in 

[Mesa-dev] [PATCH 01/20] st/glsl_to_nir: disable io type lowering for stages other than vs and fs

2017-11-09 Thread Timothy Arceri
This is too simple and breaks gs, and I'm not sure it's required there
anyway.
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 6c474cb718..d478725fbe 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -359,21 +359,24 @@ sort_varyings(struct exec_list *var_list)
  * variant lowering.
  */
 void
 st_finalize_nir(struct st_context *st, struct gl_program *prog,
 struct gl_shader_program *shader_program, nir_shader *nir)
 {
struct pipe_screen *screen = st->pipe->screen;
 
NIR_PASS_V(nir, nir_split_var_copies);
NIR_PASS_V(nir, nir_lower_var_copies);
-   NIR_PASS_V(nir, nir_lower_io_types);
+
+   if (nir->info.stage == MESA_SHADER_VERTEX ||
+   nir->info.stage == MESA_SHADER_FRAGMENT)
+  NIR_PASS_V(nir, nir_lower_io_types);
 
if (nir->info.stage == MESA_SHADER_VERTEX) {
   /* Needs special handling so drvloc matches the vbo state: */
   st_nir_assign_vs_in_locations(prog, nir);
   /* Re-lower global vars, to deal with any dead VS inputs. */
   NIR_PASS_V(nir, nir_lower_global_vars_to_local);
 
   sort_varyings(&nir->outputs);
   nir_assign_var_locations(&nir->outputs,
&nir->num_outputs,
-- 
2.14.3



[Mesa-dev] [PATCH 05/20] radeonsi: add nir support for ls epilogue

2017-11-09 Thread Timothy Arceri
---
 src/gallium/drivers/radeonsi/si_shader.c | 37 +++-
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 47ca64fdea..cc68d0ac6f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1106,27 +1106,25 @@ static LLVMValueRef lds_load(struct 
lp_build_tgsi_context *bld_base,
return bitcast(bld_base, type, value);
 }
 
 /**
  * Store to LDS.
  *
  * \param swizzle  offset (typically 0..3)
  * \param dw_addr  address in dwords
  * \param value    value to store
  */
-static void lds_store(struct lp_build_tgsi_context *bld_base,
+static void lds_store(struct si_shader_context *ctx,
  unsigned dw_offset_imm, LLVMValueRef dw_addr,
  LLVMValueRef value)
 {
-   struct si_shader_context *ctx = si_shader_context(bld_base);
-
-   dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
+   dw_addr = lp_build_add(&ctx->bld_base.uint_bld, dw_addr,
LLVMConstInt(ctx->i32, dw_offset_imm, 0));
 
ac_lds_store(&ctx->ac, dw_addr, value);
 }
 
 static LLVMValueRef desc_from_addr_base64k(struct si_shader_context *ctx,
  unsigned param)
 {
LLVMBuilderRef builder = ctx->ac.builder;
 
@@ -1258,21 +1256,21 @@ static void store_output_tcs(struct 
lp_build_tgsi_context *bld_base,
uint32_t writemask = reg->Register.WriteMask;
while (writemask) {
chan_index = u_bit_scan(&writemask);
LLVMValueRef value = dst[chan_index];
 
if (inst->Instruction.Saturate)
value = ac_build_clamp(&ctx->ac, value);
 
/* Skip LDS stores if there is no LDS read of this output. */
if (!skip_lds_store)
-   lds_store(bld_base, chan_index, dw_addr, value);
+   lds_store(ctx, chan_index, dw_addr, value);
 
value = ac_to_integer(&ctx->ac, value);
values[chan_index] = value;
 
if (reg->Register.WriteMask != 0xF && !is_tess_factor) {
ac_build_buffer_store_dword(&ctx->ac, buffer, value, 1,
buf_addr, base,
4 * chan_index, 1, 0, true, 
false);
}
 
@@ -3126,36 +3124,37 @@ static void si_set_es_return_value_for_gs(struct 
si_shader_context *ctx)
   8 + 
GFX9_SGPR_GS_SAMPLERS_AND_IMAGES);
 
unsigned vgpr = 8 + GFX9_GS_NUM_USER_SGPR;
for (unsigned i = 0; i < 5; i++) {
unsigned param = ctx->param_gs_vtx01_offset + i;
ret = si_insert_input_ret_float(ctx, ret, param, vgpr++);
}
ctx->return_value = ret;
 }
 
-static void si_llvm_emit_ls_epilogue(struct lp_build_tgsi_context *bld_base)
+static void si_llvm_emit_ls_epilogue(struct ac_shader_abi *abi,
+unsigned max_outputs,
+LLVMValueRef *addrs)
 {
-   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct si_shader_context *ctx = si_shader_context_from_abi(abi);
struct si_shader *shader = ctx->shader;
struct tgsi_shader_info *info = &shader->selector->info;
unsigned i, chan;
LLVMValueRef vertex_id = LLVMGetParam(ctx->main_fn,
  ctx->param_rel_auto_id);
LLVMValueRef vertex_dw_stride = get_tcs_in_vertex_dw_stride(ctx);
LLVMValueRef base_dw_addr = LLVMBuildMul(ctx->ac.builder, vertex_id,
 vertex_dw_stride, "");
 
/* Write outputs to LDS. The next shader (TCS aka HS) will read
 * its inputs from it. */
for (i = 0; i < info->num_outputs; i++) {
-   LLVMValueRef *out_ptr = ctx->outputs[i];
unsigned name = info->output_semantic_name[i];
unsigned index = info->output_semantic_index[i];
 
/* The ARB_shader_viewport_layer_array spec contains the
 * following issue:
 *
 *2) What happens if gl_ViewportIndex or gl_Layer is
 *written in the vertex shader and a geometry shader is
 *present?
 *
@@ -3169,29 +3168,37 @@ static void si_llvm_emit_ls_epilogue(struct 
lp_build_tgsi_context *bld_base)
 */
if (name == TGSI_SEMANTIC_LAYER ||
name == TGSI_SEMANTIC_VIEWPORT_INDEX)
continue;
 
int param = si_shader_io_get_unique_index(name, index);
LLVMValueRef dw_addr = LLVMBuildAdd(ctx->ac.builder, 
base_dw_addr,
LLVMConstInt(ctx->i32, param * 

[Mesa-dev] [PATCH 02/20] nir: add streams to nir data

2017-11-09 Thread Timothy Arceri
This will be used by gallium drivers.
---
 src/compiler/glsl/glsl_to_nir.cpp | 1 +
 src/compiler/nir/nir.h| 8 
 2 files changed, 9 insertions(+)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
b/src/compiler/glsl/glsl_to_nir.cpp
index caea2ea3b2..d327f52be6 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -315,20 +315,21 @@ nir_visitor::visit(ir_variable *ir)
var->type = ir->type;
var->name = ralloc_strdup(var, ir->name);
 
var->data.always_active_io = ir->data.always_active_io;
var->data.read_only = ir->data.read_only;
var->data.centroid = ir->data.centroid;
var->data.sample = ir->data.sample;
var->data.patch = ir->data.patch;
var->data.invariant = ir->data.invariant;
var->data.location = ir->data.location;
+   var->data.stream = ir->data.stream;
var->data.compact = false;
 
switch(ir->data.mode) {
case ir_var_auto:
case ir_var_temporary:
   if (is_global)
  var->data.mode = nir_var_global;
   else
  var->data.mode = nir_var_local;
   break;
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 1a33d751bd..b6c7ac3e54 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -267,20 +267,28 @@ typedef struct nir_variable {
* slot has not been assigned, the value will be -1.
*/
   int location;
 
   /**
* The actual location of the variable in the IR. Only valid for inputs
* and outputs.
*/
   unsigned int driver_location;
 
+  /**
+   * Vertex stream output identifier.
+   *
+   * For packed outputs, bit 31 is set and bits [2*i+1,2*i] indicate the
+   * stream of the i-th component.
+   */
+  unsigned stream;
+
   /**
* output index for dual source blending.
*/
   int index;
 
   /**
* Descriptor set binding for sampler or UBO.
*/
   int descriptor_set;
 
-- 
2.14.3
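
The comment added to nir_variable above describes a packed encoding for the
stream field. A minimal sketch of how a consumer might decode it, assuming at
most four components per output and that an unpacked value is simply the
stream number shared by all components. STREAM_PACKED, stream_for_component()
and pack_streams() are hypothetical names used for illustration only; they
are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the "packed" flag (bit 31) described in the comment. */
#define STREAM_PACKED (UINT32_C(1) << 31)

/* Return the vertex stream used by one component of an output.
 * Packed:   bits [2*i+1, 2*i] hold the stream of the i-th component.
 * Unpacked: the whole value is the stream for every component (assumption).
 */
static unsigned
stream_for_component(uint32_t stream, unsigned component)
{
   assert(component < 4);

   if (stream & STREAM_PACKED)
      return (stream >> (2 * component)) & 0x3;

   return stream;
}

/* Build a packed value from four per-component stream numbers (each 0..3). */
static uint32_t
pack_streams(const unsigned streams[4])
{
   uint32_t packed = STREAM_PACKED;

   for (unsigned i = 0; i < 4; i++) {
      assert(streams[i] < 4);
      packed |= (uint32_t)streams[i] << (2 * i);
   }

   return packed;
}
```

Two bits per component are enough because GL exposes at most four vertex
streams, so four components fit in the low byte and bit 31 stays free as the
flag.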



[Mesa-dev] Initial GS NIR support for radeonsi

2017-11-09 Thread Timothy Arceri
The support is still WIP, but the patches are starting to pile
up, so I thought I'd see if I could land these before continuing.

What's missing?

Vega support for gs_vtx_offset handling (see patch 3). I don't
have a Vega card yet for testing, so I didn't attempt to adapt the code.

Lots of piglit tests still fail. The gs clip distance piglit
tests currently cause my system to hang, for instance.



[Mesa-dev] [PATCH 04/20] ac: add emit_vertex to the abi

2017-11-09 Thread Timothy Arceri
---
 src/amd/common/ac_nir_to_llvm.c  | 11 +-
 src/amd/common/ac_shader_abi.h   |  4 
 src/gallium/drivers/radeonsi/si_shader.c | 35 +++-
 3 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 2ae656693f..36f471dcc7 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -3902,46 +3902,45 @@ static LLVMValueRef visit_interp(struct 
nir_to_llvm_context *ctx,
  
LLVMConstInt(ctx->ac.i32, 2, false),
  llvm_chan, 
attr_number,
  ctx->prim_mask);
}
}
return build_varying_gather_values(&ctx->ac, result, 
instr->num_components,
   
instr->variables[0]->var->data.location_frac);
 }
 
 static void
-visit_emit_vertex(struct nir_to_llvm_context *ctx,
- const nir_intrinsic_instr *instr)
+visit_emit_vertex(struct ac_shader_abi *abi, unsigned stream, LLVMValueRef 
*addrs)
 {
LLVMValueRef gs_next_vertex;
LLVMValueRef can_emit;
int idx;
+   struct nir_to_llvm_context *ctx = nir_to_llvm_context_from_abi(abi);
 
-   assert(instr->const_index[0] == 0);
/* Write vertex attribute values to GSVS ring */
gs_next_vertex = LLVMBuildLoad(ctx->builder,
   ctx->gs_next_vertex,
   "");
 
/* If this thread has already emitted the declared maximum number of
 * vertices, kill it: excessive vertex emissions are not supposed to
 * have any effect, and GS threads have no externally observable
 * effects other than emitting vertices.
 */
can_emit = LLVMBuildICmp(ctx->builder, LLVMIntULT, gs_next_vertex,
 LLVMConstInt(ctx->ac.i32, 
ctx->gs_max_out_vertices, false), "");
ac_build_kill_if_false(&ctx->ac, can_emit);
 
/* loop num outputs */
idx = 0;
for (unsigned i = 0; i < RADEON_LLVM_MAX_OUTPUTS; ++i) {
-   LLVMValueRef *out_ptr = &ctx->nir->outputs[i * 4];
+   LLVMValueRef *out_ptr = &addrs[i * 4];
int length = 4;
int slot = idx;
int slot_inc = 1;
 
if (!(ctx->output_mask & (1ull << i)))
continue;
 
if (i == VARYING_SLOT_CLIP_DIST0) {
/* pack clip and cull into a single set of slots */
length = ctx->num_output_clips + ctx->num_output_culls;
@@ -4160,21 +4159,22 @@ static void visit_intrinsic(struct ac_nir_context *ctx,
case nir_intrinsic_var_atomic_exchange:
case nir_intrinsic_var_atomic_comp_swap:
result = visit_var_atomic(ctx->nctx, instr);
break;
case nir_intrinsic_interp_var_at_centroid:
case nir_intrinsic_interp_var_at_sample:
case nir_intrinsic_interp_var_at_offset:
result = visit_interp(ctx->nctx, instr);
break;
case nir_intrinsic_emit_vertex:
-   visit_emit_vertex(ctx->nctx, instr);
+   assert(instr->const_index[0] == 0);
+   ctx->abi->emit_vertex(ctx->abi, 0, ctx->outputs);
break;
case nir_intrinsic_end_primitive:
visit_end_primitive(ctx->nctx, instr);
break;
case nir_intrinsic_load_tess_coord:
result = visit_load_tess_coord(ctx->nctx, instr);
break;
case nir_intrinsic_load_patch_vertices_in:
result = LLVMConstInt(ctx->ac.i32, 
ctx->nctx->options->key.tcs.input_vertices, false);
break;
@@ -6490,20 +6490,21 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
ctx.max_workgroup_size = MAX2(ctx.max_workgroup_size,
  
ac_nir_get_max_workgroup_size(ctx.options->chip_class,

shaders[i]));
}
 
create_function(&ctx, shaders[shader_count - 1]->info.stage, 
shader_count >= 2,
shader_count >= 2 ? shaders[shader_count - 
2]->info.stage  : MESA_SHADER_VERTEX);
 
ctx.abi.inputs = &ctx.inputs[0];
ctx.abi.emit_outputs = handle_shader_outputs_post;
+   ctx.abi.emit_vertex = visit_emit_vertex;
ctx.abi.load_ssbo = radv_load_ssbo;
ctx.abi.load_sampler_desc = radv_get_sampler_desc;
ctx.abi.clamp_shadow_reference = false;
 
if (shader_count >= 2)
ac_init_exec_full_mask(&ctx.ac);
 
if (ctx.ac.chip_class == GFX9 &&
shaders[shader_count - 1]->info.stage == MESA_SHADER_TESS_CTRL)

Re: [Mesa-dev] [PATCH 1/3] i965: Make a better helper function for UBO/SSBO/ABO surface handling.

2017-11-09 Thread Jason Ekstrand
On Thu, Nov 9, 2017 at 2:42 PM, Kenneth Graunke 
wrote:

> On Thursday, November 9, 2017 12:31:18 PM PST Jason Ekstrand wrote:
> > On Thu, Nov 9, 2017 at 12:45 AM, Kenneth Graunke 
> > wrote:
> >
> > > This fixes the missing AutomaticSize handling in the ABO code, removes
> > > a bunch of duplicated code, and drops an extra layer of wrapping around
> > > brw_emit_buffer_surface_state().
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_context.h  |  10 --
> > >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 113
> > > +++
> > >  src/mesa/drivers/dri/i965/gen6_constant_state.c  |   7 +-
> > >  3 files changed, 36 insertions(+), 94 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> > > b/src/mesa/drivers/dri/i965/brw_context.h
> > > index 8aa0c5ff64c..5d19a6bfc9a 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > > @@ -1395,16 +1395,6 @@ brw_get_index_type(unsigned index_size)
> > >  void brw_prepare_vertices(struct brw_context *brw);
> > >
> > >  /* brw_wm_surface_state.c */
> > > -void brw_create_constant_surface(struct brw_context *brw,
> > > - struct brw_bo *bo,
> > > - uint32_t offset,
> > > - uint32_t size,
> > > - uint32_t *out_offset);
> > > -void brw_create_buffer_surface(struct brw_context *brw,
> > > -   struct brw_bo *bo,
> > > -   uint32_t offset,
> > > -   uint32_t size,
> > > -   uint32_t *out_offset);
> > >  void brw_update_buffer_texture_surface(struct gl_context *ctx,
> > > unsigned unit,
> > > uint32_t *surf_offset);
> > > diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > > b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > > index 27c241a87af..a483ba34151 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > > @@ -672,44 +672,6 @@ brw_update_buffer_texture_surface(struct
> gl_context
> > > *ctx,
> > >   0);
> > >  }
> > >
> > > -/**
> > > - * Create the constant buffer surface.  Vertex/fragment shader
> constants
> > > will be
> > > - * read from this buffer with Data Port Read instructions/messages.
> > > - */
> > > -void
> > > -brw_create_constant_surface(struct brw_context *brw,
> > > -   struct brw_bo *bo,
> > > -   uint32_t offset,
> > > -   uint32_t size,
> > > -   uint32_t *out_offset)
> > > -{
> > > -   brw_emit_buffer_surface_state(brw, out_offset, bo, offset,
> > > - ISL_FORMAT_R32G32B32A32_FLOAT,
> > > - size, 1, 0);
> > > -}
> > > -
> > > -/**
> > > - * Create the buffer surface. Shader buffer variables will be
> > > - * read from / write to this buffer with Data Port Read/Write
> > > - * instructions/messages.
> > > - */
> > > -void
> > > -brw_create_buffer_surface(struct brw_context *brw,
> > > -  struct brw_bo *bo,
> > > -  uint32_t offset,
> > > -  uint32_t size,
> > > -  uint32_t *out_offset)
> > > -{
> > > -   /* Use a raw surface so we can reuse existing untyped
> read/write/atomic
> > > -* messages. We need these specifically for the fragment shader
> since
> > > they
> > > -* include a pixel mask header that we need to ensure correct
> behavior
> > > -* with helper invocations, which cannot write to the buffer.
> > > -*/
> > > -   brw_emit_buffer_surface_state(brw, out_offset, bo, offset,
> > > - ISL_FORMAT_RAW,
> > > - size, 1, RELOC_WRITE);
> > > -}
> > > -
> > >  /**
> > >   * Set up a binding table entry for use by stream output logic
> (transform
> > >   * feedback).
> > > @@ -1271,6 +1233,31 @@ const struct brw_tracked_state
> > > brw_cs_texture_surfaces = {
> > > .emit = brw_update_cs_texture_surfaces,
> > >  };
> > >
> > > +static void
> > > +upload_buffer_surface(struct brw_context *brw,
> > > +  struct gl_buffer_binding *binding,
> > > +  uint32_t *out_offset,
> > > +  enum isl_format format,
> > > +  unsigned reloc_flags)
> > > +{
> > > +   struct gl_context *ctx = &brw->ctx;
> > > +
> > > +   if (binding->BufferObject == ctx->Shared->NullBufferObj) {
> > > +  emit_null_surface_state(brw, NULL, out_offset);
> > > +   } else {
> > > +  ptrdiff_t size = binding->BufferObject->Size - binding->Offset;
> > > +  if 

Re: [Mesa-dev] [PATCH 0/5] Volatile and invariant LDS memory ops

2017-11-09 Thread Connor Abbott
On Thu, Nov 9, 2017 at 7:17 PM, Marek Olšák  wrote:
> On Fri, Nov 10, 2017 at 12:40 AM, Matt Arsenault  wrote:
>>
>>> On Nov 10, 2017, at 07:41, Marek Olšák  wrote:
>>>
>>> Hi,
>>>
>>> This fixes the TCS gl_ClipDistance piglit failure that was uncovered
>>> by a recent LLVM change. The solution is to set volatile on loads
>>> and stores to enforce proper ordering.
>>>
>>> Please review.
>>>
>>
>>
>> Every LDS access certainly should not be volatile. This kills all 
>> optimizations, like formation of ds_read2_b32. What ordering issue are you 
>> having?
>
> It might be caused by inttoptr(NULL) that we do to declare LDS. There
> is simply no ordering enforced, which is weird.

As soon as you do inttoptr(NULL), you've generated a poison value (in
LLVM legalese), so LLVM will assume that you never dereference it and
optimize accordingly. I think a GEP instruction without the inbounds
parameter set will get rid of the poison value, although I'm not sure
about the case where the offset is known to be zero. At least, that's
my reading of the langref text for the GEP instruction
(https://llvm.org/docs/LangRef.html#id215). If zero is a valid address
in LDS, then it sounds like LLVM needs to be fixed to disable this
optimization for certain address spaces. On the other hand, if you're
doing inttoptr(NULL) + offset, where "offset" is the result of a
ptrtoint somewhere, you should be doing inttoptr(offset) instead, and
then LLVM should never misbehave.

>
> This series only sets volatile on HS output loads and stores. Compute
> shaders and other uses don't set volatile.
>
> Marek


Re: [Mesa-dev] [PATCH] intel/fs: Use a pure vertical stride for large register strides

2017-11-09 Thread Jason Ekstrand
On Thu, Nov 9, 2017 at 2:23 PM, Matt Turner  wrote:

> On Thu, Nov 2, 2017 at 3:54 PM, Jason Ekstrand 
> wrote:
> > Register strides higher than 4 are uncommon but they can happen.  For
> > instance, if you have a 64-bit extract_u8 operation, we turn that into
> > UB -> UQ MOV with a source stride of 8.  Our previous calculation would
> > try to generate a stride of <32;8,8>:ub which is invalid because the
> > maximum horizontal stride is 4.  To solve this problem, we instead use a
> > stride of <8;1,0>.  As noted in the comment, this does not work as a
> > destination but that's ok as very few things actually generate that
> > stride.
>
> Please put the tests you fixed in the commit message. It's not okay to
> leave that out for all the reasons that I'm sure you know.
>

I didn't because the test passes before and after the patch.  I guess I
could have included that information though.


> Looks like this doesn't work on CHV, BXT, GLK :(
>
> KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks now fails on CHV,
> BXT, GLK with:
>
> mov(8)  g21<1>UQ  g19<8,1,0>UB  { align1 1Q };
> ERROR: Source and destination horizontal stride must equal and
> a multiple of a qword when the execution type is 64-bit
> ERROR: Vstride must be Width * Hstride when the execution type is
> 64-bit
>
> Modulo the typo in the first error, I think both of these are correct.
> I don't think we can extract_u8 to a 64-bit type on Atom :(
>

That's unfortunate...  Quickly racking my brain, I don't see a slick way to
implement that opcode.  How would you feel about some late opt_algebraic
lowering?


> This is filed as https://bugs.freedesktop.org/show_bug.cgi?id=103628
>
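
To make the region arithmetic in the commit message concrete, here is a small
sketch, assuming the usual <VertStride;Width,HorzStride> addressing where
channel i reads element (i / Width) * VertStride + (i % Width) * HorzStride.
The struct and function names are made up for illustration and are not the
compiler's own helpers. It shows that <32;8,8> and <8;1,0> name the same
bytes for a SIMD8 UB source, while only the second one keeps the horizontal
stride within the encodable set.

```c
#include <assert.h>
#include <stdbool.h>

/* A source region <vstride;width,hstride>, all in units of elements. */
struct region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

/* Byte offset (from the region's origin) read by a given channel. */
static unsigned
region_byte_offset(struct region r, unsigned channel, unsigned type_size)
{
   assert(r.width > 0);

   unsigned row = channel / r.width;   /* which row of the region */
   unsigned col = channel % r.width;   /* position within that row */

   return (row * r.vstride + col * r.hstride) * type_size;
}

/* The horizontal stride field can only express 0, 1, 2 or 4 elements. */
static bool
hstride_is_encodable(unsigned hstride)
{
   return hstride == 0 || hstride == 1 || hstride == 2 || hstride == 4;
}
```

With a one-byte type, channel 3 lands at byte 24 in both regions: <32;8,8>
reaches it as column 3 of a single row, <8;1,0> as row 3 of a one-element-wide
region. The second form moves the whole stride into the vertical field, which
is what the patch relies on.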


Re: [Mesa-dev] [PATCH v2 11/26] gallium/u_threaded: avoid syncs for get_query_result

2017-11-09 Thread Marek Olšák
This commit makes most query piglit tests crash. I've not investigated further.

Marek

On Mon, Nov 6, 2017 at 11:23 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> Queries should still get marked as flushed when flushes are executed
> asynchronously in the driver thread.
>
> To this end, the management of the unflushed_queries list is moved into
> the driver thread.
>
> Reviewed-by: Marek Olšák 
> ---
>  src/gallium/auxiliary/util/u_threaded_context.c | 65 
> ++---
>  1 file changed, 48 insertions(+), 17 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_threaded_context.c 
> b/src/gallium/auxiliary/util/u_threaded_context.c
> index 0bb645e8522..4908ea8a7ba 100644
> --- a/src/gallium/auxiliary/util/u_threaded_context.c
> +++ b/src/gallium/auxiliary/util/u_threaded_context.c
> @@ -321,91 +321,107 @@ tc_create_batch_query(struct pipe_context *_pipe, 
> unsigned num_queries,
>  {
> struct threaded_context *tc = threaded_context(_pipe);
> struct pipe_context *pipe = tc->pipe;
>
> return pipe->create_batch_query(pipe, num_queries, query_types);
>  }
>
>  static void
>  tc_call_destroy_query(struct pipe_context *pipe, union tc_payload *payload)
>  {
> +   struct threaded_query *tq = threaded_query(payload->query);
> +
> +   if (tq->head_unflushed.next)
> +  LIST_DEL(&tq->head_unflushed);
> +
> pipe->destroy_query(pipe, payload->query);
>  }
>
>  static void
>  tc_destroy_query(struct pipe_context *_pipe, struct pipe_query *query)
>  {
> struct threaded_context *tc = threaded_context(_pipe);
> -   struct threaded_query *tq = threaded_query(query);
> -
> -   if (tq->head_unflushed.next)
> -  LIST_DEL(&tq->head_unflushed);
>
> tc_add_small_call(tc, TC_CALL_destroy_query)->query = query;
>  }
>
>  static void
>  tc_call_begin_query(struct pipe_context *pipe, union tc_payload *payload)
>  {
> pipe->begin_query(pipe, payload->query);
>  }
>
>  static boolean
>  tc_begin_query(struct pipe_context *_pipe, struct pipe_query *query)
>  {
> struct threaded_context *tc = threaded_context(_pipe);
> union tc_payload *payload = tc_add_small_call(tc, TC_CALL_begin_query);
>
> payload->query = query;
> return true; /* we don't care about the return value for this call */
>  }
>
> +struct tc_end_query_payload {
> +   struct threaded_context *tc;
> +   struct pipe_query *query;
> +};
> +
>  static void
>  tc_call_end_query(struct pipe_context *pipe, union tc_payload *payload)
>  {
> -   pipe->end_query(pipe, payload->query);
> +   struct tc_end_query_payload *p = (struct tc_end_query_payload *)payload;
> +   struct threaded_query *tq = threaded_query(p->query);
> +
> +   if (!tq->head_unflushed.next)
> +  LIST_ADD(&tq->head_unflushed, &p->tc->unflushed_queries);
> +
> +   pipe->end_query(pipe, p->query);
>  }
>
>  static bool
>  tc_end_query(struct pipe_context *_pipe, struct pipe_query *query)
>  {
> struct threaded_context *tc = threaded_context(_pipe);
> struct threaded_query *tq = threaded_query(query);
> -   union tc_payload *payload = tc_add_small_call(tc, TC_CALL_end_query);
> +   struct tc_end_query_payload *payload =
> +  tc_add_struct_typed_call(tc, TC_CALL_end_query, tc_end_query_payload);
> +
> +   tc_add_small_call(tc, TC_CALL_end_query);
>
> +   payload->tc = tc;
> payload->query = query;
>
> tq->flushed = false;
> -   if (!tq->head_unflushed.next)
> -  LIST_ADD(&tq->head_unflushed, &tc->unflushed_queries);
>
> return true; /* we don't care about the return value for this call */
>  }
>
>  static boolean
>  tc_get_query_result(struct pipe_context *_pipe,
>  struct pipe_query *query, boolean wait,
>  union pipe_query_result *result)
>  {
> struct threaded_context *tc = threaded_context(_pipe);
> struct threaded_query *tq = threaded_query(query);
> struct pipe_context *pipe = tc->pipe;
>
> if (!tq->flushed)
>tc_sync_msg(tc, wait ? "wait" : "nowait");
>
> bool success = pipe->get_query_result(pipe, query, wait, result);
>
> if (success) {
>tq->flushed = true;
> -  if (tq->head_unflushed.next)
> +  if (tq->head_unflushed.next) {
> + /* This is safe because it can only happen after we sync'd. */
>   LIST_DEL(&tq->head_unflushed);
> +  }
> }
> return success;
>  }
>
>  struct tc_query_result_resource {
> struct pipe_query *query;
> boolean wait;
> enum pipe_query_value_type result_type;
> int index;
> struct pipe_resource *resource;
> @@ -1806,42 +1822,60 @@ tc_create_video_buffer(struct pipe_context *_pipe,
> unreachable("Threaded context should not be enabled for video APIs");
> return NULL;
>  }
>
>
>  /
>   * draw, launch, clear, blit, copy, flush
>   */
>
>  struct tc_flush_payload {
> +   struct threaded_context *tc;
> struct pipe_fence_handle 

Re: [Mesa-dev] [PATCH] anv: don't crash when creating a huge image

2017-11-09 Thread Jason Ekstrand
On Thu, Nov 9, 2017 at 4:23 PM, Chad Versace 
wrote:

> On Wed 08 Nov 2017, Jason Ekstrand wrote:
> > On Wed, Nov 8, 2017 at 1:34 AM, Samuel Iglesias Gonsálvez <[1]
> > sigles...@igalia.com> wrote:
> >
> > The HW has some limits but, according to the spec, we can create
> > the image as it has not yet any memory backing it. When we allocate
> > that memory, then we fail following what Vulkan spec, "10.2. Device
> > Memory" says when talking about vkAllocateMemory():
> >
> > "Some platforms may have a limit on the maximum size of a single
> >  allocation. For example, certain systems may fail to create
> >  allocations with a size greater than or equal to 4GB. Such a limit
> is
> >  implementation-dependent, and if such a failure occurs then the
> error
> >  VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned."
> >
> > Fixes the crashes on BDW for the following tests:
> >
> > dEQP-VK.pipeline.render_to_image.core.2d_array.huge.*
> > dEQP-VK.pipeline.render_to_image.core.cube_array.huge.*
> >
> > Signed-off-by: Samuel Iglesias Gonsálvez <[2]sigles...@igalia.com>
> > ---
> >
> > Jason, I was tempted to move this piece of code to
> anv_AllocateMemory()
> > but then I found the kernel relocation limitation of 32-bit... Is
> that
> > limitation still applicable? Or was it from the BDW age and we forgot
> > to update that limitation for gen9+?
> >
> >
> > We're still using relocations on all hardware so it applies to everything
> > today.  One of my 2018 projects is to fix that and get rid of
> relocations on
> > gen8+.
> >
> >
> > Sam
> >
> >  src/intel/isl/isl.c | 22 --
> >  1 file changed, 22 deletions(-)
> >
> > diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> > index 59f512fc050..aaadcbaf991 100644
> > --- a/src/intel/isl/isl.c
> > +++ b/src/intel/isl/isl.c
> > @@ -1472,28 +1472,6 @@ isl_surf_init_s(const struct isl_device *dev,
> >base_alignment = MAX(info->min_alignment, tile_size);
> > }
> >
> > -   if (ISL_DEV_GEN(dev) < 9) {
> > -  /* From the Broadwell PRM Vol 5, Surface Layout:
> > -   *
> > -   *"In addition to restrictions on maximum height, width,
> and
> > depth,
> > -   * surfaces are also restricted to a maximum size in
> bytes. This
> > -   * maximum is 2 GB for all products and all surface
> types."
> > -   *
> > -   * This comment is applicable to all Pre-gen9 platforms.
> > -   */
> > -  if (size > (uint64_t) 1 << 31)
> > - return false;
> > -   } else {
> > -  /* From the Skylake PRM Vol 5, Maximum Surface Size in Bytes:
> > -   *"In addition to restrictions on maximum height, width,
> and
> > depth,
> > -   * surfaces are also restricted to a maximum size of 2^38
> bytes.
> > -   * All pixels within the surface must be contained within
> 2^38
> > bytes
> > -   * of the base address."
> > -   */
> > -  if (size > (uint64_t) 1 << 38)
> > - return false;
> > -   }
>
> I think it very unwise to delete code that enforces requirements defined
> by the hardware spec. Deleting the code doesn't make the hardware
> requirements go away :)
>
> > I'm not sure how I feel about removing this from ISL.  There are really
> two
> > limitations going on here.  One is a limitation imposed by relocations,
> and the
> > other is some sort of fundamental hardware surface size limitation.  Most
> > likely, the surface size limitation has to do with how many bits they
> use for
> > image address computations in the hardware.  Most likely, on gen8, they
> do all
> > of the internal calculations in 32 bits and only convert to 48 at the
> end when
> > they need to add it to Surface Base Address.
> >
> > If my understanding is correct then we will still have this limitation
> on gen8
> > even after we get rid of relocations and remove the BO size limitation.
> I see
> > a couple of options, neither of which I like very much:
> >
> >  1) Take something like this patch and then keep the BO size limitation
> on BDW
> > to 2GiB when we get rid of relocations even though it's artificial.
> >  2) "Gracefully" handle isl_surf_init failure by doing a debug_log and
> > succeeding but setting the image size (that will be returned by
> > vkGetImageMemoryRequirements) to UINT64_MAX so that the client won't
> ever be
> > able to find memory for it.
> >
> > My feeling is that 1) above is probably the better of the two as 2)
> seems to be
> > a twisting of the spec.  That said, I would like to keep the restriction
> in ISL
> > somehow and we need to make sure it still gets applied in GL.
>
> I dislike both. I originally designed isl to mimic the VkImage API, so
> let's continue that trend.
>
>   Option 3) Change isl_surf_init() to return a meaningful result 
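
For reference, the size restriction under discussion (the check the quoted
patch removes from isl_surf_init_s) can be restated as a standalone sketch.
The limits are the ones quoted from the PRMs above: 2 GB before gen9 and 2^38
bytes on gen9 and later. The function names and the plain int generation
argument are assumptions for illustration, not isl's actual interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Maximum surface size in bytes for a hardware generation, per the PRM
 * text quoted in the removed code: 2^31 before gen9, 2^38 from gen9 on.
 */
static uint64_t
max_surface_size_bytes(int gen)
{
   assert(gen > 0);

   return gen < 9 ? (UINT64_C(1) << 31) : (UINT64_C(1) << 38);
}

/* Mirrors the removed check: a surface is rejected only when its size
 * is strictly greater than the limit.
 */
static bool
surface_size_is_valid(int gen, uint64_t size)
{
   return size <= max_surface_size_bytes(gen);
}
```

Keeping the limit in one helper like this would let both the Vulkan and GL
paths apply the same restriction regardless of where the failure ends up
being reported.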

Re: [Mesa-dev] [PATCH 1/4] i965/miptree: Loosen the format check in miptree_match_image

2017-11-09 Thread Chad Versace
On Wed 08 Nov 2017, Jason Ekstrand wrote:
> This function is used to determine when we need to re-allocate a
> miptree.  Since we do nothing different in miptree allocation for
> sRGB vs. linear, loosening this should be safe and may lead to less
> copying and reallocating in some odd cases.
> 
> Cc: "17.3" 
> Cc: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c  | 6 --
>  src/mesa/drivers/dri/i965/intel_tex.c  | 2 +-
>  src/mesa/drivers/dri/i965/intel_tex_obj.h  | 4 ++--
>  src/mesa/drivers/dri/i965/intel_tex_validate.c | 2 +-
>  4 files changed, 8 insertions(+), 6 deletions(-)

This is on my review queue for this week. I'm still dealing with a sick
family, though, and recovering myself. Maybe tomorrow I'll get to this.


Re: [Mesa-dev] [PATCH] anv: don't crash when creating a huge image

2017-11-09 Thread Chad Versace
On Wed 08 Nov 2017, Jason Ekstrand wrote:
> On Wed, Nov 8, 2017 at 1:34 AM, Samuel Iglesias Gonsálvez
> <sigles...@igalia.com> wrote:
> 
> The HW has some limits but, according to the spec, we can create
> the image as it does not yet have any memory backing it. When we allocate
> that memory, we then fail, following what the Vulkan spec, "10.2. Device
> Memory", says when talking about vkAllocateMemory():
> 
> "Some platforms may have a limit on the maximum size of a single
>  allocation. For example, certain systems may fail to create
>  allocations with a size greater than or equal to 4GB. Such a limit is
>  implementation-dependent, and if such a failure occurs then the error
>  VK_ERROR_OUT_OF_DEVICE_MEMORY must be returned."
> 
> Fixes the crashes on BDW for the following tests:
> 
> dEQP-VK.pipeline.render_to_image.core.2d_array.huge.*
> dEQP-VK.pipeline.render_to_image.core.cube_array.huge.*
> 
> Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com>
> ---
> 
> Jason, I was tempted to move this piece of code to anv_AllocateMemory()
> but then I found the kernel relocation limitation of 32-bit... Is that
> limitation still applicable? Or was it from the BDW age and we forgot
> to update that limitation for gen9+?
> 
> 
> We're still using relocations on all hardware so it applies to everything
> today.  One of my 2018 projects is to fix that and get rid of relocations on
> gen8+.
>  
> 
> Sam
> 
>  src/intel/isl/isl.c | 22 --
>  1 file changed, 22 deletions(-)
> 
> diff --git a/src/intel/isl/isl.c b/src/intel/isl/isl.c
> index 59f512fc050..aaadcbaf991 100644
> --- a/src/intel/isl/isl.c
> +++ b/src/intel/isl/isl.c
> @@ -1472,28 +1472,6 @@ isl_surf_init_s(const struct isl_device *dev,
>        base_alignment = MAX(info->min_alignment, tile_size);
>     }
> 
> -   if (ISL_DEV_GEN(dev) < 9) {
> -      /* From the Broadwell PRM Vol 5, Surface Layout:
> -       *
> -       *    "In addition to restrictions on maximum height, width, and depth,
> -       *     surfaces are also restricted to a maximum size in bytes.  This
> -       *     maximum is 2 GB for all products and all surface types."
> -       *
> -       * This comment is applicable to all Pre-gen9 platforms.
> -       */
> -      if (size > (uint64_t) 1 << 31)
> -         return false;
> -   } else {
> -      /* From the Skylake PRM Vol 5, Maximum Surface Size in Bytes:
> -       *    "In addition to restrictions on maximum height, width, and depth,
> -       *     surfaces are also restricted to a maximum size of 2^38 bytes.
> -       *     All pixels within the surface must be contained within 2^38 bytes
> -       *     of the base address."
> -       */
> -      if (size > (uint64_t) 1 << 38)
> -         return false;
> -   }

I think it very unwise to delete code that enforces requirements defined
by the hardware spec. Deleting the code doesn't make the hardware
requirements go away :)

> I'm not sure how I feel about removing this from ISL.  There are really two
> limitations going on here.  One is a limitation imposed by relocations, and the
> other is some sort of fundamental hardware surface size limitation.  Most
> likely, the surface size limitation has to do with how many bits they use for
> image address computations in the hardware.  Most likely, on gen8, they do all
> of the internal calculations in 32 bits and only convert to 48 at the end when
> they need to add it to Surface Base Address.
> 
> If my understanding is correct then we will still have this limitation on gen8
> even after we get rid of relocations and remove the BO size limitation.  I see
> a couple of options, neither of which I like very much:
> 
>  1) Take something like this patch and then keep the BO size limitation on BDW
> to 2GiB when we get rid of relocations even though it's artificial.
>  2) "Gracefully" handle isl_surf_init failure by doing a debug_log and
> succeeding but setting the image size (that will be returned by
> vkGetImageMemoryRequirements) to UINT64_MAX so that the client won't ever be
> able to find memory for it.
> 
> My feeling is that 1) above is probably the better of the two as 2) seems to be
> a twisting of the spec.  That said, I would like to keep the restriction in ISL
> somehow and we need to make sure it still gets applied in GL.

I dislike both. I originally designed isl to mimic the VkImage API, so
let's continue that trend.

  Option 3) Change isl_surf_init() to return a meaningful result code:

ISL_SUCCESS = 0
ISL_ERROR_SOMETHING_SOMETHING_THE_USUAL_FAILURES = -1
ISL_ERROR_SURFACE_SIZE_TOO_LARGE = -2

I like option 3 because it avoids secret implicit contracts between isl
and anvil, and thus avoids hidden hacks.
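
A minimal sketch of what option 3 could look like, assuming only the size
limits quoted from the PRMs in the patch above. The enum and function names
(everything with a "sketch" suffix) are hypothetical and are not existing isl
API; only the 2^31 and 2^38 byte limits come from this thread.

```c
#include <stdint.h>

/* Hypothetical result codes, modelled on the names floated above. */
enum isl_result_sketch {
   ISL_SKETCH_SUCCESS = 0,
   ISL_SKETCH_ERROR_UNSPECIFIED = -1,
   ISL_SKETCH_ERROR_SURFACE_SIZE_TOO_LARGE = -2,
};

/* Limit taken from the PRM text quoted in the patch: 2 GB before gen9,
 * 2^38 bytes on gen9 and later.
 */
static uint64_t
max_surface_size_sketch(int gen)
{
   return gen < 9 ? (uint64_t) 1 << 31 : (uint64_t) 1 << 38;
}

/* Report *why* the surface is rejected instead of a bare "false", so the
 * caller can decide what to do without an implicit contract.
 */
static enum isl_result_sketch
check_surface_size_sketch(int gen, uint64_t size)
{
   if (size > max_surface_size_sketch(gen))
      return ISL_SKETCH_ERROR_SURFACE_SIZE_TOO_LARGE;

   return ISL_SKETCH_SUCCESS;
}
```

How anv would translate the size error into a VkResult is left open here,
since the thread has not settled that.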

Re: [Mesa-dev] [PATCH 0/5] Volatile and invariant LDS memory ops

2017-11-09 Thread Marek Olšák
On Fri, Nov 10, 2017 at 12:40 AM, Matt Arsenault  wrote:
>
>> On Nov 10, 2017, at 07:41, Marek Olšák  wrote:
>>
>> Hi,
>>
>> This fixes the TCS gl_ClipDistance piglit failure that was uncovered
>> by a recent LLVM change. The solution is to set volatile on loads
>> and stores to enforce proper ordering.
>>
>> Please review.
>>
>
>
> Every LDS access certainly should not be volatile. This kills all 
> optimizations, like formation of ds_read2_b32. What ordering issue are you 
> having?

It might be caused by inttoptr(NULL) that we do to declare LDS. There
is simply no ordering enforced, which is weird.

This series only sets volatile on HS output loads and stores. Compute
shaders and other uses don't set volatile.

Marek


Re: [Mesa-dev] [PATCH 1/3] radv: Fix architecture in radeon_icd.{arch}.json

2017-11-09 Thread Dylan Baker
for the series:
Reviewed-by: Dylan Baker 

Quoting Chad Versace (2017-11-09 15:45:00)
> Use the host arch, not the target arch. In Meson and in recent
> Autotools, the host arch is where the binary will be used. The target
> arch is useful only when compiling a compiler.
> 
> See: http://mesonbuild.com/Cross-compilation.html
> See: 
> https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html
> Reported-by: Eric Engestrom 
> Cc: Dylan Baker 
> ---
>  src/amd/vulkan/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/amd/vulkan/meson.build b/src/amd/vulkan/meson.build
> index 305a2f66f58..93997350a25 100644
> --- a/src/amd/vulkan/meson.build
> +++ b/src/amd/vulkan/meson.build
> @@ -130,7 +130,7 @@ radv_data.set('libvulkan_radeon_path', 
> libvulkan_radeon.full_path())
>  configure_file(
>configuration : radv_data,
>input : 'radeon_icd.json.in',
> -  output : 'radeon_icd.@0@.json'.format(target_machine.cpu()),
> +  output : 'radeon_icd.@0@.json'.format(host_machine.cpu()),
>install_dir : with_vulkan_icd_dir,
>  )
>  configure_file(
> -- 
> 2.13.0
> 




Re: [Mesa-dev] [RFC PATCH v1 03/30] anv: Refactor get_buffer_format_properties()

2017-11-09 Thread Chad Versace
On Tue 07 Nov 2017, Lionel Landwerlin wrote:
> On 07/11/17 14:47, Chad Versace wrote:
> > Make it a stand-alone function. Pre-patch, for some formats the function
> > returned incorrect VkFormatFeatureFlags which were cleaned up by the
> > caller.
> > 
> > This prepares for a cleaner implementation of
> > VK_EXT_image_drm_format_modifier.
> > 
> > Reviewed-by: Jason Ekstrand 
> > ---
> >   src/intel/vulkan/anv_formats.c | 44 
> > --
> >   1 file changed, 29 insertions(+), 15 deletions(-)
> > 
> > diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_formats.c
> > index b8c9cacb422..ebc6a8351c6 100644
> > --- a/src/intel/vulkan/anv_formats.c
> > +++ b/src/intel/vulkan/anv_formats.c
> > @@ -514,23 +514,39 @@ get_image_format_properties(const struct 
> > gen_device_info *devinfo,
> >   static VkFormatFeatureFlags
> >   get_buffer_format_properties(const struct gen_device_info *devinfo,
> > - enum isl_format format)
> > + VkFormat vk_format,
> > + const struct anv_format *anv_format)
> >   {



> > +   if (anv_format->can_ycbcr)
> > +  return 0;
> 
> There are a couple of ycbcr formats with a single plane.
> Is there a line in the spec that says ycbcr formats cannot be accessed through
> buffer views? (I couldn't find it)
> Might be worth leaving a comment.

I was also unable to find such a restriction in the spec. You know the ycbcr
spec much better than me, so I would feel safer if you added the
comment.

Just to be clear, the rejection of ycbcr in this patch...

if (anv_format->can_ycbcr)
   return 0;

... preserves the pre-patch behavior, which rejected ycbcr like this:

if (format->can_ycbcr) {
   ...
   buffer = 0;
}



Re: [Mesa-dev] [PATCH 1/3] radv: Fix architecture in radeon_icd.{arch}.json

2017-11-09 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On 10 Nov 2017 00:45, "Chad Versace"  wrote:

> Use the host arch, not the target arch. In Meson and in recent
> Autotools, the host arch is where the binary will be used. The target
> arch is useful only when compiling a compiler.
>
> See: http://mesonbuild.com/Cross-compilation.html
> See: https://www.gnu.org/software/automake/manual/html_node/
> Cross_002dCompilation.html
> Reported-by: Eric Engestrom 
> Cc: Dylan Baker 
> ---
>  src/amd/vulkan/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/vulkan/meson.build b/src/amd/vulkan/meson.build
> index 305a2f66f58..93997350a25 100644
> --- a/src/amd/vulkan/meson.build
> +++ b/src/amd/vulkan/meson.build
> @@ -130,7 +130,7 @@ radv_data.set('libvulkan_radeon_path',
> libvulkan_radeon.full_path())
>  configure_file(
>configuration : radv_data,
>input : 'radeon_icd.json.in',
> -  output : 'radeon_icd.@0@.json'.format(target_machine.cpu()),
> +  output : 'radeon_icd.@0@.json'.format(host_machine.cpu()),
>install_dir : with_vulkan_icd_dir,
>  )
>  configure_file(
> --
> 2.13.0
>


Re: [Mesa-dev] [RFC PATCH v1 05/30] anv: Fix get_image_format_properties() - depthstencil (v2)

2017-11-09 Thread Chad Versace
On Tue 07 Nov 2017, Jason Ekstrand wrote:
> I think I'd prefer we not make "Fix" the first word in the title unless it
> fixes an actual bug.  How about "Refactor"?  Same for the ASTC patch.

Sure, I'll s/Fix/Refactor/ in those patches.


Re: [Mesa-dev] [PATCH] i965: Pretend there are 4 subslices for compute shader threads on Gen9+.

2017-11-09 Thread Rafael Antognolli
On Thu, Nov 09, 2017 at 01:50:29PM -0800, Kenneth Graunke wrote:
> On Thursday, November 9, 2017 11:22:34 AM PST Rafael Antognolli wrote:
> > On Thu, Nov 09, 2017 at 12:59:12AM -0800, Jordan Justen wrote:
> > > Reviewed-by: Jordan Justen 
> > 
> > It's also
> > 
> > Tested-by: Rafael Antognolli 
> 
> Sorry, I forgot to sync email before pushing this patch, so I missed
> adding your Tested-by tag.  :(  Next time...

Heh, no worries, as long as the patch is in ;)


[Mesa-dev] [PATCH 2/3] anv: Fix architecture in intel_icd.{arch}.json

2017-11-09 Thread Chad Versace
Use the host arch, not the target arch. In Meson and in recent
Autotools, the host arch is where the binary will be used. The target
arch is useful only when compiling a compiler.

See: http://mesonbuild.com/Cross-compilation.html
See: 
https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html
Reported-by: Eric Engestrom 
Cc: Dylan Baker 
---
 src/intel/vulkan/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/meson.build b/src/intel/vulkan/meson.build
index ff24e304ef5..debdcce4ef2 100644
--- a/src/intel/vulkan/meson.build
+++ b/src/intel/vulkan/meson.build
@@ -38,7 +38,7 @@ anv_extensions_c = custom_target(
 intel_icd = custom_target(
   'intel_icd',
   input : 'anv_icd.py',
-  output : 'intel_icd.@0@.json'.format(target_machine.cpu()),
+  output : 'intel_icd.@0@.json'.format(host_machine.cpu()),
   command : [prog_python2, '@INPUT@',
  '--lib-path', join_paths(get_option('prefix'), 
get_option('libdir')),
  '--out', '@OUTPUT@'],
-- 
2.13.0



[Mesa-dev] [PATCH 3/3] anv/meson: Generate dev_icd.json

2017-11-09 Thread Chad Versace
I tested this in a setup where the builddir was outside of the srcdir.

Reviewed-by: Eric Engestrom 
Acked-by: Dylan Baker 
---
 src/intel/vulkan/meson.build | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/intel/vulkan/meson.build b/src/intel/vulkan/meson.build
index debdcce4ef2..606a4898fe2 100644
--- a/src/intel/vulkan/meson.build
+++ b/src/intel/vulkan/meson.build
@@ -48,6 +48,18 @@ intel_icd = custom_target(
   install : true,
 )
 
+dev_icd = custom_target(
+  'dev_icd',
+  input : 'anv_icd.py',
+  output : 'dev_icd.@0@.json'.format(host_machine.cpu()),
+  command : [prog_python2, '@INPUT@',
+ '--lib-path', meson.current_build_dir(),
+ '--out', '@OUTPUT@'],
+  depend_files : files('anv_extensions.py'),
+  build_by_default : true,
+  install : false,
+)
+
 # TODO: workaround for anv_entrypoints combining the .h and .c files in it's
 # output. See issue #2346
 block_entrypoints = custom_target(
-- 
2.13.0



[Mesa-dev] [PATCH 1/3] radv: Fix architecture in radeon_icd.{arch}.json

2017-11-09 Thread Chad Versace
Use the host arch, not the target arch. In Meson and in recent
Autotools, the host arch is where the binary will be used. The target
arch is useful only when compiling a compiler.

See: http://mesonbuild.com/Cross-compilation.html
See: 
https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html
Reported-by: Eric Engestrom 
Cc: Dylan Baker 
---
 src/amd/vulkan/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/meson.build b/src/amd/vulkan/meson.build
index 305a2f66f58..93997350a25 100644
--- a/src/amd/vulkan/meson.build
+++ b/src/amd/vulkan/meson.build
@@ -130,7 +130,7 @@ radv_data.set('libvulkan_radeon_path', 
libvulkan_radeon.full_path())
 configure_file(
   configuration : radv_data,
   input : 'radeon_icd.json.in',
-  output : 'radeon_icd.@0@.json'.format(target_machine.cpu()),
+  output : 'radeon_icd.@0@.json'.format(host_machine.cpu()),
   install_dir : with_vulkan_icd_dir,
 )
 configure_file(
-- 
2.13.0



Re: [Mesa-dev] [PATCH 0/5] Volatile and invariant LDS memory ops

2017-11-09 Thread Matt Arsenault

> On Nov 10, 2017, at 07:41, Marek Olšák  wrote:
> 
> Hi,
> 
> This fixes the TCS gl_ClipDistance piglit failure that was uncovered
> by a recent LLVM change. The solution is to set volatile on loads
> and stores to enforce proper ordering.
> 
> Please review.
> 


Every LDS access certainly should not be volatile. This kills all 
optimizations, like formation of ds_read2_b32. What ordering issue are you 
having?



Re: [Mesa-dev] [PATCH 5/5] meson: build gallium-xlib based glx

2017-11-09 Thread Dylan Baker
Quoting Eric Anholt (2017-11-08 13:26:12)
> We shouldn't have to manually specify most of these deps, I think, since
> they should be transitively pulled in by the static libraries using
> them, right?  It's fine either way, though.
> 
> > +  install : true,
> > +  version : '1.5.0',
> 
> Looks like this drops the MESA_MAJOR/MINOR/TINY version handling of the
> automake version.  Other than this, and needing the build fix in patch
> 3, the series is:
> 
> Reviewed-by: Eric Anholt 

Matt and I did some git archeology and it appears that MESA_{MAJOR,MINOR,TINY}
was removed in 2008 and this was never fixed.

commit: 80f68e1b6a0e5bd2da799c659c963e213dbf9e66

I think we should just remove those variables.

Dylan




Re: [Mesa-dev] [PATCH] threads: fix MinGW build breakage

2017-11-09 Thread Roland Scheidegger
FWIW it looks like this series also broke compilation on mac os (I
suppose that was f0d3a4de75fdb865c058aba8614f0fe6ba5f0969 though):

[...truncated 173 lines...]
   pthread_barrier_destroy(barrier);
   ^~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/pthread.h:220:42:
note: passing argument to parameter here
int pthread_attr_destroy(pthread_attr_t *);
 ^
In file included from src/gallium/auxiliary/gallivm/lp_bld_misc.cpp:93:
In file included from src/gallium/auxiliary/os/os_thread.h:42:
src/util/u_thread.h:123:4: error: use of undeclared identifier
'pthread_barrier_wait'; did you mean 'util_barrier_wait'?
   pthread_barrier_wait(barrier);
   ^~~~
   util_barrier_wait
src/util/u_thread.h:121:20: note: 'util_barrier_wait' declared here
static inline void util_barrier_wait(util_barrier *barrier)
   ^
5 errors generated.

Roland

Am 09.11.2017 um 23:37 schrieb Brian Paul:
> On 11/09/2017 02:41 PM, Nicolai Hähnle wrote:
>> Sorry for the mess.
> 
> Not a huge deal.  FWIW, you can test the MinGW cross-compile pretty easily:
> 
> 1. apt-get install g++-mingw-w64-x86-64 (or equivalent)
> 2. cd mesa ; scons platform=windows
> 
> -Brian
> 
>>
>> Reviewed-by: Nicolai Hähnle 
>>
>> On 09.11.2017 17:46, Brian Paul wrote:
>>> Fixes: f1a364878431c8 ("threads: update for late C11 changes")
>>> ---
>>>   include/c11/threads_win32.h | 5 -
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/c11/threads_win32.h b/include/c11/threads_win32.h
>>> index 77d923a..dac8ef7 100644
>>> --- a/include/c11/threads_win32.h
>>> +++ b/include/c11/threads_win32.h
>>> @@ -78,6 +78,9 @@ Configuration macro:
>>>   /* Visual Studio 2015 and later */
>>>   #if _MSC_VER >= 1900
>>>   #define HAVE_TIMESPEC
>>> +#define HAVE_TIMESPEC_GET
>>> +#elif defined(__MINGW32__)
>>> +#define HAVE_TIMESPEC
>>>   #endif
>>>   #ifndef HAVE_TIMESPEC
>>> @@ -645,7 +648,7 @@ tss_set(tss_t key, void *val)
>>>   /* 7.25.7 Time functions */
>>>   // 7.25.6.1
>>> -#ifndef HAVE_TIMESPEC
>>> +#ifndef HAVE_TIMESPEC_GET
>>>   static inline int
>>>   timespec_get(struct timespec *ts, int base)
>>>   {
>>>
>>
> 
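
For readers following the two macros: as I read the patch, it separates "the
toolchain declares struct timespec" (HAVE_TIMESPEC) from "the toolchain
provides timespec_get()" (HAVE_TIMESPEC_GET), so MinGW keeps its own struct
but still gets the fallback function. Below is a minimal sketch of such a
fallback, assuming a C library that offers time(). The sketch_/SKETCH_ names
are hypothetical and the body is not copied from threads_win32.h.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for TIME_UTC, so the sketch does not depend on the
 * C library defining it.
 */
#define SKETCH_TIME_UTC 1

/* Compiled only when the toolchain is not known to ship timespec_get().
 * SKETCH_HAVE_TIMESPEC_GET plays the role of HAVE_TIMESPEC_GET in the patch.
 */
#ifndef SKETCH_HAVE_TIMESPEC_GET
static inline int
sketch_timespec_get(struct timespec *ts, int base)
{
   /* Same contract as C11 timespec_get(): return base on success, 0 on
    * failure.
    */
   if (ts == NULL || base != SKETCH_TIME_UTC)
      return 0;

   ts->tv_sec = time(NULL);   /* one-second resolution only */
   ts->tv_nsec = 0;
   return base;
}
#endif
```

The one-second resolution is a simplification for the sketch; a real fallback
would want a finer clock source.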



Re: [Mesa-dev] [PATCH] anv/meson: Generate dev_icd.json

2017-11-09 Thread Chad Versace
On Thu 09 Nov 2017, Eric Engestrom wrote:
> On Wednesday, 2017-11-08 13:40:13 -0800, Chad Versace wrote:
> > On Tue 07 Nov 2017, Dylan Baker wrote:
> > > Quoting Eric Engestrom (2017-11-07 07:25:53)
> > > > On Wednesday, 2017-11-01 13:49:03 -0700, Chad Versace wrote:
> > > > > I tested this in a setup where the builddir was outside of the srcdir.
> > > > > ---
> > > > >  src/intel/vulkan/meson.build | 12 
> > > > >  1 file changed, 12 insertions(+)
> > > > > 
> > > > > diff --git a/src/intel/vulkan/meson.build 
> > > > > b/src/intel/vulkan/meson.build
> > > > > index ff24e304ef5..e8b7f407507 100644
> > > > > --- a/src/intel/vulkan/meson.build
> > > > > +++ b/src/intel/vulkan/meson.build
> > > > > @@ -48,6 +48,18 @@ intel_icd = custom_target(
> > > > >install : true,
> > > > >  )
> > > > >  
> > > > > +dev_icd = custom_target(
> > > > > +  'dev_icd',
> > > > > +  input : 'anv_icd.py',
> > > > > +  output : 'dev_icd.@0@.json'.format(target_machine.cpu()),
> > > > 
> > > > Strictly speaking, shouldn't that be `host_machine` [1] ?
> > > > I don't see how one would do a canadian build of mesa though, so
> > > > host == target should always be true.
> > > 
> > > That's my fault. There are (or were) a number of cases where I used target
> > > instead of host, that can also be a follow up.
> > > 
> > > In any case:
> > > Acked-by: Dylan Baker 
> > 
> > I build Mesa (with autotools) where host == x86_64 but target == armv7a.
> > 
> > The icd filename should have the same architecture as the driver it
> > loads, and that's the target_machine. You never need to access the
> > dev_icd.*.json on the host machine (that is, unless your target machine
> > and host machine are the same machine).
> 
> I might be misunderstanding, but I think there's some confusion here.
> From the meson doc I linked earlier [1]:
> 
> > - `build_machine` is the computer that is doing the actual compiling
> > - `host_machine` is the machine on which the compiled binary will run
> > - `target_machine` is the machine on which the compiled binary's
> >   output will run (this is only meaningful for programs such as
> >   compilers that, when run, produce object code for a different CPU
> >   than what the program is being run on)
> 
> I think autotools[host] == meson[build_machine] and
> autotools[target] == meson[host_machine].
> 
> If I understand correctly, you have a build_machine=x86_64 compiling
> code for a host_machine=armv7a, is that correct?
> 
> [1] http://mesonbuild.com/Cross-compilation.html

Ah, thanks for clarifying the nomenclature. I was unaware that Meson and
Autotools differed in the terms used here.

I think we all agree now. It should be named
"dev_icd.{host_machine.cpu()}.json"


Re: [Mesa-dev] [PATCH 1/3] i965: Make a better helper function for UBO/SSBO/ABO surface handling.

2017-11-09 Thread Kenneth Graunke
On Thursday, November 9, 2017 12:31:18 PM PST Jason Ekstrand wrote:
> On Thu, Nov 9, 2017 at 12:45 AM, Kenneth Graunke 
> wrote:
> 
> > This fixes the missing AutomaticSize handling in the ABO code, removes
> > a bunch of duplicated code, and drops an extra layer of wrapping around
> > brw_emit_buffer_surface_state().
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.h  |  10 --
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 113
> > +++
> >  src/mesa/drivers/dri/i965/gen6_constant_state.c  |   7 +-
> >  3 files changed, 36 insertions(+), 94 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> > b/src/mesa/drivers/dri/i965/brw_context.h
> > index 8aa0c5ff64c..5d19a6bfc9a 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.h
> > +++ b/src/mesa/drivers/dri/i965/brw_context.h
> > @@ -1395,16 +1395,6 @@ brw_get_index_type(unsigned index_size)
> >  void brw_prepare_vertices(struct brw_context *brw);
> >
> >  /* brw_wm_surface_state.c */
> > -void brw_create_constant_surface(struct brw_context *brw,
> > - struct brw_bo *bo,
> > - uint32_t offset,
> > - uint32_t size,
> > - uint32_t *out_offset);
> > -void brw_create_buffer_surface(struct brw_context *brw,
> > -   struct brw_bo *bo,
> > -   uint32_t offset,
> > -   uint32_t size,
> > -   uint32_t *out_offset);
> >  void brw_update_buffer_texture_surface(struct gl_context *ctx,
> > unsigned unit,
> > uint32_t *surf_offset);
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > index 27c241a87af..a483ba34151 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > @@ -672,44 +672,6 @@ brw_update_buffer_texture_surface(struct gl_context
> > *ctx,
> >   0);
> >  }
> >
> > -/**
> > - * Create the constant buffer surface.  Vertex/fragment shader constants
> > will be
> > - * read from this buffer with Data Port Read instructions/messages.
> > - */
> > -void
> > -brw_create_constant_surface(struct brw_context *brw,
> > -   struct brw_bo *bo,
> > -   uint32_t offset,
> > -   uint32_t size,
> > -   uint32_t *out_offset)
> > -{
> > -   brw_emit_buffer_surface_state(brw, out_offset, bo, offset,
> > - ISL_FORMAT_R32G32B32A32_FLOAT,
> > - size, 1, 0);
> > -}
> > -
> > -/**
> > - * Create the buffer surface. Shader buffer variables will be
> > - * read from / write to this buffer with Data Port Read/Write
> > - * instructions/messages.
> > - */
> > -void
> > -brw_create_buffer_surface(struct brw_context *brw,
> > -  struct brw_bo *bo,
> > -  uint32_t offset,
> > -  uint32_t size,
> > -  uint32_t *out_offset)
> > -{
> > -   /* Use a raw surface so we can reuse existing untyped read/write/atomic
> > -* messages. We need these specifically for the fragment shader since
> > they
> > -* include a pixel mask header that we need to ensure correct behavior
> > -* with helper invocations, which cannot write to the buffer.
> > -*/
> > -   brw_emit_buffer_surface_state(brw, out_offset, bo, offset,
> > - ISL_FORMAT_RAW,
> > - size, 1, RELOC_WRITE);
> > -}
> > -
> >  /**
> >   * Set up a binding table entry for use by stream output logic (transform
> >   * feedback).
> > @@ -1271,6 +1233,31 @@ const struct brw_tracked_state
> > brw_cs_texture_surfaces = {
> > .emit = brw_update_cs_texture_surfaces,
> >  };
> >
> > +static void
> > +upload_buffer_surface(struct brw_context *brw,
> > +  struct gl_buffer_binding *binding,
> > +  uint32_t *out_offset,
> > +  enum isl_format format,
> > +  unsigned reloc_flags)
> > +{
> > +   struct gl_context *ctx = &brw->ctx;
> > +
> > +   if (binding->BufferObject == ctx->Shared->NullBufferObj) {
> > +  emit_null_surface_state(brw, NULL, out_offset);
> > +   } else {
> > +  ptrdiff_t size = binding->BufferObject->Size - binding->Offset;
> > +  if (!binding->AutomaticSize)
> > + size = MIN2(size, binding->Size);
> > +
> > +  struct intel_buffer_object *iobj =
> > + intel_buffer_object(binding->BufferObject);
> > +  struct brw_bo *bo =
> > + intel_bufferobj_buffer(brw, iobj, binding->Offset, size, false);
> >
> 
> You're using this for 

[Mesa-dev] [PATCH 2/5] radeonsi: do 64-bit LDS loads recursively

2017-11-09 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index ff4ea95..ec4cf89 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1084,31 +1084,33 @@ static LLVMValueRef lds_load(struct 
lp_build_tgsi_context *bld_base,
if (swizzle == ~0) {
LLVMValueRef values[TGSI_NUM_CHANNELS];
 
for (unsigned chan = 0; chan < TGSI_NUM_CHANNELS; chan++)
values[chan] = lds_load(bld_base, type, chan, dw_addr);
 
return lp_build_gather_values(&ctx->gallivm, values,
  TGSI_NUM_CHANNELS);
}
 
+   /* Split 64-bit loads. */
+   if (tgsi_type_is_64bit(type)) {
+   LLVMValueRef lo, hi;
+
+   lo = lds_load(bld_base, TGSI_TYPE_UNSIGNED, swizzle, dw_addr);
+   hi = lds_load(bld_base, TGSI_TYPE_UNSIGNED, swizzle + 1, 
dw_addr);
+   return si_llvm_emit_fetch_64bit(bld_base, type, lo, hi);
+   }
+
dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
LLVMConstInt(ctx->i32, swizzle, 0));
 
value = ac_lds_load_volatile(&ctx->ac, dw_addr);
-   if (tgsi_type_is_64bit(type)) {
-   LLVMValueRef value2;
-   dw_addr = lp_build_add(&bld_base->uint_bld, dw_addr,
-  ctx->i32_1);
-   value2 = ac_lds_load_volatile(&ctx->ac, dw_addr);
-   return si_llvm_emit_fetch_64bit(bld_base, type, value, value2);
-   }
 
return bitcast(bld_base, type, value);
 }
 
 /**
  * Store to LDS.
  *
  * \param swizzle  offset (typically 0..3)
  * \param dw_addr  address in dwords
  * \param value    value to store
-- 
2.7.4



[Mesa-dev] [PATCH 5/5] radeonsi: use ac.lds for shared memory

2017-11-09 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c  | 4 ++--
 src/gallium/drivers/radeonsi/si_shader_internal.h | 2 --
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 05c95a6..a44dd6c 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1913,29 +1913,29 @@ void si_load_system_value(struct si_shader_context *ctx,
 void si_declare_compute_memory(struct si_shader_context *ctx,
   const struct tgsi_full_declaration *decl)
 {
struct si_shader_selector *sel = ctx->shader->selector;
 
LLVMTypeRef i8p = LLVMPointerType(ctx->i8, LOCAL_ADDR_SPACE);
LLVMValueRef var;
 
assert(decl->Declaration.MemType == TGSI_MEMORY_TYPE_SHARED);
assert(decl->Range.First == decl->Range.Last);
-   assert(!ctx->shared_memory);
+   assert(!ctx->ac.lds);
 
var = LLVMAddGlobalInAddressSpace(ctx->ac.module,
  LLVMArrayType(ctx->i8, 
sel->local_size),
  "compute_lds",
  LOCAL_ADDR_SPACE);
LLVMSetAlignment(var, 4);
 
-   ctx->shared_memory = LLVMBuildBitCast(ctx->ac.builder, var, i8p, "");
+   ctx->ac.lds = LLVMBuildBitCast(ctx->ac.builder, var, i8p, "");
 }
 
 static LLVMValueRef load_const_buffer_desc(struct si_shader_context *ctx, int 
i)
 {
LLVMValueRef list_ptr = LLVMGetParam(ctx->main_fn,
 
ctx->param_const_and_shader_buffers);
 
return ac_build_load_to_sgpr(&ctx->ac, list_ptr,
 LLVMConstInt(ctx->i32, 
si_get_constbuf_slot(i), 0));
 }
diff --git a/src/gallium/drivers/radeonsi/si_shader_internal.h 
b/src/gallium/drivers/radeonsi/si_shader_internal.h
index b249bf9..1e6bceb 100644
--- a/src/gallium/drivers/radeonsi/si_shader_internal.h
+++ b/src/gallium/drivers/radeonsi/si_shader_internal.h
@@ -221,22 +221,20 @@ struct si_shader_context {
LLVMTypeRef i64;
LLVMTypeRef i128;
LLVMTypeRef f32;
LLVMTypeRef v2i32;
LLVMTypeRef v4i32;
LLVMTypeRef v4f32;
LLVMTypeRef v8i32;
 
LLVMValueRef i32_0;
LLVMValueRef i32_1;
-
-   LLVMValueRef shared_memory;
 };
 
 static inline struct si_shader_context *
 si_shader_context(struct lp_build_tgsi_context *bld_base)
 {
return (struct si_shader_context*)bld_base;
 }
 
 static inline struct si_shader_context *
 si_shader_context_from_abi(struct ac_shader_abi *abi)
diff --git a/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c 
b/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
index ec11c75..5552cc8 100644
--- a/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
+++ b/src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c
@@ -442,21 +442,21 @@ static LLVMValueRef get_memory_ptr(struct 
si_shader_context *ctx,
const struct tgsi_full_instruction *inst,
LLVMTypeRef type, int arg)
 {
LLVMBuilderRef builder = ctx->ac.builder;
LLVMValueRef offset, ptr;
int addr_space;
 
offset = lp_build_emit_fetch(&ctx->bld_base, inst, arg, 0);
offset = ac_to_integer(&ctx->ac, offset);
 
-   ptr = ctx->shared_memory;
+   ptr = ctx->ac.lds;
ptr = LLVMBuildGEP(builder, ptr, &offset, 1, "");
addr_space = LLVMGetPointerAddressSpace(LLVMTypeOf(ptr));
ptr = LLVMBuildBitCast(builder, ptr, LLVMPointerType(type, addr_space), 
"");
 
return ptr;
 }
 
 static void load_emit_memory(
struct si_shader_context *ctx,
struct lp_build_emit_data *emit_data)
-- 
2.7.4



[Mesa-dev] [PATCH 3/5] ac: LDS stores in LS and ES don't have to be volatile

2017-11-09 Thread Marek Olšák
From: Marek Olšák 

I like the writeonly wrapper more than using ac_build_store directly.
---
 src/amd/common/ac_llvm_build.c   |  6 ++
 src/amd/common/ac_llvm_build.h   |  2 ++
 src/amd/common/ac_nir_to_llvm.c  |  4 ++--
 src/gallium/drivers/radeonsi/si_shader.c | 15 ++-
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index a85ffe1..4c9beda 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1771,20 +1771,26 @@ LLVMValueRef ac_lds_load_volatile(struct 
ac_llvm_context *ctx,
 {
return ac_build_load_custom(ctx, ctx->lds, dw_addr, false, false, true);
 }
 
 void ac_lds_store_volatile(struct ac_llvm_context *ctx,
   LLVMValueRef dw_addr, LLVMValueRef value)
 {
ac_build_store(ctx, ctx->lds, dw_addr, ac_to_integer(ctx, value), true);
 }
 
+void ac_lds_store_writeonly(struct ac_llvm_context *ctx,
+   LLVMValueRef dw_addr, LLVMValueRef value)
+{
+   ac_build_store(ctx, ctx->lds, dw_addr, ac_to_integer(ctx, value), 
false);
+}
+
 LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx,
 LLVMTypeRef dst_type,
 LLVMValueRef src0)
 {
LLVMValueRef params[2] = {
src0,
 
/* The value of 1 means that ffs(x=0) = undef, so LLVM won't
 * add special code to check for x=0. The reason is that
 * the LLVM behavior for x=0 is different from what we
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index e3f716e..25a540a 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -288,19 +288,21 @@ void ac_optimize_vs_outputs(struct ac_llvm_context *ac,
uint8_t *vs_output_param_offset,
uint32_t num_outputs,
uint8_t *num_param_exports);
 void ac_init_exec_full_mask(struct ac_llvm_context *ctx);
 
 void ac_declare_lds_as_pointer(struct ac_llvm_context *ac);
 LLVMValueRef ac_lds_load_volatile(struct ac_llvm_context *ctx,
  LLVMValueRef dw_addr);
 void ac_lds_store_volatile(struct ac_llvm_context *ctx,
   LLVMValueRef dw_addr, LLVMValueRef value);
+void ac_lds_store_writeonly(struct ac_llvm_context *ctx,
+   LLVMValueRef dw_addr, LLVMValueRef value);
 
 LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx,
 LLVMTypeRef dst_type,
 LLVMValueRef src0);
 #ifdef __cplusplus
 }
 #endif
 
 #endif
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index fa30b91..3f41b9f 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -5841,21 +5841,21 @@ handle_es_outputs_post(struct nir_to_llvm_context *ctx,
if (lds_base) {
dw_addr = LLVMBuildAdd(ctx->builder, lds_base,
   LLVMConstInt(ctx->ac.i32, 
param_index * 4, false),
   "");
}
for (j = 0; j < length; j++) {
LLVMValueRef out_val = LLVMBuildLoad(ctx->builder, 
out_ptr[j], "");
out_val = LLVMBuildBitCast(ctx->builder, out_val, 
ctx->ac.i32, "");
 
if (ctx->ac.chip_class  >= GFX9) {
-   ac_lds_store_volatile(&ctx->ac, dw_addr,
+   ac_lds_store_writeonly(&ctx->ac, dw_addr,
 LLVMBuildLoad(ctx->builder, 
out_ptr[j], ""));
dw_addr = LLVMBuildAdd(ctx->builder, dw_addr, 
ctx->ac.i32_1, "");
} else {
ac_build_buffer_store_dword(&ctx->ac,
ctx->esgs_ring,
out_val, 1,
NULL, 
ctx->es2gs_offset,
(4 * param_index + 
j) * 4,
1, 1, true, true);
}
@@ -5881,21 +5881,21 @@ handle_ls_outputs_post(struct nir_to_llvm_context *ctx)
if (i == VARYING_SLOT_CLIP_DIST0)
length = ctx->num_output_clips + ctx->num_output_culls;
int param = shader_io_get_unique_index(i);
mark_tess_output(ctx, false, param);
if (length > 4)
mark_tess_output(ctx, false, param + 1);
LLVMValueRef dw_addr = LLVMBuildAdd(ctx->builder, base_dw_addr,
LLVMConstInt(ctx->ac.i32, 
param * 4, 

[Mesa-dev] [PATCH 4/5] ac: LDS loads of TCS and GS inputs can be non-volatile and invariant

2017-11-09 Thread Marek Olšák
From: Marek Olšák 

---
 src/amd/common/ac_llvm_build.c   |  6 ++
 src/amd/common/ac_llvm_build.h   |  2 ++
 src/amd/common/ac_nir_to_llvm.c  |  4 ++--
 src/gallium/drivers/radeonsi/si_shader.c | 25 ++---
 4 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 4c9beda..305abd3 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1765,20 +1765,26 @@ void ac_declare_lds_as_pointer(struct ac_llvm_context 
*ctx)
 LLVMPointerType(LLVMArrayType(ctx->i32, 
lds_size / 4), AC_LOCAL_ADDR_SPACE),
 "lds");
 }
 
 LLVMValueRef ac_lds_load_volatile(struct ac_llvm_context *ctx,
  LLVMValueRef dw_addr)
 {
return ac_build_load_custom(ctx, ctx->lds, dw_addr, false, false, true);
 }
 
+LLVMValueRef ac_lds_load_invariant(struct ac_llvm_context *ctx,
+  LLVMValueRef dw_addr)
+{
+   return ac_build_load_custom(ctx, ctx->lds, dw_addr, false, true, false);
+}
+
 void ac_lds_store_volatile(struct ac_llvm_context *ctx,
   LLVMValueRef dw_addr, LLVMValueRef value)
 {
ac_build_store(ctx, ctx->lds, dw_addr, ac_to_integer(ctx, value), true);
 }
 
 void ac_lds_store_writeonly(struct ac_llvm_context *ctx,
LLVMValueRef dw_addr, LLVMValueRef value)
 {
ac_build_store(ctx, ctx->lds, dw_addr, ac_to_integer(ctx, value), 
false);
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index 25a540a..3bd085c 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -286,20 +286,22 @@ void ac_get_image_intr_name(const char *base_name,
 void ac_optimize_vs_outputs(struct ac_llvm_context *ac,
LLVMValueRef main_fn,
uint8_t *vs_output_param_offset,
uint32_t num_outputs,
uint8_t *num_param_exports);
 void ac_init_exec_full_mask(struct ac_llvm_context *ctx);
 
 void ac_declare_lds_as_pointer(struct ac_llvm_context *ac);
 LLVMValueRef ac_lds_load_volatile(struct ac_llvm_context *ctx,
  LLVMValueRef dw_addr);
+LLVMValueRef ac_lds_load_invariant(struct ac_llvm_context *ctx,
+  LLVMValueRef dw_addr);
 void ac_lds_store_volatile(struct ac_llvm_context *ctx,
   LLVMValueRef dw_addr, LLVMValueRef value);
 void ac_lds_store_writeonly(struct ac_llvm_context *ctx,
LLVMValueRef dw_addr, LLVMValueRef value);
 
 LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx,
 LLVMTypeRef dst_type,
 LLVMValueRef src0);
 #ifdef __cplusplus
 }
diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 3f41b9f..b4d840f 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2719,21 +2719,21 @@ load_tcs_input(struct nir_to_llvm_context *ctx,
 false, NULL, per_vertex ? &vertex_index : NULL,
 &const_index, &indir_index);
 
stride = unpack_param(&ctx->ac, ctx->tcs_in_layout, 13, 8);
dw_addr = get_tcs_in_current_patch_offset(ctx);
dw_addr = get_dw_address(ctx, dw_addr, param, const_index, is_compact, 
vertex_index, stride,
 indir_index);
 
unsigned comp = instr->variables[0]->var->data.location_frac;
for (unsigned i = 0; i < instr->num_components + comp; i++) {
-   value[i] = ac_lds_load_volatile(&ctx->ac, dw_addr);
+   value[i] = ac_lds_load_invariant(&ctx->ac, dw_addr);
dw_addr = LLVMBuildAdd(ctx->builder, dw_addr,
   ctx->ac.i32_1, "");
}
result = build_varying_gather_values(&ctx->ac, value, 
instr->num_components, comp);
result = LLVMBuildBitCast(ctx->builder, result, get_def_type(ctx->nir, 
&instr->dest.ssa), "");
return result;
 }
 
 static LLVMValueRef
 load_tcs_output(struct nir_to_llvm_context *ctx,
@@ -2901,21 +2901,21 @@ load_gs_input(struct nir_to_llvm_context *ctx,
  LLVMConstInt(ctx->ac.i32, 4, false), "");
 
param = 
shader_io_get_unique_index(instr->variables[0]->var->data.location);
 
unsigned comp = instr->variables[0]->var->data.location_frac;
for (unsigned i = comp; i < instr->num_components + comp; i++) {
if (ctx->ac.chip_class >= GFX9) {
LLVMValueRef dw_addr = 
ctx->gs_vtx_offset[vtx_offset_param];
dw_addr = LLVMBuildAdd(ctx->ac.builder, dw_addr,
   LLVMConstInt(ctx->ac.i32, param 
* 4 + i + const_index, 0), "");
-   value[i] = 

[Mesa-dev] [PATCH 0/5] Volatile and invariant LDS memory ops

2017-11-09 Thread Marek Olšák
Hi,

This fixes the TCS gl_ClipDistance piglit failure that was uncovered
by a recent LLVM change. The solution is to set volatile on loads
and stores to enforce proper ordering.

Please review.

Thanks,
Marek


[Mesa-dev] [PATCH 1/5] ac: mark LDS loads and stores as volatile

2017-11-09 Thread Marek Olšák
From: Marek Olšák 

LLVM uses arbitrary scheduling if we don't set volatile.

volatile is a keyword, so use Volatile
---
 src/amd/common/ac_llvm_build.c   | 38 
 src/amd/common/ac_llvm_build.h   | 13 ---
 src/amd/common/ac_nir_to_llvm.c  | 20 -
 src/gallium/drivers/radeonsi/si_shader.c |  6 ++---
 4 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 5640a23..a85ffe1 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -700,70 +700,73 @@ ac_build_gep0(struct ac_llvm_context *ctx,
  LLVMValueRef index)
 {
LLVMValueRef indices[2] = {
LLVMConstInt(ctx->i32, 0, 0),
index,
};
return LLVMBuildGEP(ctx->builder, base_ptr,
indices, 2, "");
 }
 
-void
-ac_build_indexed_store(struct ac_llvm_context *ctx,
-  LLVMValueRef base_ptr, LLVMValueRef index,
-  LLVMValueRef value)
+static void ac_build_store(struct ac_llvm_context *ctx, LLVMValueRef base_ptr,
+  LLVMValueRef index, LLVMValueRef value, bool 
Volatile)
 {
-   LLVMBuildStore(ctx->builder, value,
-  ac_build_gep0(ctx, base_ptr, index));
+   LLVMValueRef store = LLVMBuildStore(ctx->builder, value,
+   ac_build_gep0(ctx, base_ptr, 
index));
+   if (Volatile)
+   LLVMSetVolatile(store, true);
 }
 
 /**
  * Build an LLVM bytecode indexed load using LLVMBuildGEP + LLVMBuildLoad.
  * It's equivalent to doing a load from &base_ptr[index].
  *
  * \param base_ptr  Where the array starts.
  * \param index The element index into the array.
  * \param uniform   Whether the base_ptr and index can be assumed to be
  *  dynamically uniform (i.e. load to an SGPR)
  * \param invariant Whether the load is invariant (no other opcodes affect it)
  */
 static LLVMValueRef
 ac_build_load_custom(struct ac_llvm_context *ctx, LLVMValueRef base_ptr,
-LLVMValueRef index, bool uniform, bool invariant)
+LLVMValueRef index, bool uniform, bool invariant,
+bool Volatile)
 {
LLVMValueRef pointer, result;
 
pointer = ac_build_gep0(ctx, base_ptr, index);
if (uniform)
LLVMSetMetadata(pointer, ctx->uniform_md_kind, ctx->empty_md);
result = LLVMBuildLoad(ctx->builder, pointer, "");
if (invariant)
LLVMSetMetadata(result, ctx->invariant_load_md_kind, 
ctx->empty_md);
+   if (Volatile)
+   LLVMSetVolatile(result, true);
return result;
 }
 
 LLVMValueRef ac_build_load(struct ac_llvm_context *ctx, LLVMValueRef base_ptr,
   LLVMValueRef index)
 {
-   return ac_build_load_custom(ctx, base_ptr, index, false, false);
+   return ac_build_load_custom(ctx, base_ptr, index, false, false, false);
 }
 
 LLVMValueRef ac_build_load_invariant(struct ac_llvm_context *ctx,
 LLVMValueRef base_ptr, LLVMValueRef index)
 {
-   return ac_build_load_custom(ctx, base_ptr, index, false, true);
+   return ac_build_load_custom(ctx, base_ptr, index, false, true, false);
 }
 
 LLVMValueRef ac_build_load_to_sgpr(struct ac_llvm_context *ctx,
   LLVMValueRef base_ptr, LLVMValueRef index)
 {
-   return ac_build_load_custom(ctx, base_ptr, index, true, true);
+   return ac_build_load_custom(ctx, base_ptr, index, true, true, false);
 }
 
 /* TBUFFER_STORE_FORMAT_{X,XY,XYZ,XYZW} <- the suffix is selected by 
num_channels=1..4.
  * The type of vdata must be one of i32 (num_channels=1), v2i32 
(num_channels=2),
  * or v4i32 (num_channels=3,4).
  */
 void
 ac_build_buffer_store_dword(struct ac_llvm_context *ctx,
LLVMValueRef rsrc,
LLVMValueRef vdata,
@@ -1756,33 +1759,30 @@ void ac_init_exec_full_mask(struct ac_llvm_context *ctx)
 }
 
 void ac_declare_lds_as_pointer(struct ac_llvm_context *ctx)
 {
unsigned lds_size = ctx->chip_class >= CIK ? 65536 : 32768;
ctx->lds = LLVMBuildIntToPtr(ctx->builder, ctx->i32_0,
 LLVMPointerType(LLVMArrayType(ctx->i32, 
lds_size / 4), AC_LOCAL_ADDR_SPACE),
 "lds");
 }
 
-LLVMValueRef ac_lds_load(struct ac_llvm_context *ctx,
-LLVMValueRef dw_addr)
+LLVMValueRef ac_lds_load_volatile(struct ac_llvm_context *ctx,
+ LLVMValueRef dw_addr)
 {
-   return ac_build_load(ctx, ctx->lds, dw_addr);
+   return ac_build_load_custom(ctx, ctx->lds, dw_addr, false, false, true);
 }
 
-void ac_lds_store(struct ac_llvm_context *ctx,
- LLVMValueRef dw_addr,
- 

Re: [Mesa-dev] [PATCH] threads: fix MinGW build breakage

2017-11-09 Thread Brian Paul

On 11/09/2017 02:41 PM, Nicolai Hähnle wrote:

Sorry for the mess.


Not a huge deal.  FWIW, you can test the MinGW cross-compile pretty easily:

1. apt-get install g++-mingw-w64-x86-64 (or equivalent)
2. cd mesa ; scons platform=windows

-Brian



Reviewed-by: Nicolai Hähnle 

On 09.11.2017 17:46, Brian Paul wrote:

Fixes: f1a364878431c8 ("threads: update for late C11 changes")
---
  include/c11/threads_win32.h | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/c11/threads_win32.h b/include/c11/threads_win32.h
index 77d923a..dac8ef7 100644
--- a/include/c11/threads_win32.h
+++ b/include/c11/threads_win32.h
@@ -78,6 +78,9 @@ Configuration macro:
  /* Visual Studio 2015 and later */
  #if _MSC_VER >= 1900
  #define HAVE_TIMESPEC
+#define HAVE_TIMESPEC_GET
+#elif defined(__MINGW32__)
+#define HAVE_TIMESPEC
  #endif
  #ifndef HAVE_TIMESPEC
@@ -645,7 +648,7 @@ tss_set(tss_t key, void *val)
  /* 7.25.7 Time functions */
  // 7.25.6.1
-#ifndef HAVE_TIMESPEC
+#ifndef HAVE_TIMESPEC_GET
  static inline int
  timespec_get(struct timespec *ts, int base)
  {







Re: [Mesa-dev] [PATCH] autotools: Set C++ visibility flags on Intel

2017-11-09 Thread Matt Turner
On Thu, Nov 9, 2017 at 1:58 PM, Dylan Baker  wrote:
> These flags are set for C sources, but not C++. This causes symbol
> visibility leaks from the C++ parts of the Intel compiler.
>
> fixes: 700bebb958e93f4d ("i965: Move the back-end compiler to 
> src/intel/compiler")

Fixes

> Signed-off-by: Dylan Baker 
> ---
>
>  src/intel/Makefile.am | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/intel/Makefile.am b/src/intel/Makefile.am
> index a34e3014497..6a5e393f175 100644
> --- a/src/intel/Makefile.am
> +++ b/src/intel/Makefile.am
> @@ -46,6 +46,9 @@ AM_CFLAGS = \
> $(VISIBILITY_CFLAGS) \
> $(WNO_OVERRIDE_INIT)
>
> +AM_CXXFLAGS = \
> +   $(VISIBILITY_CXXFLAGS)
> +


Reviewed-by: Matt Turner 


Re: [Mesa-dev] [PATCH] intel/fs: Use a pure vertical stride for large register strides

2017-11-09 Thread Matt Turner
On Thu, Nov 2, 2017 at 3:54 PM, Jason Ekstrand  wrote:
> Register strides higher than 4 are uncommon but they can happen.  For
> instance, if you have a 64-bit extract_u8 operation, we turn that into
> UB -> UQ MOV with a source stride of 8.  Our previous calculation would
> try to generate a stride of <32;8,8>:ub which is invalid because the
> maximum horizontal stride is 4.  To solve this problem, we instead use a
> stride of <8;1,0>.  As noted in the comment, this does not work as a
> destination but that's ok as very few things actually generate that
> stride.

Please put the tests you fixed in the commit message. It's not okay to
leave that out for all the reasons that I'm sure you know.

Looks like this doesn't work on CHV, BXT, GLK :(

KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks now fails on CHV,
BXT, GLK with:

mov(8)  g21<1>UQ  g19<8,1,0>UB  { align1 1Q };
ERROR: Source and destination horizontal stride must equal and
a multiple of a qword when the execution type is 64-bit
ERROR: Vstride must be Width * Hstride when the execution type is 64-bit

Modulo the typo in the first error, I think both of these are correct.
I don't think we can extract_u8 to a 64-bit type on Atom :(

This is filed as https://bugs.freedesktop.org/show_bug.cgi?id=103628
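
To make the region arithmetic in this thread concrete, here is a minimal sketch in C. It is not the actual brw code: the struct and function names are made up for illustration, strides are counted in elements, and the hardware limits on vertical stride and width that the real compiler has to respect are ignored. It picks a <vstride;width,hstride> source region for a given element stride, falls back to a pure vertical stride when the horizontal stride would exceed 4, and then applies the "Vstride must be Width * Hstride" rule quoted in the second error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical <vstride;width,hstride> triple, all in units of elements. */
struct region {
    unsigned vstride;
    unsigned width;
    unsigned hstride;
};

/* Choose a source region for `stride` elements between consecutive
 * channels. Horizontal strides above 4 are not encodable, so larger
 * strides use one element per row and step with the vertical stride. */
static struct region
region_for_stride(unsigned stride, unsigned width)
{
    struct region r;

    assert(stride > 0 && width > 0);

    if (stride <= 4) {
        r.vstride = width * stride;
        r.width = width;
        r.hstride = stride;
    } else {
        /* Pure vertical stride: <stride;1,0>. */
        r.vstride = stride;
        r.width = 1;
        r.hstride = 0;
    }
    return r;
}

/* The restriction from the second error message, as a predicate:
 * with a 64-bit execution type, vstride must equal width * hstride. */
static bool
region_ok_for_64bit_exec(struct region r)
{
    return r.vstride == r.width * r.hstride;
}
```

With a stride of 8 this yields <8;1,0>, which fails the 64-bit check (8 != 1 * 0), consistent with the validator output quoted in this message. A stride of 2 at width 8 yields <16;8,2>, which passes.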


[Mesa-dev] [PATCH] autotools: Set C++ visibility flags on Intel

2017-11-09 Thread Dylan Baker
These flags are set for C sources, but not C++. This causes symbol
visibility leaks from the C++ parts of the Intel compiler.

fixes: 700bebb958e93f4d ("i965: Move the back-end compiler to 
src/intel/compiler")
Signed-off-by: Dylan Baker 
---

 src/intel/Makefile.am | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/intel/Makefile.am b/src/intel/Makefile.am
index a34e3014497..6a5e393f175 100644
--- a/src/intel/Makefile.am
+++ b/src/intel/Makefile.am
@@ -46,6 +46,9 @@ AM_CFLAGS = \
$(VISIBILITY_CFLAGS) \
$(WNO_OVERRIDE_INIT)
 
+AM_CXXFLAGS = \
+   $(VISIBILITY_CXXFLAGS)
+
 MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
 PYTHON_GEN = $(AM_V_GEN)$(PYTHON2) $(PYTHON_FLAGS)
 
-- 
2.15.0



[Mesa-dev] [PATCH 1/3] glx: Lower GLX opcode lookup into SendMakeCurrentRequest

2017-11-09 Thread Adam Jackson
Signed-off-by: Adam Jackson 
---
 src/glx/indirect_glx.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
index 4302a8ff28..cfae12f6c0 100644
--- a/src/glx/indirect_glx.c
+++ b/src/glx/indirect_glx.c
@@ -62,13 +62,13 @@ indirect_destroy_context(struct glx_context *gc)
 }
 
 static Bool
-SendMakeCurrentRequest(Display * dpy, CARD8 opcode,
-   GLXContextID gc_id, GLXContextTag gc_tag,
-   GLXDrawable draw, GLXDrawable read,
-   GLXContextTag *out_tag)
+SendMakeCurrentRequest(Display * dpy, GLXContextID gc_id,
+   GLXContextTag gc_tag, GLXDrawable draw,
+   GLXDrawable read, GLXContextTag *out_tag)
 {
xGLXMakeCurrentReply reply;
Bool ret;
+   int opcode = __glXSetupForCommand(dpy);
 
LockDisplay(dpy);
 
@@ -136,7 +136,6 @@ indirect_bind_context(struct glx_context *gc, struct 
glx_context *old,
 {
GLXContextTag tag;
Display *dpy = gc->psc->dpy;
-   int opcode = __glXSetupForCommand(dpy);
Bool sent;
 
if (old != &dummyContext && !old->isDirect && old->psc->dpy == dpy) {
@@ -146,7 +145,7 @@ indirect_bind_context(struct glx_context *gc, struct 
glx_context *old,
   tag = 0;
}
 
-   sent = SendMakeCurrentRequest(dpy, opcode, gc->xid, tag, draw, read,
+   sent = SendMakeCurrentRequest(dpy, gc->xid, tag, draw, read,
 &gc->currentContextTag);
 
if (!IndirectAPI)
@@ -160,7 +159,6 @@ static void
 indirect_unbind_context(struct glx_context *gc, struct glx_context *new)
 {
Display *dpy = gc->psc->dpy;
-   int opcode = __glXSetupForCommand(dpy);
 
if (gc == new)
   return;
@@ -170,8 +168,8 @@ indirect_unbind_context(struct glx_context *gc, struct 
glx_context *new)
 * to send a request to the dpy to unbind the previous context.
 */
if (!new || new->isDirect || new->psc->dpy != dpy) {
-  SendMakeCurrentRequest(dpy, opcode, None,
-gc->currentContextTag, None, None, NULL);
+  SendMakeCurrentRequest(dpy, None, gc->currentContextTag, None, None,
+ NULL);
   gc->currentContextTag = 0;
}
 }
-- 
2.14.3



[Mesa-dev] [PATCH 0/3] Misc GLX cleanups

2017-11-09 Thread Adam Jackson
Testing the EXT_no_config_context series revealed that a bunch more
things were broken than I expected. While I work my way through that,
here's one trivial cleanup and a couple of pretty obvious bugfixes.

- ajax



[Mesa-dev] [PATCH 3/3] glx/dri3: Fix passing renderType into glXCreateContext

2017-11-09 Thread Adam Jackson
Without this, trying to create a GLX_RGBA_FLOAT_TYPE_ARB context would
fail, because GLX_RGBA_TYPE would be a mismatch with the fbconfig.

Signed-off-by: Adam Jackson 
---
 src/glx/dri3_glx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c
index fa048f990a..a10306fe32 100644
--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -324,9 +324,10 @@ dri3_create_context(struct glx_screen *base,
 struct glx_context *shareList, int renderType)
 {
unsigned int error;
+   uint32_t attribs[2] = { GLX_RENDER_TYPE, renderType };
 
return dri3_create_context_attribs(base, config_base, shareList,
-  0, NULL, &error);
+  1, attribs, &error);
 }
 
 static void
-- 
2.14.3



[Mesa-dev] [PATCH 2/3] glx/drisw: Fix glXMakeCurrent(dpy, None, ctx)

2017-11-09 Thread Adam Jackson
This is perfectly legal in GL 3.0+.

Fixes piglit/glx-create-context-no-current-framebuffer.

Signed-off-by: Adam Jackson 
---
 src/glx/drisw_glx.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/glx/drisw_glx.c b/src/glx/drisw_glx.c
index 2f0675addb..df2467a5c2 100644
--- a/src/glx/drisw_glx.c
+++ b/src/glx/drisw_glx.c
@@ -255,11 +255,9 @@ drisw_bind_context(struct glx_context *context, struct 
glx_context *old,
 
driReleaseDrawables(&pcp->base);
 
-   if (pdraw == NULL || pread == NULL)
-  return GLXBadDrawable;
-
if ((*psc->core->bindContext) (pcp->driContext,
- pdraw->driDrawable, pread->driDrawable))
+  pdraw ? pdraw->driDrawable : NULL,
+  pread ? pread->driDrawable : NULL))
   return Success;
 
return GLXBadContext;
-- 
2.14.3



Re: [Mesa-dev] [PATCH] i965: Pretend there are 4 subslices for compute shader threads on Gen9+.

2017-11-09 Thread Kenneth Graunke
On Thursday, November 9, 2017 11:22:34 AM PST Rafael Antognolli wrote:
> On Thu, Nov 09, 2017 at 12:59:12AM -0800, Jordan Justen wrote:
> > Reviewed-by: Jordan Justen 
> 
> It's also
> 
> Tested-by: Rafael Antognolli 

Sorry, I forgot to sync email before pushing this patch, so I missed
adding your Tested-by tag.  :(  Next time...




Re: [Mesa-dev] [PATCH] threads: fix MinGW build breakage

2017-11-09 Thread Nicolai Hähnle

Sorry for the mess.

Reviewed-by: Nicolai Hähnle 

On 09.11.2017 17:46, Brian Paul wrote:

Fixes: f1a364878431c8 ("threads: update for late C11 changes")
---
  include/c11/threads_win32.h | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/c11/threads_win32.h b/include/c11/threads_win32.h
index 77d923a..dac8ef7 100644
--- a/include/c11/threads_win32.h
+++ b/include/c11/threads_win32.h
@@ -78,6 +78,9 @@ Configuration macro:
  /* Visual Studio 2015 and later */
  #if _MSC_VER >= 1900
  #define HAVE_TIMESPEC
+#define HAVE_TIMESPEC_GET
+#elif defined(__MINGW32__)
+#define HAVE_TIMESPEC
  #endif
  
  #ifndef HAVE_TIMESPEC

@@ -645,7 +648,7 @@ tss_set(tss_t key, void *val)
  
  /* 7.25.7 Time functions */

  // 7.25.6.1
-#ifndef HAVE_TIMESPEC
+#ifndef HAVE_TIMESPEC_GET
  static inline int
  timespec_get(struct timespec *ts, int base)
  {
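
The hunks above separate "struct timespec exists" from "timespec_get() exists". A minimal sketch of that pattern in C follows, assuming a toolchain whose <time.h> already declares struct timespec (which is the MinGW situation the patch describes). The DEMO_ macros and demo_ functions are stand-ins invented for this example so that they cannot collide with the real HAVE_TIMESPEC, HAVE_TIMESPEC_GET or a libc timespec_get(); the fallback body is a plausible second-resolution implementation, not a copy of the header.

```c
#include <assert.h>
#include <time.h>

/* Illustrative stand-ins for HAVE_TIMESPEC / HAVE_TIMESPEC_GET. */
#define DEMO_HAVE_TIMESPEC   /* <time.h> already declares struct timespec */
/* DEMO_HAVE_TIMESPEC_GET is left undefined: no timespec_get() available. */

/* Only declare the type when the toolchain does not provide it. */
#ifndef DEMO_HAVE_TIMESPEC
struct timespec {
    time_t tv_sec;
    long tv_nsec;
};
#endif

/* Only provide the function when the toolchain does not provide it.
 * Guarding this with the type macro instead would drop the fallback on
 * a toolchain that has the type but not the function. */
#ifndef DEMO_HAVE_TIMESPEC_GET
#define DEMO_TIME_UTC 1

static inline int
demo_timespec_get(struct timespec *ts, int base)
{
    if (!ts || base != DEMO_TIME_UTC)
        return 0;
    ts->tv_sec = time(NULL);
    ts->tv_nsec = 0;   /* second resolution only in this sketch */
    return base;
}
#endif

/* Small usage check: the fallback reports the base on success, 0 on error. */
static void
demo_self_check(void)
{
    struct timespec ts;

    assert(demo_timespec_get(&ts, DEMO_TIME_UTC) == DEMO_TIME_UTC);
    assert(ts.tv_nsec == 0);
    assert(demo_timespec_get(NULL, DEMO_TIME_UTC) == 0);
}
```

Keeping the two macros independent is the design point: each guard then answers exactly one question about the toolchain.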





[Mesa-dev] [PATCH] i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK

2017-11-09 Thread Matt Turner
Fixes the following tests on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
---
 src/intel/compiler/brw_fs_nir.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 15f2d88624..38d0d357e8 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -725,8 +725,12 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   break;
 
case nir_op_f2f64:
+   case nir_op_f2i64:
+   case nir_op_f2u64:
case nir_op_i2f64:
+   case nir_op_i2i64:
case nir_op_u2f64:
+   case nir_op_u2u64:
   /* CHV PRM, vol07, 3D Media GPGPU Engine, Register Region Restrictions:
*
*"When source or destination is 64b (...), regioning in Align1
@@ -754,12 +758,8 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
case nir_op_f2f32:
case nir_op_f2i32:
case nir_op_f2u32:
-   case nir_op_f2i64:
-   case nir_op_f2u64:
case nir_op_i2i32:
-   case nir_op_i2i64:
case nir_op_u2u32:
-   case nir_op_u2u64:
   inst = bld.MOV(result, op[0]);
   inst->saturate = instr->dest.saturate;
   break;
-- 
2.13.6



Re: [Mesa-dev] [PATCH 0/4] st/mesa: use asynchronous flushes

2017-11-09 Thread Andres Rodriguez

Reviewed-by: Andres Rodriguez 

Been going through these patches as they are tightly related to the 
semaphore series I'm working on.


Regards,
Andres

On 2017-11-09 08:45 AM, Nicolai Hähnle wrote:

Hi all,

I've previously sent some of this series, but I'm splitting it up
further for bisectability, plus the first patch is new.

The idea here is to further reduce the amount of synchronization
required with threaded gallium.

Eventually, we should be able to eliminate synchronizations entirely
for well-behaved games (except for throttling). However, that
requires passing fences explicitly on Present, which I'm not planning
to tackle any time soon.

Please review!
Thanks,
Nicolai
--
  .../auxiliary/util/u_threaded_context.c  | 14 --
  .../auxiliary/util/u_threaded_context.h  |  1 +
  .../util/u_threaded_context_calls.h  |  1 +
  src/mesa/state_tracker/st_cb_flush.c |  4 +--
  src/mesa/state_tracker/st_cb_syncobj.c   | 26 --
  5 files changed, 39 insertions(+), 7 deletions(-)



Re: [Mesa-dev] [PATCH] st/mesa: remove 'struct' keyword on function parameter

2017-11-09 Thread Charmaine Lee

Reviewed-by: Charmaine Lee 


From: Brian Paul 
Sent: Thursday, November 9, 2017 11:31:16 AM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee
Subject: [PATCH] st/mesa: remove 'struct' keyword on function parameter

st_src_reg is a class, not a struct.  Simply remove 'struct' to silence
a MSVC compiler warning (class vs. struct mismatch).
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index ca04765..3dc0237 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -4590,8 +4590,7 @@ glsl_to_tgsi_visitor::simplify_cmp(void)
 }

 static void
-rename_temp_handle_src(struct rename_reg_pair *renames,
-   struct st_src_reg *src)
+rename_temp_handle_src(struct rename_reg_pair *renames, st_src_reg *src)
 {
if (src && src->file == PROGRAM_TEMPORARY) {
   int old_idx = src->index;
--
1.9.1



Re: [Mesa-dev] [PATCH 3/3] i965: Fold ABO state upload code into the SSBO/UBO state upload code.

2017-11-09 Thread Jason Ekstrand
On Thu, Nov 9, 2017 at 12:45 AM, Kenneth Graunke 
wrote:

> Having this separate could potentially make programs that rebind atomics
> but no other surfaces ever so slightly faster.  But it's a tiny amount
> of code to add to the existing UBO/SSBO atom, and very related.
>
> The extra atoms have a cost on every draw call, and so dropping some of
> them would be nice.  This also reclaims a dirty bit.
> ---
>  src/mesa/drivers/dri/i965/brw_context.h   |  6 --
>  src/mesa/drivers/dri/i965/brw_gs_surface_state.c  | 22 --
>  src/mesa/drivers/dri/i965/brw_state.h |  6 --
>  src/mesa/drivers/dri/i965/brw_state_upload.c  |  3 +-
>  src/mesa/drivers/dri/i965/brw_tcs_surface_state.c | 23 --
>  src/mesa/drivers/dri/i965/brw_tes_surface_state.c | 23 --
>  src/mesa/drivers/dri/i965/brw_vs_surface_state.c  | 22 --
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c  | 87
> ---
>  src/mesa/drivers/dri/i965/genX_state_upload.c | 11 ---
>  src/mesa/drivers/dri/i965/intel_buffer_objects.c  |  2 +-
>  10 files changed, 16 insertions(+), 189 deletions(-)
>
> Total diffstat for the series:
>
>  18 files changed, 54 insertions(+), 416 deletions(-)
>

Nice!  Patches 2 and 3 are

Reviewed-by: Jason Ekstrand 
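
For anyone not familiar with the atom/dirty-bit scheme the commit message refers to, a rough sketch in C is below. Everything prefixed demo_ or DEMO_ is invented for the example and only loosely modelled on the BRW_STATE_* / BRW_NEW_* definitions visible in the diff. Each state ID claims one bit of a 64-bit mask, each atom lists the bits it depends on, and every atom in the list is tested on every upload, so folding one atom into another saves a test per draw and can free a bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state IDs; each one claims a bit in a 64-bit dirty mask. */
enum demo_state_id {
    DEMO_STATE_BATCH,
    DEMO_STATE_UNIFORM_BUFFER,
    DEMO_STATE_PROG_DATA,
    DEMO_NUM_STATES
};

#define DEMO_NEW_BATCH          (1ull << DEMO_STATE_BATCH)
#define DEMO_NEW_UNIFORM_BUFFER (1ull << DEMO_STATE_UNIFORM_BUFFER)
#define DEMO_NEW_PROG_DATA      (1ull << DEMO_STATE_PROG_DATA)

/* An atom: the dirty bits it cares about plus the function that
 * re-emits its state when any of those bits is set. */
struct demo_atom {
    uint64_t dirty;
    void (*emit)(unsigned *emitted);
};

/* Stand-in emit hook; it only counts how often it ran. */
static void
demo_emit_buffers(unsigned *emitted)
{
    *emitted += 1;
}

/* Walk the whole atom list and emit those whose bits intersect `dirty`.
 * The per-atom test runs on every call, which is the per-draw cost
 * the commit message mentions. Returns the number of atoms emitted. */
static unsigned
demo_upload(const struct demo_atom *atoms, size_t count, uint64_t dirty)
{
    unsigned emitted = 0;

    assert(atoms != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (atoms[i].dirty & dirty)
            atoms[i].emit(&emitted);
    }
    return emitted;
}
```

A single combined atom with `.dirty = DEMO_NEW_BATCH | DEMO_NEW_UNIFORM_BUFFER` is emitted when either bit is set and skipped otherwise. The trade-off is the one the commit message states: a rebind that touches only one kind of buffer now re-emits the combined state.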


> diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 5d19a6bfc9a..60279dbde46 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -195,7 +195,6 @@ enum brw_state_id {
> BRW_STATE_RASTERIZER_DISCARD,
> BRW_STATE_STATS_WM,
> BRW_STATE_UNIFORM_BUFFER,
> -   BRW_STATE_ATOMIC_BUFFER,
> BRW_STATE_IMAGE_UNITS,
> BRW_STATE_META_IN_PROGRESS,
> BRW_STATE_PUSH_CONSTANT_ALLOCATION,
> @@ -288,7 +287,6 @@ enum brw_state_id {
>  #define BRW_NEW_RASTERIZER_DISCARD  (1ull <<
> BRW_STATE_RASTERIZER_DISCARD)
>  #define BRW_NEW_STATS_WM(1ull << BRW_STATE_STATS_WM)
>  #define BRW_NEW_UNIFORM_BUFFER  (1ull << BRW_STATE_UNIFORM_BUFFER)
> -#define BRW_NEW_ATOMIC_BUFFER   (1ull << BRW_STATE_ATOMIC_BUFFER)
>  #define BRW_NEW_IMAGE_UNITS (1ull << BRW_STATE_IMAGE_UNITS)
>  #define BRW_NEW_META_IN_PROGRESS(1ull <<
> BRW_STATE_META_IN_PROGRESS)
>  #define BRW_NEW_PUSH_CONSTANT_ALLOCATION (1ull <<
> BRW_STATE_PUSH_CONSTANT_ALLOCATION)
> @@ -1406,10 +1404,6 @@ brw_update_sol_surface(struct brw_context *brw,
>  void brw_upload_ubo_surfaces(struct brw_context *brw, struct gl_program
> *prog,
>   struct brw_stage_state *stage_state,
>   struct brw_stage_prog_data *prog_data);
> -void brw_upload_abo_surfaces(struct brw_context *brw,
> - const struct gl_program *prog,
> - struct brw_stage_state *stage_state,
> - struct brw_stage_prog_data *prog_data);
>  void brw_upload_image_surfaces(struct brw_context *brw,
> const struct gl_program *prog,
> struct brw_stage_state *stage_state,
> diff --git a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
> b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
> index 570f3fb4dd2..6f2629eb29d 100644
> --- a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
> @@ -91,28 +91,6 @@ const struct brw_tracked_state brw_gs_ubo_surfaces = {
> .emit = brw_upload_gs_ubo_surfaces,
>  };
>
> -static void
> -brw_upload_gs_abo_surfaces(struct brw_context *brw)
> -{
> -   /* _NEW_PROGRAM */
> -   const struct gl_program *gp = brw->programs[MESA_SHADER_GEOMETRY];
> -
> -   if (gp) {
> -  /* BRW_NEW_GS_PROG_DATA */
> -  brw_upload_abo_surfaces(brw, gp, >gs.base,
> brw->gs.base.prog_data);
> -   }
> -}
> -
> -const struct brw_tracked_state brw_gs_abo_surfaces = {
> -   .dirty = {
> -  .mesa = _NEW_PROGRAM,
> -  .brw = BRW_NEW_ATOMIC_BUFFER |
> - BRW_NEW_BATCH |
> - BRW_NEW_GS_PROG_DATA,
> -   },
> -   .emit = brw_upload_gs_abo_surfaces,
> -};
> -
>  static void
>  brw_upload_gs_image_surfaces(struct brw_context *brw)
>  {
> diff --git a/src/mesa/drivers/dri/i965/brw_state.h
> b/src/mesa/drivers/dri/i965/brw_state.h
> index 927e77920ef..f668b807fc7 100644
> --- a/src/mesa/drivers/dri/i965/brw_state.h
> +++ b/src/mesa/drivers/dri/i965/brw_state.h
> @@ -58,16 +58,12 @@ extern const struct brw_tracked_state
> brw_recalculate_urb_fence;
>  extern const struct brw_tracked_state brw_sf_vp;
>  extern const struct brw_tracked_state brw_cs_texture_surfaces;
>  extern const struct brw_tracked_state brw_vs_ubo_surfaces;
> -extern const struct brw_tracked_state brw_vs_abo_surfaces;
>  extern const struct brw_tracked_state brw_vs_image_surfaces;
>  extern const struct brw_tracked_state brw_tcs_ubo_surfaces;
> -extern const struct brw_tracked_state 

Re: [Mesa-dev] [PATCH 1/3] i965: Make a better helper function for UBO/SSBO/ABO surface handling.

2017-11-09 Thread Jason Ekstrand
On Thu, Nov 9, 2017 at 12:45 AM, Kenneth Graunke wrote:

> This fixes the missing AutomaticSize handling in the ABO code, removes
> a bunch of duplicated code, and drops an extra layer of wrapping around
> brw_emit_buffer_surface_state().
> ---
>  src/mesa/drivers/dri/i965/brw_context.h  |  10 --
>  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 113
> +++
>  src/mesa/drivers/dri/i965/gen6_constant_state.c  |   7 +-
>  3 files changed, 36 insertions(+), 94 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 8aa0c5ff64c..5d19a6bfc9a 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1395,16 +1395,6 @@ brw_get_index_type(unsigned index_size)
>  void brw_prepare_vertices(struct brw_context *brw);
>
>  /* brw_wm_surface_state.c */
> -void brw_create_constant_surface(struct brw_context *brw,
> - struct brw_bo *bo,
> - uint32_t offset,
> - uint32_t size,
> - uint32_t *out_offset);
> -void brw_create_buffer_surface(struct brw_context *brw,
> -   struct brw_bo *bo,
> -   uint32_t offset,
> -   uint32_t size,
> -   uint32_t *out_offset);
>  void brw_update_buffer_texture_surface(struct gl_context *ctx,
> unsigned unit,
> uint32_t *surf_offset);
> diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> index 27c241a87af..a483ba34151 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> @@ -672,44 +672,6 @@ brw_update_buffer_texture_surface(struct gl_context
> *ctx,
>   0);
>  }
>
> -/**
> - * Create the constant buffer surface.  Vertex/fragment shader constants
> will be
> - * read from this buffer with Data Port Read instructions/messages.
> - */
> -void
> -brw_create_constant_surface(struct brw_context *brw,
> -   struct brw_bo *bo,
> -   uint32_t offset,
> -   uint32_t size,
> -   uint32_t *out_offset)
> -{
> -   brw_emit_buffer_surface_state(brw, out_offset, bo, offset,
> - ISL_FORMAT_R32G32B32A32_FLOAT,
> - size, 1, 0);
> -}
> -
> -/**
> - * Create the buffer surface. Shader buffer variables will be
> - * read from / write to this buffer with Data Port Read/Write
> - * instructions/messages.
> - */
> -void
> -brw_create_buffer_surface(struct brw_context *brw,
> -  struct brw_bo *bo,
> -  uint32_t offset,
> -  uint32_t size,
> -  uint32_t *out_offset)
> -{
> -   /* Use a raw surface so we can reuse existing untyped read/write/atomic
> -* messages. We need these specifically for the fragment shader since
> they
> -* include a pixel mask header that we need to ensure correct behavior
> -* with helper invocations, which cannot write to the buffer.
> -*/
> -   brw_emit_buffer_surface_state(brw, out_offset, bo, offset,
> - ISL_FORMAT_RAW,
> - size, 1, RELOC_WRITE);
> -}
> -
>  /**
>   * Set up a binding table entry for use by stream output logic (transform
>   * feedback).
> @@ -1271,6 +1233,31 @@ const struct brw_tracked_state
> brw_cs_texture_surfaces = {
> .emit = brw_update_cs_texture_surfaces,
>  };
>
> +static void
> +upload_buffer_surface(struct brw_context *brw,
> +  struct gl_buffer_binding *binding,
> +  uint32_t *out_offset,
> +  enum isl_format format,
> +  unsigned reloc_flags)
> +{
> +   struct gl_context *ctx = &brw->ctx;
> +
> +   if (binding->BufferObject == ctx->Shared->NullBufferObj) {
> +  emit_null_surface_state(brw, NULL, out_offset);
> +   } else {
> +  ptrdiff_t size = binding->BufferObject->Size - binding->Offset;
> +  if (!binding->AutomaticSize)
> + size = MIN2(size, binding->Size);
> +
> +  struct intel_buffer_object *iobj =
> + intel_buffer_object(binding->BufferObject);
> +  struct brw_bo *bo =
> + intel_bufferobj_buffer(brw, iobj, binding->Offset, size, false);
>

You're using this for both reads and writes.  I think you need another
boolean parameter instead of simply passing false all the time.  Other than
that, looks good.
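
A minimal sketch of the two points above, using hypothetical stand-in types (binding_sketch, surface_usage_sketch) rather than the real Mesa structs: the range size is clamped only when the binding is not automatically sized, and whether the range counts as written is derived from the kind of surface instead of being hard-coded to false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the gl_buffer_binding fields used in the patch. */
struct binding_sketch {
   ptrdiff_t buffer_size;   /* total size of the bound buffer object */
   ptrdiff_t offset;        /* start of the bound range */
   ptrdiff_t size;          /* requested range size, ignored if automatic */
   bool automatic_size;     /* binding covers the rest of the buffer */
};

/* Bytes reachable through the binding: everything after the offset,
 * clamped to the requested size unless the size is automatic.
 */
static ptrdiff_t
binding_range_size(const struct binding_sketch *b)
{
   assert(b->offset <= b->buffer_size);

   ptrdiff_t size = b->buffer_size - b->offset;
   if (!b->automatic_size && b->size < size)
      size = b->size;
   return size;
}

/* Hypothetical surface kinds; only UBOs are treated as read-only here. */
enum surface_usage_sketch { USAGE_UBO, USAGE_SSBO, USAGE_ABO };

/* The value that would replace the constant false in the helper. */
static bool
usage_may_write(enum surface_usage_sketch usage)
{
   return usage != USAGE_UBO;
}
```

With a 256-byte buffer bound at offset 64, a 32-byte request stays at 32 bytes, while an automatically sized binding covers the remaining 192 bytes.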


> +
> +  brw_emit_buffer_surface_state(brw, out_offset, bo, binding->Offset,
> +format, size, 1, reloc_flags);
> +   }

Re: [Mesa-dev] [PATCH 3/3] mesa: s/GLint/gl_buffer_index/ for _ColorDrawBufferIndexes

2017-11-09 Thread Charmaine Lee

For this series, Reviewed-by: Charmaine Lee 


From: Brian Paul 
Sent: Thursday, November 9, 2017 11:31:42 AM
To: mesa-dev@lists.freedesktop.org
Cc: Charmaine Lee
Subject: [PATCH 3/3] mesa: s/GLint/gl_buffer_index/ for _ColorDrawBufferIndexes

Also fix local variable declarations and replace -1 with BUFFER_NONE.
No Piglit changes.
---
 src/mesa/drivers/common/meta.c   |  2 +-
 src/mesa/main/buffers.c  | 16 
 src/mesa/main/clear.c|  9 +
 src/mesa/main/framebuffer.c  |  4 ++--
 src/mesa/main/mtypes.h   |  2 +-
 src/mesa/state_tracker/st_cb_clear.c |  4 ++--
 src/mesa/state_tracker/st_cb_fbo.c   |  4 ++--
 src/mesa/swrast/s_blit.c |  8 
 src/mesa/swrast/s_renderbuffer.c |  4 ++--
 9 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index bae04be..1cc736c 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -1655,7 +1655,7 @@ _mesa_meta_drawbuffers_and_colormask(struct gl_context 
*ctx, GLbitfield mask)
enums[0] = GL_NONE;

for (int i = 0; i < ctx->DrawBuffer->_NumColorDrawBuffers; i++) {
-  int b = ctx->DrawBuffer->_ColorDrawBufferIndexes[i];
+  gl_buffer_index b = ctx->DrawBuffer->_ColorDrawBufferIndexes[i];
   int colormask_idx = ctx->Extensions.EXT_draw_buffers2 ? i : 0;

   if (b < 0 || !(mask & (1 << b)) || is_color_disabled(ctx, colormask_idx))
diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
index 5c37f0f..d364047 100644
--- a/src/mesa/main/buffers.c
+++ b/src/mesa/main/buffers.c
@@ -170,7 +170,7 @@ draw_buffer_enum_to_bitmask(const struct gl_context *ctx, 
GLenum buffer)
  * Helper routine used by glReadBuffer.
  * Given a GLenum naming a color buffer, return the index of the corresponding
  * renderbuffer (a BUFFER_* value).
- * return -1 for an invalid buffer.
+ * return BUFFER_NONE for an invalid buffer.
  */
 static gl_buffer_index
 read_buffer_enum_to_index(const struct gl_context *ctx, GLenum buffer)
@@ -719,7 +719,7 @@ _mesa_drawbuffers(struct gl_context *ctx, struct 
gl_framebuffer *fb,
if (n > 0 && _mesa_bitcount(destMask[0]) > 1) {
   GLuint count = 0, destMask0 = destMask[0];
   while (destMask0) {
- const int bufIndex = u_bit_scan(&destMask0);
+ const gl_buffer_index bufIndex = u_bit_scan(&destMask0);
  if (fb->_ColorDrawBufferIndexes[count] != bufIndex) {
 updated_drawbuffers(ctx, fb);
 fb->_ColorDrawBufferIndexes[count] = bufIndex;
@@ -733,7 +733,7 @@ _mesa_drawbuffers(struct gl_context *ctx, struct 
gl_framebuffer *fb,
   GLuint count = 0;
   for (buf = 0; buf < n; buf++ ) {
  if (destMask[buf]) {
-GLint bufIndex = ffs(destMask[buf]) - 1;
+gl_buffer_index bufIndex = ffs(destMask[buf]) - 1;
 /* only one bit should be set in the destMask[buf] field */
 assert(_mesa_bitcount(destMask[buf]) == 1);
 if (fb->_ColorDrawBufferIndexes[buf] != bufIndex) {
@@ -743,9 +743,9 @@ _mesa_drawbuffers(struct gl_context *ctx, struct 
gl_framebuffer *fb,
 count = buf + 1;
  }
  else {
-if (fb->_ColorDrawBufferIndexes[buf] != -1) {
+if (fb->_ColorDrawBufferIndexes[buf] != BUFFER_NONE) {
   updated_drawbuffers(ctx, fb);
-   fb->_ColorDrawBufferIndexes[buf] = -1;
+   fb->_ColorDrawBufferIndexes[buf] = BUFFER_NONE;
 }
  }
  fb->ColorDrawBuffer[buf] = buffers[buf];
@@ -753,11 +753,11 @@ _mesa_drawbuffers(struct gl_context *ctx, struct 
gl_framebuffer *fb,
   fb->_NumColorDrawBuffers = count;
}

-   /* set remaining outputs to -1 (GL_NONE) */
+   /* set remaining outputs to BUFFER_NONE */
for (buf = fb->_NumColorDrawBuffers; buf < ctx->Const.MaxDrawBuffers; 
buf++) {
-  if (fb->_ColorDrawBufferIndexes[buf] != -1) {
+  if (fb->_ColorDrawBufferIndexes[buf] != BUFFER_NONE) {
  updated_drawbuffers(ctx, fb);
- fb->_ColorDrawBufferIndexes[buf] = -1;
+ fb->_ColorDrawBufferIndexes[buf] = BUFFER_NONE;
   }
}
for (buf = n; buf < ctx->Const.MaxDrawBuffers; buf++) {
diff --git a/src/mesa/main/clear.c b/src/mesa/main/clear.c
index c5e7f13..be60442 100644
--- a/src/mesa/main/clear.c
+++ b/src/mesa/main/clear.c
@@ -194,9 +194,9 @@ clear(struct gl_context *ctx, GLbitfield mask, bool 
no_error)
   if (mask & GL_COLOR_BUFFER_BIT) {
  GLuint i;
  for (i = 0; i < ctx->DrawBuffer->_NumColorDrawBuffers; i++) {
-GLint buf = ctx->DrawBuffer->_ColorDrawBufferIndexes[i];
+gl_buffer_index buf = ctx->DrawBuffer->_ColorDrawBufferIndexes[i];

-if (buf >= 0 && color_buffer_writes_enabled(ctx, i)) {
+if (buf != BUFFER_NONE && color_buffer_writes_enabled(ctx, i)) {

Re: [Mesa-dev] [PATCH] swr: Fixed an uncommon freed-memory access during state validation

2017-11-09 Thread Kyriazis, George
Looks good..

Reviewed-By: George Kyriazis

On Nov 8, 2017, at 6:39 PM, Bruce Cherniak wrote:

State validation is performed during clear and draw calls.  Validation
during clear was still accessing vertex buffer state.  When the currently
set vertex buffers are client arrays, this could lead to accessing freed
memory.  Such is the case with the VMD application.

Previously, vertex buffer validation depended on a dirty bit or the
draw info indicating an indexed draw.  This required special handling for
clears.  But, vertex buffer validation still occurred which was unnecessary
and wrong.

Now, only minimal validation is performed during clear, deferring the
remainder to the next draw.  And, by setting the dirty bit in swr_draw_vbo
for indexed draws, vertex buffer validation is only dependent upon a
single dirty bit.

This fixes a bug exposed by the VMD application when changing models.
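
The deferral described above can be sketched as follows. This is only an illustration under stated assumptions: the SK_* bits, struct sk_context and the sk_* functions are hypothetical stand-ins, not the driver's real SWR_NEW_* flags or entry points, and the actual validation work is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dirty bits; the driver's real SWR_NEW_* values differ. */
#define SK_NEW_FRAMEBUFFER (1u << 0)
#define SK_NEW_RASTERIZER  (1u << 1)
#define SK_NEW_VIEWPORT    (1u << 2)
#define SK_NEW_SCISSOR     (1u << 3)
#define SK_NEW_VERTEX      (1u << 4)

struct sk_context {
   uint32_t dirty;              /* state still waiting for validation */
   unsigned vertex_validations; /* how often vertex buffers were validated */
};

static void
sk_update_derived(struct sk_context *ctx, bool from_draw)
{
   uint32_t deferred = 0;

   assert(ctx);

   /* Framebuffer, rasterizer and viewport validation would happen here. */

   if (!from_draw) {
      /* Clear path: keep everything else dirty for the next draw. */
      deferred = ctx->dirty & ~(SK_NEW_FRAMEBUFFER |
                                SK_NEW_RASTERIZER |
                                SK_NEW_VIEWPORT);
      ctx->dirty = 0;
   }

   /* Vertex buffers now depend on a single dirty bit. */
   if (ctx->dirty & SK_NEW_VERTEX)
      ctx->vertex_validations++;

   ctx->dirty = deferred;
}

static void
sk_clear(struct sk_context *ctx)
{
   sk_update_derived(ctx, false);
}

static void
sk_draw(struct sk_context *ctx, bool indexed)
{
   /* Indexed draws force vertex validation via the dirty bit. */
   if (indexed)
      ctx->dirty |= SK_NEW_VERTEX;
   sk_update_derived(ctx, true);
}
```

In this sketch a clear never touches vertex buffer state, and the vertex bit it left untouched is still set when the next draw arrives.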
---
src/gallium/drivers/swr/swr_draw.cpp  |  7 ++-
src/gallium/drivers/swr/swr_state.cpp | 35 +++
2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_draw.cpp 
b/src/gallium/drivers/swr/swr_draw.cpp
index 57660c7464..a94cdd6da0 100644
--- a/src/gallium/drivers/swr/swr_draw.cpp
+++ b/src/gallium/drivers/swr/swr_draw.cpp
@@ -52,7 +52,12 @@ swr_draw_vbo(struct pipe_context *pipe, const struct 
pipe_draw_info *info)
  return;
   }

-   /* Update derived state, pass draw info to update function */
+   /* If indexed draw, force vertex validation since index buffer comes
+* from draw info. */
+   if (info->index_size)
+  ctx->dirty |= SWR_NEW_VERTEX;
+
+   /* Update derived state, pass draw info to update function. */
   swr_update_derived(pipe, info);

   swr_update_draw_context(ctx);
diff --git a/src/gallium/drivers/swr/swr_state.cpp 
b/src/gallium/drivers/swr/swr_state.cpp
index c6da4fcb8e..4530d377ee 100644
--- a/src/gallium/drivers/swr/swr_state.cpp
+++ b/src/gallium/drivers/swr/swr_state.cpp
@@ -1204,11 +1204,6 @@ swr_update_derived(struct pipe_context *pipe,
  ctx->api.pfnSwrSetRastState(ctx->swrContext, rastState);
   }

-   /* Scissor */
-   if (ctx->dirty & SWR_NEW_SCISSOR) {
-  ctx->api.pfnSwrSetScissorRects(ctx->swrContext, 1, &ctx->swr_scissor);
-   }
-
   /* Viewport */
   if (ctx->dirty & (SWR_NEW_VIEWPORT | SWR_NEW_FRAMEBUFFER
 | SWR_NEW_RASTERIZER)) {
@@ -1249,18 +1244,26 @@ swr_update_derived(struct pipe_context *pipe,
  ctx->api.pfnSwrSetViewports(ctx->swrContext, 1, vp, vpm);
   }

-   /* Set vertex & index buffers
-* (using draw info if called by swr_draw_vbo)
-* If indexed draw, revalidate since index buffer comes from
-* pipe_draw_info.
-*/
-   if (ctx->dirty & SWR_NEW_VERTEX ||
-  (p_draw_info && p_draw_info->index_size)) {
+   /* When called from swr_clear (p_draw_info = null), render targets,
+* rasterState and viewports (dependent on render targets) are the only
+* necessary validation.  Defer remaining validation by setting
+* post_update_dirty_flags and clear all dirty flags.  BackendState is
+* still unconditionally validated below */
+   if (!p_draw_info) {
+  post_update_dirty_flags = ctx->dirty & ~(SWR_NEW_FRAMEBUFFER |
+   SWR_NEW_RASTERIZER |
+   SWR_NEW_VIEWPORT);
+  ctx->dirty = 0;
+   }
+
+   /* Scissor */
+   if (ctx->dirty & SWR_NEW_SCISSOR) {
+  ctx->api.pfnSwrSetScissorRects(ctx->swrContext, 1, &ctx->swr_scissor);
+   }

-  /* If being called by swr_draw_vbo, copy draw details */
-  struct pipe_draw_info info = {0};
-  if (p_draw_info)
- info = *p_draw_info;
+   /* Set vertex & index buffers */
+   if (ctx->dirty & SWR_NEW_VERTEX) {
+  const struct pipe_draw_info &info = *p_draw_info;

  /* vertex buffers */
  SWR_VERTEX_BUFFER_STATE swrVertexBuffers[PIPE_MAX_ATTRIBS];
--
2.11.0



[Mesa-dev] [PATCH] mesa/st/nir: assign driver_location for images

2017-11-09 Thread Rob Clark
Signed-off-by: Rob Clark 
---
 src/mesa/state_tracker/st_glsl_to_nir.cpp | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index e9a8d6414e7..b748e13de1b 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -176,6 +176,7 @@ st_nir_assign_uniform_locations(struct gl_program *prog,
 {
int max = 0;
int shaderidx = 0;
+   int imageidx = 0;
 
nir_foreach_variable(uniform, uniform_list) {
   int loc;
@@ -188,10 +189,13 @@ st_nir_assign_uniform_locations(struct gl_program *prog,
   uniform->interface_type != NULL)
  continue;
 
-  if (uniform->type->is_sampler()) {
+  if (uniform->type->is_sampler() || uniform->type->is_image()) {
  unsigned val = 0;
  bool found = shader_program->UniformHash->get(val, uniform->name);
- loc = shaderidx++;
+ if (uniform->type->is_sampler())
+loc = shaderidx++;
+ else
+loc = imageidx++;
  assert(found);
  (void) found; /* silence unused var warning */
  /* this ensure that nir_lower_samplers looks at the correct
-- 
2.13.6



[Mesa-dev] [PATCH 3/3] mesa: s/GLint/gl_buffer_index/ for _ColorDrawBufferIndexes

2017-11-09 Thread Brian Paul
Also fix local variable declarations and replace -1 with BUFFER_NONE.
No Piglit changes.
---
 src/mesa/drivers/common/meta.c   |  2 +-
 src/mesa/main/buffers.c  | 16 
 src/mesa/main/clear.c|  9 +
 src/mesa/main/framebuffer.c  |  4 ++--
 src/mesa/main/mtypes.h   |  2 +-
 src/mesa/state_tracker/st_cb_clear.c |  4 ++--
 src/mesa/state_tracker/st_cb_fbo.c   |  4 ++--
 src/mesa/swrast/s_blit.c |  8 
 src/mesa/swrast/s_renderbuffer.c |  4 ++--
 9 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index bae04be..1cc736c 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -1655,7 +1655,7 @@ _mesa_meta_drawbuffers_and_colormask(struct gl_context 
*ctx, GLbitfield mask)
enums[0] = GL_NONE;
 
for (int i = 0; i < ctx->DrawBuffer->_NumColorDrawBuffers; i++) {
-  int b = ctx->DrawBuffer->_ColorDrawBufferIndexes[i];
+  gl_buffer_index b = ctx->DrawBuffer->_ColorDrawBufferIndexes[i];
   int colormask_idx = ctx->Extensions.EXT_draw_buffers2 ? i : 0;
 
   if (b < 0 || !(mask & (1 << b)) || is_color_disabled(ctx, colormask_idx))
diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
index 5c37f0f..d364047 100644
--- a/src/mesa/main/buffers.c
+++ b/src/mesa/main/buffers.c
@@ -170,7 +170,7 @@ draw_buffer_enum_to_bitmask(const struct gl_context *ctx, 
GLenum buffer)
  * Helper routine used by glReadBuffer.
  * Given a GLenum naming a color buffer, return the index of the corresponding
  * renderbuffer (a BUFFER_* value).
- * return -1 for an invalid buffer.
+ * return BUFFER_NONE for an invalid buffer.
  */
 static gl_buffer_index
 read_buffer_enum_to_index(const struct gl_context *ctx, GLenum buffer)
@@ -719,7 +719,7 @@ _mesa_drawbuffers(struct gl_context *ctx, struct 
gl_framebuffer *fb,
if (n > 0 && _mesa_bitcount(destMask[0]) > 1) {
   GLuint count = 0, destMask0 = destMask[0];
   while (destMask0) {
- const int bufIndex = u_bit_scan(&destMask0);
+ const gl_buffer_index bufIndex = u_bit_scan(&destMask0);
  if (fb->_ColorDrawBufferIndexes[count] != bufIndex) {
 updated_drawbuffers(ctx, fb);
 fb->_ColorDrawBufferIndexes[count] = bufIndex;
@@ -733,7 +733,7 @@ _mesa_drawbuffers(struct gl_context *ctx, struct 
gl_framebuffer *fb,
   GLuint count = 0;
   for (buf = 0; buf < n; buf++ ) {
  if (destMask[buf]) {
-GLint bufIndex = ffs(destMask[buf]) - 1;
+gl_buffer_index bufIndex = ffs(destMask[buf]) - 1;
 /* only one bit should be set in the destMask[buf] field */
 assert(_mesa_bitcount(destMask[buf]) == 1);
 if (fb->_ColorDrawBufferIndexes[buf] != bufIndex) {
@@ -743,9 +743,9 @@ _mesa_drawbuffers(struct gl_context *ctx, struct 
gl_framebuffer *fb,
 count = buf + 1;
  }
  else {
-if (fb->_ColorDrawBufferIndexes[buf] != -1) {
+if (fb->_ColorDrawBufferIndexes[buf] != BUFFER_NONE) {
   updated_drawbuffers(ctx, fb);
-   fb->_ColorDrawBufferIndexes[buf] = -1;
+   fb->_ColorDrawBufferIndexes[buf] = BUFFER_NONE;
 }
  }
  fb->ColorDrawBuffer[buf] = buffers[buf];
@@ -753,11 +753,11 @@ _mesa_drawbuffers(struct gl_context *ctx, struct 
gl_framebuffer *fb,
   fb->_NumColorDrawBuffers = count;
}
 
-   /* set remaining outputs to -1 (GL_NONE) */
+   /* set remaining outputs to BUFFER_NONE */
for (buf = fb->_NumColorDrawBuffers; buf < ctx->Const.MaxDrawBuffers; 
buf++) {
-  if (fb->_ColorDrawBufferIndexes[buf] != -1) {
+  if (fb->_ColorDrawBufferIndexes[buf] != BUFFER_NONE) {
  updated_drawbuffers(ctx, fb);
- fb->_ColorDrawBufferIndexes[buf] = -1;
+ fb->_ColorDrawBufferIndexes[buf] = BUFFER_NONE;
   }
}
for (buf = n; buf < ctx->Const.MaxDrawBuffers; buf++) {
diff --git a/src/mesa/main/clear.c b/src/mesa/main/clear.c
index c5e7f13..be60442 100644
--- a/src/mesa/main/clear.c
+++ b/src/mesa/main/clear.c
@@ -194,9 +194,9 @@ clear(struct gl_context *ctx, GLbitfield mask, bool 
no_error)
   if (mask & GL_COLOR_BUFFER_BIT) {
  GLuint i;
  for (i = 0; i < ctx->DrawBuffer->_NumColorDrawBuffers; i++) {
-GLint buf = ctx->DrawBuffer->_ColorDrawBufferIndexes[i];
+gl_buffer_index buf = ctx->DrawBuffer->_ColorDrawBufferIndexes[i];
 
-if (buf >= 0 && color_buffer_writes_enabled(ctx, i)) {
+if (buf != BUFFER_NONE && color_buffer_writes_enabled(ctx, i)) {
bufferMask |= 1 << buf;
 }
  }
@@ -321,9 +321,10 @@ make_color_buffer_mask(struct gl_context *ctx, GLint 
drawbuffer)
   break;
default:
   {
- GLint buf = ctx->DrawBuffer->_ColorDrawBufferIndexes[drawbuffer];
+ gl_buffer_index buf =
+

[Mesa-dev] [PATCH 1/3] mesa: minor reformatting, add const to gl_external_samplers()

2017-11-09 Thread Brian Paul
This function should probably be moved elsewhere, too.
---
 src/mesa/main/mtypes.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index d092630..af9115e 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2532,7 +2532,9 @@ struct gl_linked_shader
struct glsl_symbol_table *symbols;
 };
 
-static inline GLbitfield gl_external_samplers(struct gl_program *prog)
+
+static inline GLbitfield
+gl_external_samplers(const struct gl_program *prog)
 {
GLbitfield external_samplers = 0;
GLbitfield mask = prog->SamplersUsed;
@@ -2546,6 +2548,7 @@ static inline GLbitfield gl_external_samplers(struct 
gl_program *prog)
return external_samplers;
 }
 
+
 /**
  * Compile status enum. compile_skipped is used to indicate the compile
  * was skipped due to the shader matching one that's been seen before by
-- 
1.9.1



[Mesa-dev] [PATCH 2/3] mesa: s/GLint/gl_buffer_index/ for _ColorReadBufferIndex

2017-11-09 Thread Brian Paul
BUFFER_NONE is -1 so no reason for GLint.
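
A small sketch of why an enum with a -1 sentinel can stand in for a plain integer here. The names below are hypothetical and shortened; they are not the real gl_buffer_index enumerators.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened index enum with a -1 "none" sentinel. */
typedef enum {
   SKETCH_BUFFER_NONE = -1,
   SKETCH_BUFFER_FRONT_LEFT = 0,
   SKETCH_BUFFER_BACK_LEFT,
   SKETCH_BUFFER_COUNT
} sketch_buffer_index;

/* True if the index names a real buffer rather than the sentinel. */
static bool
has_read_buffer(sketch_buffer_index idx)
{
   assert(idx >= SKETCH_BUFFER_NONE && idx < SKETCH_BUFFER_COUNT);
   return idx != SKETCH_BUFFER_NONE;
}
```

Because one enumerator is negative, the enum's underlying type has to be able to hold -1, so comparing against the named sentinel behaves like the old comparison against -1.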
---
 src/mesa/main/mtypes.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index af9115e..a8e2b39 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3484,7 +3484,7 @@ struct gl_framebuffer
/** Computed from ColorDraw/ReadBuffer above */
GLuint _NumColorDrawBuffers;
GLint _ColorDrawBufferIndexes[MAX_DRAW_BUFFERS]; /**< BUFFER_x or -1 */
-   GLint _ColorReadBufferIndex; /* -1 = None */
+   gl_buffer_index _ColorReadBufferIndex;
struct gl_renderbuffer *_ColorDrawBuffers[MAX_DRAW_BUFFERS];
struct gl_renderbuffer *_ColorReadBuffer;
 
-- 
1.9.1



[Mesa-dev] [PATCH] st/mesa: remove 'struct' keyword on function parameter

2017-11-09 Thread Brian Paul
st_src_reg is a class, not a struct.  Simply remove 'struct' to silence
an MSVC compiler warning (class vs. struct mismatch).
---
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index ca04765..3dc0237 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -4590,8 +4590,7 @@ glsl_to_tgsi_visitor::simplify_cmp(void)
 }
 
 static void
-rename_temp_handle_src(struct rename_reg_pair *renames,
-   struct st_src_reg *src)
+rename_temp_handle_src(struct rename_reg_pair *renames, st_src_reg *src)
 {
if (src && src->file == PROGRAM_TEMPORARY) {
   int old_idx = src->index;
-- 
1.9.1



Re: [Mesa-dev] [PATCH] radv: do not wait for idle when SURFACE_SYNC is emitted

2017-11-09 Thread Bas Nieuwenhuizen
IIRC, when we waited using SURFACE_SYNC, waits on other rings sometimes
seemed to get insanely long, almost as if they got stuck behind it.
The shader waits don't have this issue, however.

On Thu, Nov 9, 2017 at 8:00 PM, Marek Olšák  wrote:
> What high priority interactions?
>
> Marek
>
> On Thu, Nov 9, 2017 at 6:22 PM, Bas Nieuwenhuizen
>  wrote:
>> Nack. We had that and Andres removed it due to high priority interactions.
>>
>>
>> On 9 Nov 2017 18:01, "Samuel Pitoiset"  wrote:
>>
>> Copied from RadeonSI.
>>
>> Signed-off-by: Samuel Pitoiset 
>> ---
>>  src/amd/vulkan/si_cmd_buffer.c | 18 --
>>  1 file changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
>> index 89ee399817..f5c04c07a8 100644
>> --- a/src/amd/vulkan/si_cmd_buffer.c
>> +++ b/src/amd/vulkan/si_cmd_buffer.c
>> @@ -973,12 +973,18 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
>> radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_DB_META) |
>> EVENT_INDEX(0));
>> }
>>
>> -   if (flush_bits & RADV_CMD_FLAG_PS_PARTIAL_FLUSH) {
>> -   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
>> -   radeon_emit(cs, EVENT_TYPE(V_028A90_PS_PARTIAL_FLUSH) |
>> EVENT_INDEX(4));
>> -   } else if (flush_bits & RADV_CMD_FLAG_VS_PARTIAL_FLUSH) {
>> -   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
>> -   radeon_emit(cs, EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) |
>> EVENT_INDEX(4));
>> +   /* Wait for shader engines to go idle.
>> +* VS and PS waits are unnecessary if SURFACE_SYNC is going to wait
>> +* for everything including CB/DB cache flushes.
>> +*/
>> +   if (!flush_cb_db) {
>> +   if (flush_bits & RADV_CMD_FLAG_PS_PARTIAL_FLUSH) {
>> +   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
>> +   radeon_emit(cs,
>> EVENT_TYPE(V_028A90_PS_PARTIAL_FLUSH) | EVENT_INDEX(4));
>> +   } else if (flush_bits & RADV_CMD_FLAG_VS_PARTIAL_FLUSH) {
>> +   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
>> +   radeon_emit(cs,
>> EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) | EVENT_INDEX(4));
>> +   }
>> }
>>
>> if (flush_bits & RADV_CMD_FLAG_CS_PARTIAL_FLUSH) {
>> --
>> 2.15.0
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>>
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>


Re: [Mesa-dev] [PATCH] i965: Pretend there are 4 subslices for compute shader threads on Gen9+.

2017-11-09 Thread Rafael Antognolli
On Thu, Nov 09, 2017 at 12:59:12AM -0800, Jordan Justen wrote:
> Reviewed-by: Jordan Justen 

It's also

Tested-by: Rafael Antognolli 

> On 2017-11-08 10:56:00, Kenneth Graunke wrote:
> > Similar to what we did for pixel shader threads - see gen_device_info.c.
> > 
> > We don't want to bump the actual Maximum Number of Threads though, so
> > we adjust it here.  For pixel shaders, we don't use max_wm_threads, so
> > we could just bump it globally.
> > 
> > Fixes Piglit tests:
> > arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec3-int64_t
> > arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec4-int64_t
> > arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-u64vec4-uint64_t
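
For readers following along, the sizing rule in the quoted commit message can be sketched like this. It is only an illustration: scratch_subslices(), scratch_bytes() and the threads_per_subslice parameter are hypothetical names, and the real driver takes its per-subslice thread count from device info.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of subslices to size compute scratch for. */
static unsigned
scratch_subslices(int gen, unsigned subslice_total)
{
   unsigned subslices = subslice_total > 1 ? subslice_total : 1;

   /* Gen9+: always allocate as if there were 4 subslices. */
   if (gen >= 9)
      subslices = 4;

   return subslices;
}

/* Total scratch bytes for a given per-thread scratch size. */
static uint64_t
scratch_bytes(int gen, unsigned subslice_total,
              unsigned threads_per_subslice, uint32_t per_thread_bytes)
{
   assert(threads_per_subslice > 0);

   return (uint64_t)per_thread_bytes * threads_per_subslice *
          scratch_subslices(gen, subslice_total);
}
```

In this sketch a Gen9 part with fewer than 4 subslices still gets scratch space sized for 4, which is the over-allocation the patch introduces.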
> > ---
> >  src/mesa/drivers/dri/i965/brw_program.c | 14 +-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
> > b/src/mesa/drivers/dri/i965/brw_program.c
> > index 7607bc38840..5ecfb9f5b11 100644
> > --- a/src/mesa/drivers/dri/i965/brw_program.c
> > +++ b/src/mesa/drivers/dri/i965/brw_program.c
> > @@ -357,7 +357,19 @@ brw_alloc_stage_scratch(struct brw_context *brw,
> >thread_count = devinfo->max_wm_threads;
> >break;
> > case MESA_SHADER_COMPUTE: {
> > -  const unsigned subslices = MAX2(brw->screen->subslice_total, 1);
> > +  unsigned subslices = MAX2(brw->screen->subslice_total, 1);
> > +
> > +  /* The documentation for 3DSTATE_PS "Scratch Space Base Pointer" 
> > says:
> > +   *
> > +   * "Scratch Space per slice is computed based on 4 sub-slices.  SW 
> > must
> > +   *  allocate scratch space enough so that each slice has 4 slices
> > +   *  allowed."
> > +   *
> > +   * According to the other driver team, this applies to compute 
> > shaders
> > +   * as well.  This is not currently documented at all.
> > +   */
> > +  if (devinfo->gen >= 9)
> > + subslices = 4;
> >  
> >/* WaCSScratchSize:hsw
> > *
> > -- 
> > 2.15.0
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] anv/gen10: Enable float blend optimization

2017-11-09 Thread Anuj Phogat
On CNL this bit has been moved to CACHE_MODE_SS register.
We already have this enabled in OpenGL driver.
See Mesa commit 6c681b4cc1

Signed-off-by: Anuj Phogat 
Cc: Nanley Chery 
Cc: Rafael Antognolli 
---
 src/intel/genxml/gen10.xml| 12 
 src/intel/vulkan/genX_state.c | 12 
 2 files changed, 24 insertions(+)

diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
index a7ae49ae65..a6b8f48fda 100644
--- a/src/intel/genxml/gen10.xml
+++ b/src/intel/genxml/gen10.xml
@@ -3752,4 +3752,16 @@
 
   
 
+  
+
+
+
+
+
+
+
+
+
+  
+
 
diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c
index f56c686ed3..54fb8634fd 100644
--- a/src/intel/vulkan/genX_state.c
+++ b/src/intel/vulkan/genX_state.c
@@ -121,6 +121,18 @@ genX(init_device_state)(struct anv_device *device)
}
 #endif
 
+#if GEN_GEN == 10
+   uint32_t cache_mode_ss;
+   anv_pack_struct(&cache_mode_ss, GENX(CACHE_MODE_SS),
+   .FloatBlendOptimizationEnable = true,
+   .FloatBlendOptimizationEnableMask = true);
+
+   anv_batch_emit(&batch, GENX(MI_LOAD_REGISTER_IMM), lri) {
+  lri.RegisterOffset = GENX(CACHE_MODE_SS_num);
+  lri.DataDWord  = cache_mode_ss;
+   }
+#endif
+
anv_batch_emit(&batch, GENX(3DSTATE_AA_LINE_PARAMETERS), aa);
 
anv_batch_emit(&batch, GENX(3DSTATE_DRAWING_RECTANGLE), rect) {
-- 
2.13.5



[Mesa-dev] [PATCH] st/dri: fix android fence regression

2017-11-09 Thread Marek Olšák
From: Marek Olšák 

Fixes piglit - egl_khr_fence_sync/android_native tests.
Broken by 884a0b2a9e55d4c1ca39475b50d9af598d7d7280.
---
 src/gallium/include/state_tracker/st_api.h   | 2 ++
 src/gallium/state_trackers/dri/dri_helpers.c | 2 +-
 src/mesa/state_tracker/st_manager.c  | 7 +--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/include/state_tracker/st_api.h 
b/src/gallium/include/state_tracker/st_api.h
index 11a9878..6cdaefc 100644
--- a/src/gallium/include/state_tracker/st_api.h
+++ b/src/gallium/include/state_tracker/st_api.h
@@ -140,20 +140,22 @@ enum st_attachment_type {
 #define ST_ATTACHMENT_DEPTH_STENCIL_MASK  (1 << ST_ATTACHMENT_DEPTH_STENCIL)
 #define ST_ATTACHMENT_ACCUM_MASK  (1 << ST_ATTACHMENT_ACCUM)
 #define ST_ATTACHMENT_SAMPLE_MASK (1 << ST_ATTACHMENT_SAMPLE)
 
 /**
  * Flush flags.
  */
 #define ST_FLUSH_FRONT(1 << 0)
 #define ST_FLUSH_END_OF_FRAME (1 << 1)
 #define ST_FLUSH_WAIT (1 << 2)
+#define ST_FLUSH_DEFERRED (1 << 3)
+#define ST_FLUSH_FENCE_FD (1 << 4)
 
 /**
  * Value to st_manager->get_param function.
  */
 enum st_manager_param {
/**
 * The dri state tracker on old libGL's doesn't do the right thing
 * with regards to invalidating the framebuffers.
 *
 * For the mesa state tracker that means that it needs to invalidate
diff --git a/src/gallium/state_trackers/dri/dri_helpers.c 
b/src/gallium/state_trackers/dri/dri_helpers.c
index a9213ec..4a61455 100644
--- a/src/gallium/state_trackers/dri/dri_helpers.c
+++ b/src/gallium/state_trackers/dri/dri_helpers.c
@@ -109,21 +109,21 @@ dri2_create_fence(__DRIcontext *_ctx)
 
 static void *
 dri2_create_fence_fd(__DRIcontext *_ctx, int fd)
 {
struct st_context_iface *stapi = dri_context(_ctx)->st;
struct pipe_context *ctx = stapi->pipe;
struct dri2_fence *fence = CALLOC_STRUCT(dri2_fence);
 
if (fd == -1) {
   /* exporting driver created fence, flush: */
-  stapi->flush(stapi, PIPE_FLUSH_DEFERRED | PIPE_FLUSH_FENCE_FD,
+  stapi->flush(stapi, ST_FLUSH_DEFERRED | ST_FLUSH_FENCE_FD,
&fence->pipe_fence);
} else {
   /* importing a foreign fence fd: */
   ctx->create_fence_fd(ctx, &fence->pipe_fence, fd);
}
if (!fence->pipe_fence) {
   FREE(fence);
   return NULL;
}
 
diff --git a/src/mesa/state_tracker/st_manager.c 
b/src/mesa/state_tracker/st_manager.c
index 953f715..8f5ded4 100644
--- a/src/mesa/state_tracker/st_manager.c
+++ b/src/mesa/state_tracker/st_manager.c
@@ -625,23 +625,26 @@ st_framebuffers_purge(struct st_context *st)
}
 }
 
 static void
 st_context_flush(struct st_context_iface *stctxi, unsigned flags,
  struct pipe_fence_handle **fence)
 {
struct st_context *st = (struct st_context *) stctxi;
unsigned pipe_flags = 0;
 
-   if (flags & ST_FLUSH_END_OF_FRAME) {
+   if (flags & ST_FLUSH_END_OF_FRAME)
   pipe_flags |= PIPE_FLUSH_END_OF_FRAME;
-   }
+   if (flags & ST_FLUSH_DEFERRED)
+  pipe_flags |= PIPE_FLUSH_DEFERRED;
+   if (flags & ST_FLUSH_FENCE_FD)
+  pipe_flags |= PIPE_FLUSH_FENCE_FD;
 
FLUSH_VERTICES(st->ctx, 0);
FLUSH_CURRENT(st->ctx, 0);
st_flush(st, fence, pipe_flags);
 
if ((flags & ST_FLUSH_WAIT) && fence && *fence) {
   st->pipe->screen->fence_finish(st->pipe->screen, NULL, *fence,
  PIPE_TIMEOUT_INFINITE);
   st->pipe->screen->fence_reference(st->pipe->screen, fence, NULL);
}
-- 
2.7.4
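
For anyone following the flag handling in st_context_flush() above, here is
a minimal standalone sketch of the same idea: state-tracker flush bits are
translated one by one into pipe flush bits instead of being passed through.
The ST_FLUSH_* values mirror the patch; the SKETCH_PIPE_FLUSH_* values and
sketch_translate_flush_flags() are hypothetical names used only for
illustration, and the real PIPE_FLUSH_* values may differ.

```c
#include <stdint.h>

/* Bit positions as they appear in the patch. */
#define ST_FLUSH_FRONT        (1u << 0)
#define ST_FLUSH_END_OF_FRAME (1u << 1)
#define ST_FLUSH_WAIT         (1u << 2)
#define ST_FLUSH_DEFERRED     (1u << 3)
#define ST_FLUSH_FENCE_FD     (1u << 4)

/* Assumed values for illustration only, not the real gallium defines. */
#define SKETCH_PIPE_FLUSH_END_OF_FRAME (1u << 0)
#define SKETCH_PIPE_FLUSH_DEFERRED     (1u << 1)
#define SKETCH_PIPE_FLUSH_FENCE_FD     (1u << 2)

/* Translate st-level flush flags into pipe-level flush flags.
 * Bits without a pipe-level counterpart (FRONT, WAIT) are dropped here;
 * in the patch, WAIT is handled separately after the flush. */
static unsigned
sketch_translate_flush_flags(unsigned st_flags)
{
   unsigned pipe_flags = 0;

   if (st_flags & ST_FLUSH_END_OF_FRAME)
      pipe_flags |= SKETCH_PIPE_FLUSH_END_OF_FRAME;
   if (st_flags & ST_FLUSH_DEFERRED)
      pipe_flags |= SKETCH_PIPE_FLUSH_DEFERRED;
   if (st_flags & ST_FLUSH_FENCE_FD)
      pipe_flags |= SKETCH_PIPE_FLUSH_FENCE_FD;

   return pipe_flags;
}
```

The point of keeping two separate flag namespaces is that a caller such as
dri2_create_fence_fd() only needs the st-level names, and the two sets of
bit values are free to diverge.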



[Mesa-dev] [PATCH 1/2] anv/gen10: Implement WaSampleOffsetIZ workaround

2017-11-09 Thread Anuj Phogat
We already have this workaround in the OpenGL driver.
See Mesa commit 3cf4fe2219.

Signed-off-by: Anuj Phogat 
Cc: Nanley Chery 
Cc: Rafael Antognolli 
---
 src/intel/vulkan/genX_state.c | 61 +++
 1 file changed, 61 insertions(+)

diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c
index b7e4e5bcea..f56c686ed3 100644
--- a/src/intel/vulkan/genX_state.c
+++ b/src/intel/vulkan/genX_state.c
@@ -35,6 +35,59 @@
 
 #include "vk_util.h"
 
+#if GEN_GEN == 10
+/**
+ * From Gen10 Workarounds page in h/w specs:
+ * WaSampleOffsetIZ:
+ *"Prior to the 3DSTATE_SAMPLE_PATTERN driver must ensure there are no
+ * markers in the pipeline by programming a PIPE_CONTROL with stall."
+ */
+static void
+gen10_emit_wa_cs_stall_flush(struct anv_batch *batch)
+{
+
+   anv_batch_emit(batch, GENX(PIPE_CONTROL), pc) {
+  pc.CommandStreamerStallEnable = true;
+  pc.StallAtPixelScoreboard = true;
+   }
+}
+
+/**
+ * From Gen10 Workarounds page in h/w specs:
+ * WaSampleOffsetIZ:_cs_stall_flush
+ *"When 3DSTATE_SAMPLE_PATTERN is programmed, driver must then issue an
+ * MI_LOAD_REGISTER_IMM command to an offset between 0x7000 and 0x7FFF(SVL)
+ * after the command to ensure the state has been delivered prior to any
+ * command causing a marker in the pipeline."
+ */
+static void
+gen10_emit_wa_lri_to_cache_mode_zero(struct anv_batch *batch)
+{
+   /* Before changing the value of CACHE_MODE_0 register, GFX pipeline must
+* be idle; i.e., full flush is required.
+*/
+   anv_batch_emit(batch, GENX(PIPE_CONTROL), pc) {
+  pc.DepthCacheFlushEnable = true;
+  pc.DCFlushEnable = true;
+  pc.RenderTargetCacheFlushEnable = true;
+  pc.InstructionCacheInvalidateEnable = true;
+  pc.StateCacheInvalidationEnable = true;
+  pc.TextureCacheInvalidationEnable = true;
+  pc.VFCacheInvalidationEnable = true;
+  pc.ConstantCacheInvalidationEnable =true;
+   }
+
+   /* Write to CACHE_MODE_0 (0x7000) */
+   uint32_t cache_mode_0 = 0;
+   anv_pack_struct(&cache_mode_0, GENX(CACHE_MODE_0));
+
+   anv_batch_emit(batch, GENX(MI_LOAD_REGISTER_IMM), lri) {
+  lri.RegisterOffset = GENX(CACHE_MODE_0_num);
+  lri.DataDWord  = cache_mode_0;
+   }
+}
+#endif
+
 VkResult
 genX(init_device_state)(struct anv_device *device)
 {
@@ -82,6 +135,10 @@ genX(init_device_state)(struct anv_device *device)
 #if GEN_GEN >= 8
anv_batch_emit(&batch, GENX(3DSTATE_WM_CHROMAKEY), ck);
 
+#if GEN_GEN == 10
+   gen10_emit_wa_cs_stall_flush(&batch);
+#endif
+
/* See the Vulkan 1.0 spec Table 24.1 "Standard sample locations" and
 * VkPhysicalDeviceFeatures::standardSampleLocations.
 */
@@ -96,6 +153,10 @@ genX(init_device_state)(struct anv_device *device)
}
 #endif
 
+#if GEN_GEN == 10
+   gen10_emit_wa_lri_to_cache_mode_zero(&batch);
+#endif
+
anv_batch_emit(&batch, GENX(MI_BATCH_BUFFER_END), bbe);
 
assert(batch.next <= batch.end);
-- 
2.13.5
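
As a small aside on the second half of the workaround: the quoted spec text
only requires the MI_LOAD_REGISTER_IMM to target some offset between 0x7000
and 0x7FFF, and the patch picks CACHE_MODE_0, which its own comment places at
0x7000. A minimal sketch of that range check follows; all SKETCH_* names and
the helper are hypothetical and not part of anv.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset range quoted in the WaSampleOffsetIZ text. */
#define SKETCH_SVL_RANGE_START 0x7000u
#define SKETCH_SVL_RANGE_END   0x7FFFu

/* Offset of CACHE_MODE_0 according to the comment in the patch. */
#define SKETCH_CACHE_MODE_0_OFFSET 0x7000u

/* Returns true if an LRI to reg_offset would satisfy the range
 * requirement of the workaround text. */
static bool
sketch_lri_offset_satisfies_wa(uint32_t reg_offset)
{
   return reg_offset >= SKETCH_SVL_RANGE_START &&
          reg_offset <= SKETCH_SVL_RANGE_END;
}
```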



Re: [Mesa-dev] [PATCH 2/3] intel/fs: Don't let undefined values prevent copy propagation.

2017-11-09 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Fri, Oct 27, 2017 at 5:05 PM, Francisco Jerez 
wrote:

> This makes the dataflow propagation logic of the copy propagation pass
> more intelligent in cases where the destination of a copy is known to
> be undefined for some incoming CFG edges, building upon the
> definedness information provided by the last patch.  Helps a few
> programs, and avoids a handful of shader-db regressions from the next
> patch.
>
> shader-db results on ILK:
>
>   total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
>   instructions in affected programs: 360 -> 336 (-6.67%)
>   helped: 8
>   HURT: 0
>
>   LOST:   0
>   GAINED: 10
>
> shader-db results on BDW:
>
>   total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
>   instructions in affected programs: 7730 -> 7289 (-5.71%)
>   helped: 5
>   HURT: 2
>
>   LOST:   0
>   GAINED: 4
>
> shader-db results on SKL:
>
>   total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
>   instructions in affected programs: 10364 -> 9293 (-10.33%)
>   helped: 5
>   HURT: 2
>
>   LOST:   0
>   GAINED: 2
> ---
>  src/intel/compiler/brw_fs_copy_propagation.cpp | 50
> --
>  1 file changed, 47 insertions(+), 3 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
> b/src/intel/compiler/brw_fs_copy_propagation.cpp
> index cb117396089..5897cff35be 100644
> --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
> +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
> @@ -36,9 +36,12 @@
>
>  #include "util/bitset.h"
>  #include "brw_fs.h"
> +#include "brw_fs_live_variables.h"
>  #include "brw_cfg.h"
>  #include "brw_eu.h"
>
> +using namespace brw;
> +
>  namespace { /* avoid conflict with opt_copy_propagation_elements */
>  struct acp_entry : public exec_node {
> fs_reg dst;
> @@ -77,12 +80,19 @@ struct block_data {
>  * course of this block.
>  */
> BITSET_WORD *kill;
> +
> +   /**
> +* Which entries in the fs_copy_prop_dataflow acp table are guaranteed
> to
> +* have a fully uninitialized destination at the end of this block.
> +*/
> +   BITSET_WORD *undef;
>  };
>
>  class fs_copy_prop_dataflow
>  {
>  public:
> fs_copy_prop_dataflow(void *mem_ctx, cfg_t *cfg,
> + const fs_live_variables *live,
>   exec_list *out_acp[ACP_HASH_SIZE]);
>
> void setup_initial_values();
> @@ -92,6 +102,7 @@ public:
>
> void *mem_ctx;
> cfg_t *cfg;
> +   const fs_live_variables *live;
>
> acp_entry **acp;
> int num_acp;
> @@ -102,8 +113,9 @@ public:
>  } /* anonymous namespace */
>
>  fs_copy_prop_dataflow::fs_copy_prop_dataflow(void *mem_ctx, cfg_t *cfg,
> + const fs_live_variables
> *live,
>   exec_list
> *out_acp[ACP_HASH_SIZE])
> -   : mem_ctx(mem_ctx), cfg(cfg)
> +   : mem_ctx(mem_ctx), cfg(cfg), live(live)
>  {
> bd = rzalloc_array(mem_ctx, struct block_data, cfg->num_blocks);
>
> @@ -124,6 +136,7 @@ fs_copy_prop_dataflow::fs_copy_prop_dataflow(void
> *mem_ctx, cfg_t *cfg,
>bd[block->num].liveout = rzalloc_array(bd, BITSET_WORD,
> bitset_words);
>bd[block->num].copy = rzalloc_array(bd, BITSET_WORD, bitset_words);
>bd[block->num].kill = rzalloc_array(bd, BITSET_WORD, bitset_words);
> +  bd[block->num].undef = rzalloc_array(bd, BITSET_WORD, bitset_words);
>
>for (int i = 0; i < ACP_HASH_SIZE; i++) {
>   foreach_in_list(acp_entry, entry, &out_acp[block->num][i]) {
> @@ -189,6 +202,18 @@ fs_copy_prop_dataflow::setup_initial_values()
>   }
>}
> }
> +
> +   /* Initialize the undef set. */
> +   foreach_block (block, cfg) {
> +  for (int i = 0; i < num_acp; i++) {
> + BITSET_SET(bd[block->num].undef, i);
> + for (unsigned off = 0; off < acp[i]->size_written; off += REG_SIZE) {
> +if (BITSET_TEST(live->block_data[block->num].defout,
> +live->var_from_reg(byte_offset(acp[i]->dst, off))))
> +   BITSET_CLEAR(bd[block->num].undef, i);
> + }
> +  }
> +   }
>  }
>
>  /**
> @@ -229,13 +254,30 @@ fs_copy_prop_dataflow::run()
>
>   for (int i = 0; i < bitset_words; i++) {
>  const BITSET_WORD old_livein = bd[block->num].livein[i];
> +BITSET_WORD livein_from_any_block = 0;
>
>  bd[block->num].livein[i] = ~0u;
>  foreach_list_typed(bblock_link, parent_link, link,
> &block->parents) {
> bblock_t *parent = parent_link->block;
> -   bd[block->num].livein[i] &= bd[parent->num].liveout[i];
> +   /* Consider ACP entries with a known-undefined destination
> to
> +* be available from the parent.  This is valid because
> we're
> +* free to set the undefined variable equal to the source
> of
> +* the ACP entry without 

Re: [Mesa-dev] [PATCH v2 07/26] winsys/amdgpu: handle cs_add_fence_dependency for deferred/unsubmitted fences

2017-11-09 Thread Marek Olšák
FYI, this breaks:
piglit/bin/bufferstorage-persistent read -auto

and a bunch of others.

Marek

On Mon, Nov 6, 2017 at 11:23 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> The idea is to fix the following interleaving of operations
> that can arise from deferred fences:
>
>  Thread 1 / Context 1  Thread 2 / Context 2
>    
>  f = deferred flush
>  <--- application-side synchronization --->
>fence_server_sync(f)
>...
>flush()
>  flush()
>
> We will now stall in fence_server_sync until the flush of context 1
> has completed.
>
> This scenario was unlikely to occur previously, because applications
> seem to be doing
>
>  Thread 1 / Context 1  Thread 2 / Context 2
>    
>  f = glFenceSync()
>  glFlush()
>  <--- application-side synchronization --->
>glWaitSync(f)
>
> ... and indeed they probably *have* to use this ordering to avoid
> deadlocks in the GLX model, where all GL operations conceptually
> go through a single connection to the X server. However, it's less
> clear whether applications have to do this with other WSI (i.e. EGL).
> Besides, even this sequence of GL commands can be translated into
> the Gallium-level sequence outlined above when Gallium threading
> and asynchronous flushes are used. So it makes sense to be more
> robust.
>
> As a side effect, we no longer busy-wait on submission_in_progress.
>
> We won't enable asynchronous flushes on radeon, but add a
> cs_add_fence_dependency stub anyway to document the potential
> issue.
> ---
>  src/gallium/drivers/radeon/radeon_winsys.h|  4 +++-
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 21 +
>  src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  9 ++---
>  src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 19 +++
>  4 files changed, 41 insertions(+), 12 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
> b/src/gallium/drivers/radeon/radeon_winsys.h
> index 2d3f646dc65..e8c486cb7f4 100644
> --- a/src/gallium/drivers/radeon/radeon_winsys.h
> +++ b/src/gallium/drivers/radeon/radeon_winsys.h
> @@ -536,21 +536,23 @@ struct radeon_winsys {
>   * \return Negative POSIX error code or 0 for success.
>   * Asynchronous submissions never return an error.
>   */
>  int (*cs_flush)(struct radeon_winsys_cs *cs,
>  unsigned flags,
>  struct pipe_fence_handle **fence);
>
>  /**
>   * Create a fence before the CS is flushed.
>   * The user must flush manually to complete the initializaton of the 
> fence.
> - * The fence must not be used before the flush.
> + *
> + * The fence must not be used for anything except \ref 
> cs_add_fence_dependency
> + * before the flush.
>   */
>  struct pipe_fence_handle *(*cs_get_next_fence)(struct radeon_winsys_cs 
> *cs);
>
>  /**
>   * Return true if a buffer is referenced by a command stream.
>   *
>   * \param csA command stream.
>   * \param buf   A winsys buffer.
>   */
>  bool (*cs_is_buffer_referenced)(struct radeon_winsys_cs *cs,
> diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
> b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> index 0450ccc3596..0628e547351 100644
> --- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
> @@ -43,21 +43,22 @@ amdgpu_fence_create(struct amdgpu_ctx *ctx, unsigned 
> ip_type,
>  {
> struct amdgpu_fence *fence = CALLOC_STRUCT(amdgpu_fence);
>
> fence->reference.count = 1;
> fence->ws = ctx->ws;
> fence->ctx = ctx;
> fence->fence.context = ctx->ctx;
> fence->fence.ip_type = ip_type;
> fence->fence.ip_instance = ip_instance;
> fence->fence.ring = ring;
> -   fence->submission_in_progress = true;
> +   util_queue_fence_init(&fence->submitted);
> +   util_queue_fence_reset(&fence->submitted);
> p_atomic_inc(&ctx->refcount);
> return (struct pipe_fence_handle *)fence;
>  }
>
>  static struct pipe_fence_handle *
>  amdgpu_fence_import_sync_file(struct radeon_winsys *rws, int fd)
>  {
> struct amdgpu_winsys *ws = amdgpu_winsys(rws);
> struct amdgpu_fence *fence = CALLOC_STRUCT(amdgpu_fence);
>
> @@ -74,66 +75,69 @@ amdgpu_fence_import_sync_file(struct radeon_winsys *rws, 
> int fd)
>FREE(fence);
>return NULL;
> }
>
> r = amdgpu_cs_syncobj_import_sync_file(ws->dev, fence->syncobj, fd);
> if (r) {
>amdgpu_cs_destroy_syncobj(ws->dev, fence->syncobj);
>FREE(fence);
>return NULL;
> }
> +
> +   util_queue_fence_init(&fence->submitted);
> +
> return (struct pipe_fence_handle*)fence;
>  }
>
>  static int amdgpu_fence_export_sync_file(struct radeon_winsys *rws,
>

[Mesa-dev] [PATCH 3/5] r600: use ieee version of rcp

2017-11-09 Thread sroland
From: Roland Scheidegger 

r600 used the clamped version for rcp, whereas both evergreen and cayman
used the ieee version. I don't know why that discrepancy exists (it has been
there since day 1), but there does not seem to be a valid reason for it, so
make it consistent. This now seems safer than it was before the previous
commit (using the dx10 clamp bit).
Note that rsq still uses the clamped version (as before, even though the
table may have suggested otherwise for evergreen) for r600/eg, but not for
cayman. That will be changed separately for better regression tracking...
---
 src/gallium/drivers/r600/r600_shader.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index e9054c4fbb..2ece2210a6 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -8830,11 +8830,7 @@ static const struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[]
[TGSI_OPCODE_MOV]   = { ALU_OP1_MOV, tgsi_op2},
[TGSI_OPCODE_LIT]   = { ALU_OP0_NOP, tgsi_lit},
 
-   /* XXX:
-* For state trackers other than OpenGL, we'll want to use
-* _RECIP_IEEE instead.
-*/
-   [TGSI_OPCODE_RCP]   = { ALU_OP1_RECIP_CLAMPED, 
tgsi_trans_srcx_replicate},
+   [TGSI_OPCODE_RCP]   = { ALU_OP1_RECIP_IEEE, 
tgsi_trans_srcx_replicate},
 
[TGSI_OPCODE_RSQ]   = { ALU_OP0_NOP, tgsi_rsq},
[TGSI_OPCODE_EXP]   = { ALU_OP0_NOP, tgsi_exp},
@@ -9035,7 +9031,7 @@ static const struct r600_shader_tgsi_instruction 
eg_shader_tgsi_instruction[] =
[TGSI_OPCODE_MOV]   = { ALU_OP1_MOV, tgsi_op2},
[TGSI_OPCODE_LIT]   = { ALU_OP0_NOP, tgsi_lit},
[TGSI_OPCODE_RCP]   = { ALU_OP1_RECIP_IEEE, 
tgsi_trans_srcx_replicate},
-   [TGSI_OPCODE_RSQ]   = { ALU_OP1_RECIPSQRT_IEEE, tgsi_rsq},
+   [TGSI_OPCODE_RSQ]   = { ALU_OP0_NOP, tgsi_rsq},
[TGSI_OPCODE_EXP]   = { ALU_OP0_NOP, tgsi_exp},
[TGSI_OPCODE_LOG]   = { ALU_OP0_NOP, tgsi_log},
[TGSI_OPCODE_MUL]   = { ALU_OP2_MUL_IEEE, tgsi_op2},
-- 
2.12.3
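
To make the difference between the two opcodes concrete, here is a rough
sketch of how I would model them on the CPU. It assumes the clamped variant
saturates an infinite result to +/-FLT_MAX; that is my reading of "clamped"
and not something the patch states, so treat it as an assumption. The
sketch_* names are hypothetical.

```c
#include <float.h>

/* IEEE-style reciprocal: 1/0 yields +inf, 1/-0 yields -inf. */
static float
sketch_rcp_ieee(float x)
{
   return 1.0f / x;
}

/* Clamped reciprocal (assumed semantics): infinite results are
 * saturated to the largest finite float of the same sign. */
static float
sketch_rcp_clamped(float x)
{
   float r = 1.0f / x;

   if (r > FLT_MAX)
      return FLT_MAX;
   if (r < -FLT_MAX)
      return -FLT_MAX;
   return r;
}
```

With this model, a shader computing x * rcp(0) sees inf (or NaN for x == 0)
with the ieee opcode but a large finite value with the clamped one, which is
the kind of behavior change the commit message is cautious about.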



Re: [Mesa-dev] [PATCH] radv: do not wait for idle when SURFACE_SYNC is emitted

2017-11-09 Thread Marek Olšák
What high priority interactions?

Marek

On Thu, Nov 9, 2017 at 6:22 PM, Bas Nieuwenhuizen
 wrote:
> Nack. We had that and Andres removed it due to high priority interactions.
>
>
> On 9 Nov 2017 18:01, "Samuel Pitoiset"  wrote:
>
> Copied from RadeonSI.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/si_cmd_buffer.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
> index 89ee399817..f5c04c07a8 100644
> --- a/src/amd/vulkan/si_cmd_buffer.c
> +++ b/src/amd/vulkan/si_cmd_buffer.c
> @@ -973,12 +973,18 @@ si_cs_emit_cache_flush(struct radeon_winsys_cs *cs,
> radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_DB_META) |
> EVENT_INDEX(0));
> }
>
> -   if (flush_bits & RADV_CMD_FLAG_PS_PARTIAL_FLUSH) {
> -   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
> -   radeon_emit(cs, EVENT_TYPE(V_028A90_PS_PARTIAL_FLUSH) |
> EVENT_INDEX(4));
> -   } else if (flush_bits & RADV_CMD_FLAG_VS_PARTIAL_FLUSH) {
> -   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
> -   radeon_emit(cs, EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) |
> EVENT_INDEX(4));
> +   /* Wait for shader engines to go idle.
> +* VS and PS waits are unnecessary if SURFACE_SYNC is going to wait
> +* for everything including CB/DB cache flushes.
> +*/
> +   if (!flush_cb_db) {
> +   if (flush_bits & RADV_CMD_FLAG_PS_PARTIAL_FLUSH) {
> +   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
> +   radeon_emit(cs,
> EVENT_TYPE(V_028A90_PS_PARTIAL_FLUSH) | EVENT_INDEX(4));
> +   } else if (flush_bits & RADV_CMD_FLAG_VS_PARTIAL_FLUSH) {
> +   radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
> +   radeon_emit(cs,
> EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) | EVENT_INDEX(4));
> +   }
> }
>
> if (flush_bits & RADV_CMD_FLAG_CS_PARTIAL_FLUSH) {
> --
> 2.15.0
>
>
>
>
>
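
For reference, the control flow the quoted patch proposes boils down to the
following sketch: the PS/VS partial flush events are skipped entirely when a
CB/DB flush (and thus SURFACE_SYNC) is going to happen anyway. The SKETCH_*
names and the helper are hypothetical stand-ins, not radv code, and this says
nothing about the priority concerns raised in the reply.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RADV_CMD_FLAG_* bits. */
#define SKETCH_FLAG_PS_PARTIAL_FLUSH (1u << 0)
#define SKETCH_FLAG_VS_PARTIAL_FLUSH (1u << 1)

enum sketch_event {
   SKETCH_EVENT_NONE,
   SKETCH_EVENT_PS_PARTIAL_FLUSH,
   SKETCH_EVENT_VS_PARTIAL_FLUSH,
};

/* Decide which partial flush event, if any, would be emitted. */
static enum sketch_event
sketch_pick_partial_flush(unsigned flush_bits, bool flush_cb_db)
{
   /* SURFACE_SYNC would wait for everything, so skip the shader waits. */
   if (flush_cb_db)
      return SKETCH_EVENT_NONE;

   /* A PS partial flush takes precedence over a VS partial flush. */
   if (flush_bits & SKETCH_FLAG_PS_PARTIAL_FLUSH)
      return SKETCH_EVENT_PS_PARTIAL_FLUSH;
   if (flush_bits & SKETCH_FLAG_VS_PARTIAL_FLUSH)
      return SKETCH_EVENT_VS_PARTIAL_FLUSH;

   return SKETCH_EVENT_NONE;
}
```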


[Mesa-dev] [PATCH 4/5] r600: use ieee version of rsq

2017-11-09 Thread sroland
From: Roland Scheidegger 

Both r600 and evergreen used the clamped version, whereas cayman used the
ieee one. I don't think there's a valid reason for this discrepancy, so let's
switch to the ieee version for r600 and evergreen too, since we generally
want to stick to ieee arithmetic.
With this, behavior for both rcp and rsq should now be the same for all of
r600, eg, cm, all using ieee versions (albeit note rsq retains the abs
behavior for everybody, which may not be a good idea ultimately).
---
 src/gallium/drivers/r600/r600_shader.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 2ece2210a6..3f42654d13 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -4796,11 +4796,7 @@ static int tgsi_rsq(struct r600_shader_ctx *ctx)
 
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
 
-   /* XXX:
-* For state trackers other than OpenGL, we'll want to use
-* _RECIPSQRT_IEEE instead.
-*/
-   alu.op = ALU_OP1_RECIPSQRT_CLAMPED;
+   alu.op = ALU_OP1_RECIPSQRT_IEEE;
 
for (i = 0; i < inst->Instruction.NumSrcRegs; i++) {
r600_bytecode_src(&alu.src[i], &ctx->src[i], 0);
-- 
2.12.3



[Mesa-dev] [PATCH 1/5] r600: use min_dx10/max_dx10 instead of min/max

2017-11-09 Thread sroland
From: Roland Scheidegger 

I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here; however, the dx10 versions will pick
a non-NaN result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by the newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.
---
 src/gallium/drivers/r600/r600_shader.c  | 13 +++--
 src/gallium/drivers/r600/sb/sb_expr.cpp |  2 ++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 188fbc9d47..e9054c4fbb 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -8844,8 +8844,9 @@ static const struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[]
[TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
[TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
[TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
-   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
-   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
+   /* MIN_DX10 returns non-nan result if one src is NaN, MIN returns NaN */
+   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
+   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
[TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
[TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
[TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
@@ -9042,8 +9043,8 @@ static const struct r600_shader_tgsi_instruction 
eg_shader_tgsi_instruction[] =
[TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
[TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
[TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
-   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
-   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
+   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
+   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
[TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
[TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
[TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
@@ -9265,8 +9266,8 @@ static const struct r600_shader_tgsi_instruction 
cm_shader_tgsi_instruction[] =
[TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
[TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
[TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
-   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
-   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
+   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
+   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
[TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
[TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
[TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp 
b/src/gallium/drivers/r600/sb/sb_expr.cpp
index 3dd3a4815b..7a5d62c8e8 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -753,7 +753,9 @@ bool expr_handler::fold_alu_op2(alu_node& n) {
n.bc.src[0].abs == n.bc.src[1].abs) {
switch (n.bc.op) {
case ALU_OP2_MIN: // (MIN x, x) => (MOV x)
+   case ALU_OP2_MIN_DX10:
case ALU_OP2_MAX:
+   case ALU_OP2_MAX_DX10:
convert_to_mov(n, v0, n.bc.src[0].neg, 
n.bc.src[0].abs);
return fold_alu_op1(n);
case ALU_OP2_ADD:  // (ADD x, x) => (MUL x, 2)
-- 
2.12.3
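
A small CPU-side sketch of the two MIN flavors as described in the commit
message, in case it helps reviewers: the dx10 flavor returns the non-NaN
operand when exactly one source is NaN, while the other flavor returns NaN.
The function names are hypothetical and the code only models the NaN
behavior stated above, nothing else about the hardware.

```c
#include <math.h>

/* MIN_DX10-like: prefer the non-NaN operand (ieee754 minNum behavior). */
static float
sketch_min_dx10(float a, float b)
{
   if (isnan(a))
      return b;
   if (isnan(b))
      return a;
   return a < b ? a : b;
}

/* MIN-like (non-dx10): a NaN in either source yields NaN. */
static float
sketch_min_nan_propagating(float a, float b)
{
   if (isnan(a) || isnan(b))
      return NAN;
   return a < b ? a : b;
}
```

Note that (MIN x, x) => (MOV x) stays valid for both flavors, since either
one returns x (NaN included) when both sources are the same value, which is
why the sb folding can simply list the new opcodes next to the old ones.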



[Mesa-dev] [PATCH 5/5] r600: set the number type correctly for float rts in cb setup

2017-11-09 Thread sroland
From: Roland Scheidegger 

Float rts were always set as unorm instead of float.
Not sure of the consequences, but at least it looks like the blend clamp
would have been enabled, which is against the rules (only eg really bothered
to even attempt to specify this correctly, r600 always used clamp anyway).
Albeit r600 (not r700) setup still looks bugged to me due to never setting
BLEND_FLOAT32 which must be set according to docs...
Not sure if the hw really cares, no piglit change (on eg/juniper).
---
 src/gallium/drivers/r600/evergreen_state.c |  7 ++-
 src/gallium/drivers/r600/r600_state.c  | 10 +-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index ef323bf4f6..e724cb157f 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1042,7 +1042,7 @@ static void evergreen_set_color_surface_buffer(struct 
r600_context *rctx,
}
}
ntype = V_028C70_NUMBER_UNORM;
-   if (desc->colorspace == UTIL_FORMAT_COLORSPACE_SRGB)
+   if (desc->colorspace == UTIL_FORMAT_COLORSPACE_SRGB)
ntype = V_028C70_NUMBER_SRGB;
else if (desc->channel[i].type == UTIL_FORMAT_TYPE_SIGNED) {
if (desc->channel[i].normalized)
@@ -1054,7 +1054,10 @@ static void evergreen_set_color_surface_buffer(struct 
r600_context *rctx,
ntype = V_028C70_NUMBER_UNORM;
else if (desc->channel[i].pure_integer)
ntype = V_028C70_NUMBER_UINT;
+   } else if (desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT) {
+   ntype = V_028C70_NUMBER_FLOAT;
}
+
pitch = (pitch / 8) - 1;
color->pitch = S_028C64_PITCH_TILE_MAX(pitch);
 
@@ -1180,6 +1183,8 @@ static void evergreen_set_color_surface_common(struct 
r600_context *rctx,
ntype = V_028C70_NUMBER_UNORM;
else if (desc->channel[i].pure_integer)
ntype = V_028C70_NUMBER_UINT;
+   } else if (desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT) {
+   ntype = V_028C70_NUMBER_FLOAT;
}
 
if (R600_BIG_ENDIAN)
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index db3d6db70b..f024987a30 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -817,7 +817,7 @@ static void r600_init_color_surface(struct r600_context 
*rctx,
unsigned offset;
const struct util_format_description *desc;
int i;
-   bool blend_bypass = 0, blend_clamp = 1, do_endian_swap = FALSE;
+   bool blend_bypass = 0, blend_clamp = 0, do_endian_swap = FALSE;
 
if (rtex->db_compatible && !r600_can_sample_zs(rtex, false)) {
r600_init_flushed_depth_texture(&rctx->b.b, surf->base.texture, 
NULL);
@@ -869,6 +869,8 @@ static void r600_init_color_surface(struct r600_context 
*rctx,
ntype = V_0280A0_NUMBER_UNORM;
else if (desc->channel[i].pure_integer)
ntype = V_0280A0_NUMBER_UINT;
+   } else if (desc->channel[i].type == UTIL_FORMAT_TYPE_FLOAT) {
+   ntype = V_0280A0_NUMBER_FLOAT;
}
 
if (R600_BIG_ENDIAN)
@@ -883,6 +885,11 @@ static void r600_init_color_surface(struct r600_context 
*rctx,
 
endian = r600_colorformat_endian_swap(format, do_endian_swap);
 
+   /* blend clamp should be set for all NORM/SRGB types */
+   if (ntype == V_0280A0_NUMBER_UNORM || ntype == V_0280A0_NUMBER_SNORM ||
+   ntype == V_0280A0_NUMBER_SRGB)
+   blend_clamp = 1;
+
/* set blend bypass according to docs if SINT/UINT or
   8/24 COLOR variants */
if (ntype == V_0280A0_NUMBER_UINT || ntype == V_0280A0_NUMBER_SINT ||
@@ -916,6 +923,7 @@ static void r600_init_color_surface(struct r600_context 
*rctx,
 ntype != V_0280A0_NUMBER_UINT &&
 ntype != V_0280A0_NUMBER_SINT) &&
G_0280A0_BLEND_CLAMP(color_info) &&
+   /* XXX this condition is always true since BLEND_FLOAT32 is 
never set (bug?). */
!G_0280A0_BLEND_FLOAT32(color_info)) {
color_info |= 
S_0280A0_SOURCE_FORMAT(V_0280A0_EXPORT_NORM);
surf->export_16bpc = true;
-- 
2.12.3
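
The number type selection and the new blend clamp rule from this patch can be
summarized with the following sketch. It follows the structure of the if/else
chain in the diff, but the enums, the struct and the function names are
hypothetical simplifications, not the real util_format or r600 definitions.

```c
#include <stdbool.h>

enum sketch_channel_type {
   SKETCH_TYPE_UNSIGNED,
   SKETCH_TYPE_SIGNED,
   SKETCH_TYPE_FLOAT,
};

enum sketch_ntype {
   SKETCH_NUMBER_UNORM,
   SKETCH_NUMBER_SNORM,
   SKETCH_NUMBER_UINT,
   SKETCH_NUMBER_SINT,
   SKETCH_NUMBER_SRGB,
   SKETCH_NUMBER_FLOAT,
};

/* Simplified view of the format description fields the code looks at. */
struct sketch_format {
   bool srgb;
   enum sketch_channel_type type;
   bool normalized;
   bool pure_integer;
};

static enum sketch_ntype
sketch_pick_ntype(const struct sketch_format *f)
{
   enum sketch_ntype ntype = SKETCH_NUMBER_UNORM;

   if (f->srgb) {
      ntype = SKETCH_NUMBER_SRGB;
   } else if (f->type == SKETCH_TYPE_SIGNED) {
      if (f->normalized)
         ntype = SKETCH_NUMBER_SNORM;
      else if (f->pure_integer)
         ntype = SKETCH_NUMBER_SINT;
   } else if (f->type == SKETCH_TYPE_UNSIGNED) {
      if (f->normalized)
         ntype = SKETCH_NUMBER_UNORM;
      else if (f->pure_integer)
         ntype = SKETCH_NUMBER_UINT;
   } else if (f->type == SKETCH_TYPE_FLOAT) {
      /* The case added by the patch: floats no longer fall back to UNORM. */
      ntype = SKETCH_NUMBER_FLOAT;
   }

   return ntype;
}

/* Blend clamp is only wanted for the NORM/SRGB number types. */
static bool
sketch_blend_clamp(enum sketch_ntype ntype)
{
   return ntype == SKETCH_NUMBER_UNORM ||
          ntype == SKETCH_NUMBER_SNORM ||
          ntype == SKETCH_NUMBER_SRGB;
}
```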



[Mesa-dev] [PATCH 2/5] r600: use DX10_CLAMP bit in shader setup

2017-11-09 Thread sroland
From: Roland Scheidegger 

The docs are not very precise about what this really does; however, both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045feeef8cad155f1c9aa07f166e146e3d00), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that right (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).

v2: set it in all shader stages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544
---
 src/gallium/drivers/r600/evergreen_state.c | 6 ++
 src/gallium/drivers/r600/r600_state.c  | 9 +
 2 files changed, 15 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 96eb35a981..ef323bf4f6 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -3235,6 +3235,7 @@ void evergreen_update_ps_state(struct pipe_context *ctx, 
struct r600_pipe_shader
r600_store_value(cb, /* R_028844_SQ_PGM_RESOURCES_PS */
 S_028844_NUM_GPRS(rshader->bc.ngpr) |
 S_028844_PRIME_CACHE_ON_DRAW(1) |
+S_028844_DX10_CLAMP(1) |
 S_028844_STACK_SIZE(rshader->bc.nstack));
/* After that, the NOP relocation packet must be emitted (shader->bo, 
RADEON_USAGE_READ). */
 
@@ -3255,6 +3256,7 @@ void evergreen_update_es_state(struct pipe_context *ctx, 
struct r600_pipe_shader
 
r600_store_context_reg(cb, R_028890_SQ_PGM_RESOURCES_ES,
   S_028890_NUM_GPRS(rshader->bc.ngpr) |
+  S_028890_DX10_CLAMP(1) |
   S_028890_STACK_SIZE(rshader->bc.nstack));
r600_store_context_reg(cb, R_02888C_SQ_PGM_START_ES,
   shader->bo->gpu_address >> 8);
@@ -3317,6 +3319,7 @@ void evergreen_update_gs_state(struct pipe_context *ctx, 
struct r600_pipe_shader
 
r600_store_context_reg(cb, R_028878_SQ_PGM_RESOURCES_GS,
   S_028878_NUM_GPRS(rshader->bc.ngpr) |
+  S_028878_DX10_CLAMP(1) |
   S_028878_STACK_SIZE(rshader->bc.nstack));
r600_store_context_reg(cb, R_028874_SQ_PGM_START_GS,
   shader->bo->gpu_address >> 8);
@@ -3357,6 +3360,7 @@ void evergreen_update_vs_state(struct pipe_context *ctx, 
struct r600_pipe_shader
   S_0286C4_VS_EXPORT_COUNT(nparams - 1));
r600_store_context_reg(cb, R_028860_SQ_PGM_RESOURCES_VS,
   S_028860_NUM_GPRS(rshader->bc.ngpr) |
+  S_028860_DX10_CLAMP(1) |
   S_028860_STACK_SIZE(rshader->bc.nstack));
if (rshader->vs_position_window_space) {
r600_store_context_reg(cb, R_028818_PA_CL_VTE_CNTL,
@@ -3391,6 +3395,7 @@ void evergreen_update_hs_state(struct pipe_context *ctx, 
struct r600_pipe_shader
r600_init_command_buffer(cb, 32);
r600_store_context_reg(cb, R_0288BC_SQ_PGM_RESOURCES_HS,
   S_0288BC_NUM_GPRS(rshader->bc.ngpr) |
+  S_0288BC_DX10_CLAMP(1) |
   S_0288BC_STACK_SIZE(rshader->bc.nstack));
r600_store_context_reg(cb, R_0288B8_SQ_PGM_START_HS,
   shader->bo->gpu_address >> 8);
@@ -3404,6 +3409,7 @@ void evergreen_update_ls_state(struct pipe_context *ctx, 
struct r600_pipe_shader
r600_init_command_buffer(cb, 32);
r600_store_context_reg(cb, R_0288D4_SQ_PGM_RESOURCES_LS,
   S_0288D4_NUM_GPRS(rshader->bc.ngpr) |
+  S_0288D4_DX10_CLAMP(1) |
   S_0288D4_STACK_SIZE(rshader->bc.nstack));
r600_store_context_reg(cb, R_0288D0_SQ_PGM_START_LS,
   shader->bo->gpu_address >> 8);
diff --git a/src/gallium/drivers/r600/r600_state.c 
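
To illustrate the DX10_CLAMP semantics described in the commit message, here
is a CPU-side sketch of the CLAMP output modifier with and without the bit.
It only models what the message states (clamp to [0,1], NaN becomes zero
when the bit is set, NaN stays NaN otherwise); the function names are
hypothetical.

```c
#include <math.h>

/* CLAMP modifier with DX10_CLAMP set: saturate to [0,1], NaN becomes 0.
 * This matches min(1.0, max(x, 0.0)) when min/max pick the non-NaN
 * operand. */
static float
sketch_clamp01_dx10(float x)
{
   if (isnan(x))
      return 0.0f;
   if (x < 0.0f)
      return 0.0f;
   return x > 1.0f ? 1.0f : x;
}

/* CLAMP modifier without DX10_CLAMP: NaN passes through unchanged. */
static float
sketch_clamp01_nan_passthrough(float x)
{
   if (isnan(x))
      return x;
   if (x < 0.0f)
      return 0.0f;
   return x > 1.0f ? 1.0f : x;
}
```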

Re: [Mesa-dev] [PATCH] threads: fix MinGW build breakage

2017-11-09 Thread Roland Scheidegger
Looks alright to me.

Reviewed-by: Roland Scheidegger 

Am 09.11.2017 um 17:46 schrieb Brian Paul:
> Fixes: f1a364878431c8 ("threads: update for late C11 changes")
> ---
>  include/c11/threads_win32.h | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/c11/threads_win32.h b/include/c11/threads_win32.h
> index 77d923a..dac8ef7 100644
> --- a/include/c11/threads_win32.h
> +++ b/include/c11/threads_win32.h
> @@ -78,6 +78,9 @@ Configuration macro:
>  /* Visual Studio 2015 and later */
>  #if _MSC_VER >= 1900
>  #define HAVE_TIMESPEC
> +#define HAVE_TIMESPEC_GET
> +#elif defined(__MINGW32__)
> +#define HAVE_TIMESPEC
>  #endif
>  
>  #ifndef HAVE_TIMESPEC
> @@ -645,7 +648,7 @@ tss_set(tss_t key, void *val)
>  
>  /* 7.25.7 Time functions */
>  // 7.25.6.1
> -#ifndef HAVE_TIMESPEC
> +#ifndef HAVE_TIMESPEC_GET
>  static inline int
>  timespec_get(struct timespec *ts, int base)
>  {
> 



Re: [Mesa-dev] [PATCH 3/4] r600: use ieee version of rcp

2017-11-09 Thread Roland Scheidegger
Am 09.11.2017 um 18:58 schrieb Roland Scheidegger:
> Am 09.11.2017 um 18:27 schrieb Jan Vesely:
>> On Thu, 2017-11-09 at 03:58 +0100, srol...@vmware.com wrote:
>>> From: Roland Scheidegger 
>>>
>>> r600 used the clamped version for rcp, whereas both evergreen and cayman
>>> used the ieee version. I don't know why that discrepancy exists (it does so
>>> since day 1) but there does not seem to be a valid reason for this, so make
>>> it consistent. This seems now safer than before the previous commit (using
>>> the mystery dx10 clamp).
>>> Note that rsq still uses clamped version (as before even though the table
>>> may have suggested otherwise for evergreen) for r600/eg, but not for cayman.
>>
>> just layman's opinion here. Does TGSI not mandate specific behaviour
>> wrt nans and infinities for this OP?
> No, not really. Ideally all (non-legacy such as LIT) opcodes would
> follow ieee754 (or d3d10) semantics (and that's how they are implemented
> at least in llvmpipe). But we don't enforce denorm behavior either, for
> instance (llvmpipe will disable them - on x86 at least...).
> Some hw supported by gallium drivers also simply can't generate NaNs, no
> matter the opcode.
> I think in general drivers should try to stick as close to ieee754 (and
> d3d10) semantics as they can, regardless of what GL may allow, so yes, I
> guess the ieee version should be used (albeit the abs modifier doesn't
> really fit in there either, and that is done because there are problems
> otherwise).
> 
> Roland
> 
> 
>>
>>> I just don't feel lucky enough to change this (it should also be noted r600
>>> supports sqrt natively, which is always ieee, therefore might not really see
>>> rsqrt with glsl often presumably).
>>
>> why would that be? isn't RECIPSQRT_IEEE(x) still optimization over
>> RECIP_IEEE(SQRT(x))?

Forgot to mention previously, I was actually thinking glsl doesn't have
rsqrt, but that's not true. In any case, shaders using sqrt will have
this lowered to rsqrt/rcp on some drivers, but not r600.
I'll send out new patches that also switch rsq to ieee behavior in any case...

Roland


Re: [Mesa-dev] [PATCH 3/4] r600: use ieee version of rcp

2017-11-09 Thread Roland Scheidegger
On 09.11.2017 at 18:27, Jan Vesely wrote:
> On Thu, 2017-11-09 at 03:58 +0100, srol...@vmware.com wrote:
>> From: Roland Scheidegger 
>>
>> r600 used the clamped version for rcp, whereas both evergreen and cayman
>> used the ieee version. I don't know why that discrepancy exists (it does so
>> since day 1) but there does not seem to be a valid reason for this, so make
>> it consistent. This seems now safer than before the previous commit (using
>> the mystery dx10 clamp).
>> Note that rsq still uses clamped version (as before even though the table
>> may have suggested otherwise for evergreen) for r600/eg, but not for cayman.
> 
> just layman's opinion here. Does TGSI not mandate specific behaviour
> wrt nans and infinities for this OP?
No, not really. Ideally all (non-legacy such as LIT) opcodes would
follow ieee754 (or d3d10) semantics (and that's how they are implemented
at least in llvmpipe). But we don't enforce denorm behavior either, for
instance (llvmpipe will disable them - on x86 at least...).
Some hw supported by gallium drivers also simply can't generate NaNs, no
matter the opcode.
I think in general drivers should try to stick as close to ieee754 (and
d3d10) semantics as they can, regardless of what GL may allow, so yes, I
guess the ieee version should be used (albeit the abs modifier doesn't
really fit in there either, and that is done because there are problems
otherwise).
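
As a rough illustration of why the choice between the two rcp flavours matters for NaN generation, consider the sketch below. The function names are hypothetical, and the clamped model (an infinite result is replaced by +/-FLT_MAX) is an assumption for the sake of the example, not something taken from the ISA docs.

```c
#include <float.h>
#include <math.h>

/* IEEE-style reciprocal: 1/0 is +inf, 1/-0 is -inf. */
static float rcp_ieee(float x)
{
    return 1.0f / x;
}

/* Assumed model of a clamped reciprocal: infinite results become +/-FLT_MAX. */
static float rcp_clamped(float x)
{
    float r = 1.0f / x;
    if (isinf(r))
        return r > 0.0f ? FLT_MAX : -FLT_MAX;
    return r;
}

/* a / b written as a * rcp(b), the way shader code typically divides. */
static float div_ieee(float a, float b)    { return a * rcp_ieee(b); }
static float div_clamped(float a, float b) { return a * rcp_clamped(b); }
```

With these definitions, div_ieee(0.0f, 0.0f) evaluates 0 * inf and yields NaN, while div_clamped(0.0f, 0.0f) evaluates 0 * FLT_MAX and yields 0. The IEEE version is the one that can introduce NaNs, which later min/max instructions then have to deal with.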

Roland


> 
>> I just don't feel lucky enough to change this (it should also be noted r600
>> supports sqrt natively, which is always ieee, therefore might not really see
>> rsqrt with glsl often presumably).
> 
> why would that be? isn't RECIPSQRT_IEEE(x) still optimization over
> RECIP_IEEE(SQRT(x))?
> 
> Jan
> 
>> Compile tested only...
>> ---
>>  src/gallium/drivers/r600/r600_shader.c | 8 ++--
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_shader.c 
>> b/src/gallium/drivers/r600/r600_shader.c
>> index 6a755bb3fd..628c33787e 100644
>> --- a/src/gallium/drivers/r600/r600_shader.c
>> +++ b/src/gallium/drivers/r600/r600_shader.c
>> @@ -8830,11 +8830,7 @@ static const struct r600_shader_tgsi_instruction 
>> r600_shader_tgsi_instruction[]
>>  [TGSI_OPCODE_MOV]   = { ALU_OP1_MOV, tgsi_op2},
>>  [TGSI_OPCODE_LIT]   = { ALU_OP0_NOP, tgsi_lit},
>>  
>> -/* XXX:
>> - * For state trackers other than OpenGL, we'll want to use
>> - * _RECIP_IEEE instead.
>> - */
>> -[TGSI_OPCODE_RCP]   = { ALU_OP1_RECIP_CLAMPED, 
>> tgsi_trans_srcx_replicate},
>> +[TGSI_OPCODE_RCP]   = { ALU_OP1_RECIP_IEEE, 
>> tgsi_trans_srcx_replicate},
>>  
>>  [TGSI_OPCODE_RSQ]   = { ALU_OP0_NOP, tgsi_rsq},
>>  [TGSI_OPCODE_EXP]   = { ALU_OP0_NOP, tgsi_exp},
>> @@ -9034,7 +9030,7 @@ static const struct r600_shader_tgsi_instruction 
>> eg_shader_tgsi_instruction[] =
>>  [TGSI_OPCODE_MOV]   = { ALU_OP1_MOV, tgsi_op2},
>>  [TGSI_OPCODE_LIT]   = { ALU_OP0_NOP, tgsi_lit},
>>  [TGSI_OPCODE_RCP]   = { ALU_OP1_RECIP_IEEE, 
>> tgsi_trans_srcx_replicate},
>> -[TGSI_OPCODE_RSQ]   = { ALU_OP1_RECIPSQRT_IEEE, tgsi_rsq},
>> +[TGSI_OPCODE_RSQ]   = { ALU_OP0_NOP, tgsi_rsq},
>>  [TGSI_OPCODE_EXP]   = { ALU_OP0_NOP, tgsi_exp},
>>  [TGSI_OPCODE_LOG]   = { ALU_OP0_NOP, tgsi_log},
>>  [TGSI_OPCODE_MUL]   = { ALU_OP2_MUL_IEEE, tgsi_op2},



Re: [Mesa-dev] [PATCH 1/4] r600: use min_dx10/max_dx10 instead of min/max

2017-11-09 Thread Roland Scheidegger
On 09.11.2017 at 18:43, Jan Vesely wrote:
> On Thu, 2017-11-09 at 18:39 +0100, Nicolai Hähnle wrote:
>> On 09.11.2017 18:26, Roland Scheidegger wrote:
>>> On 09.11.2017 at 18:19, Jan Vesely wrote:
>>>> On Thu, 2017-11-09 at 03:58 +0100, srol...@vmware.com wrote:
>>>>> From: Roland Scheidegger
>>>>>
>>>>> I believe this is the safe thing to do, especially ever since the driver
>>>>> actually generates NaNs for muls too.
>>>>> Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not
>>>>> entirely sure what the non-dx10 versions do,
>>>>
>>>> non-dx10 version returns nan if one of the operands is nan (tested on
>>>> Turks).
>>>
>>> Yes, I've modified a piglit test and came to the same conclusion. (I
>>> will put that up for review shortly.)
>>> I don't know why you'd ever want that, though (ieee fmin/fmax also
>>> should return non-nan).
>>
>> My guess is that DX9-level hardware had that behavior by virtue of just 
>> not caring about NaN at all, and the hardware folks were just being 
>> conservative in adding a new opcode rather than changing the behavior of 
>> the old one. I don't think GCN has the old-style min/max.
> 
> Looks like it.
> There's v_max_legacy_f32 at least on SI/CI. comment says "D.f =
> max(S0.f, S1.f) (DX9 rules for NaN)"
The problem with dx9 rules for NaN is that no one really seems to know
what they are exactly - in that sense even GL is an improvement since
it's at least obvious you can do whatever floats your boat :-).
Albeit generally, in d3d9 you should never generate a NaN in the first
place, hence how you promote it doesn't really matter.
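
For reference, the two NaN behaviours discussed in this thread can be modelled in plain C as below. This is only a sketch: the function names are hypothetical, the "legacy" model follows what was reported from testing earlier in the thread (NaN in, NaN out), and the dx10 model follows the stated requirement of picking the non-NaN source.

```c
#include <math.h>

/* Legacy-style min as reported in this thread: any NaN operand gives NaN. */
static float min_legacy_model(float a, float b)
{
    if (isnan(a) || isnan(b))
        return NAN;
    return a < b ? a : b;
}

/* DX10-style min: prefer the non-NaN source; NaN only if both are NaN. */
static float min_dx10_model(float a, float b)
{
    if (isnan(a))
        return b;
    if (isnan(b))
        return a;
    return a < b ? a : b;
}
```

For one NaN operand, min_dx10_model(1.0f, NAN) returns 1.0f, which matches what C's fminf is specified to do, while min_legacy_model(1.0f, NAN) returns NaN.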

Roland

> 
> Jan
>>
>> Cheers,
>> Nicolai
>>
>>>
>>> Roland
>>>
>>>
>>>>
>>>> Jan
>>>>
>>>>>   but (as required by dx10)
>>>>> the dx10 versions should pick a non-nan source over a nan source.
>>>>> Other drivers presumably do the same (radeonsi, llvmpipe).
>>>>> This was shown to make some difference for bug 103544, albeit it is not
>>>>> required to fix it.
>>>>> ---
>>>>>   src/gallium/drivers/r600/r600_shader.c  | 12 ++--
>>>>>   src/gallium/drivers/r600/sb/sb_expr.cpp |  2 ++
>>>>>   2 files changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/src/gallium/drivers/r600/r600_shader.c 
>>>>> b/src/gallium/drivers/r600/r600_shader.c
>>>>> index 188fbc9d47..6a755bb3fd 100644
>>>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>>>> @@ -8844,8 +8844,8 @@ static const struct r600_shader_tgsi_instruction 
>>>>> r600_shader_tgsi_instruction[]
>>>>>   [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>>   [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>>   [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
>>>>> - [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
>>>>> - [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
>>>>> + [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
>>>>> + [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
>>>>>   [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
>>>>>   [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
>>>>>   [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>>>> @@ -9042,8 +9042,8 @@ static const struct r600_shader_tgsi_instruction 
>>>>> eg_shader_tgsi_instruction[] =
>>>>>   [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>>   [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>>   [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
>>>>> - [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
>>>>> - [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
>>>>> + [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
>>>>> + [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
>>>>>   [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
>>>>>   [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
>>>>>   [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>>>> @@ -9265,8 +9265,8 @@ static const struct r600_shader_tgsi_instruction 
>>>>> cm_shader_tgsi_instruction[] =
>>>>>   [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>>   [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>>>>   [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
>>>>> - [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
>>>>> - [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
>>>>> + [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
>>>>> + [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
>>>>>   [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
>>>>>   [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
>>>>>   [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>>>> diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp 
>>>>> b/src/gallium/drivers/r600/sb/sb_expr.cpp
>>>>> index 3dd3a4815b..7a5d62c8e8 100644
>>>>>

Re: [Mesa-dev] [PATCH 1/4] r600: use min_dx10/max_dx10 instead of min/max

2017-11-09 Thread Jan Vesely
On Thu, 2017-11-09 at 18:39 +0100, Nicolai Hähnle wrote:
> On 09.11.2017 18:26, Roland Scheidegger wrote:
> > On 09.11.2017 at 18:19, Jan Vesely wrote:
> > > On Thu, 2017-11-09 at 03:58 +0100, srol...@vmware.com wrote:
> > > > From: Roland Scheidegger 
> > > > 
> > > > I believe this is the safe thing to do, especially ever since the driver
> > > > actually generates NaNs for muls too.
> > > > Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not
> > > > entirely sure what the non-dx10 versions do,
> > > 
> > > non-dx10 version returns nan if one of the operands is nan (tested on
> > > Turks).
> > 
> > Yes, I've modified a piglit test and came to the same conclusion. (I
> > will put that up for review shortly.)
> > I don't know why you'd ever want that, though (ieee fmin/fmax also
> > should return non-nan).
> 
> My guess is that DX9-level hardware had that behavior by virtue of just 
> not caring about NaN at all, and the hardware folks were just being 
> conservative in adding a new opcode rather than changing the behavior of 
> the old one. I don't think GCN has the old-style min/max.

Looks like it.
There's v_max_legacy_f32 at least on SI/CI. comment says "D.f =
max(S0.f, S1.f) (DX9 rules for NaN)"

Jan
> 
> Cheers,
> Nicolai
> 
> > 
> > Roland
> > 
> > 
> > > 
> > > Jan
> > > 
> > > >   but (as required by dx10)
> > > > the dx10 versions should pick a non-nan source over a nan source.
> > > > Other drivers presumably do the same (radeonsi, llvmpipe).
> > > > This was shown to make some difference for bug 103544, albeit it is not
> > > > required to fix it.
> > > > ---
> > > >   src/gallium/drivers/r600/r600_shader.c  | 12 ++--
> > > >   src/gallium/drivers/r600/sb/sb_expr.cpp |  2 ++
> > > >   2 files changed, 8 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/src/gallium/drivers/r600/r600_shader.c 
> > > > b/src/gallium/drivers/r600/r600_shader.c
> > > > index 188fbc9d47..6a755bb3fd 100644
> > > > --- a/src/gallium/drivers/r600/r600_shader.c
> > > > +++ b/src/gallium/drivers/r600/r600_shader.c
> > > > @@ -8844,8 +8844,8 @@ static const struct r600_shader_tgsi_instruction 
> > > > r600_shader_tgsi_instruction[]
> > > > [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > > [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > > [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
> > > > -   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
> > > > -   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
> > > > +   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
> > > > +   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
> > > > [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
> > > > [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
> > > > [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
> > > > @@ -9042,8 +9042,8 @@ static const struct r600_shader_tgsi_instruction 
> > > > eg_shader_tgsi_instruction[] =
> > > > [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > > [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > > [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
> > > > -   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
> > > > -   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
> > > > +   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
> > > > +   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
> > > > [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
> > > > [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
> > > > [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
> > > > @@ -9265,8 +9265,8 @@ static const struct r600_shader_tgsi_instruction 
> > > > cm_shader_tgsi_instruction[] =
> > > > [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > > [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > > [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
> > > > -   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
> > > > -   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
> > > > +   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
> > > > +   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
> > > > [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
> > > > [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
> > > > [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
> > > > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp 
> > > > b/src/gallium/drivers/r600/sb/sb_expr.cpp
> > > > index 3dd3a4815b..7a5d62c8e8 100644
> > > > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
> > > > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
> > > > @@ -753,7 +753,9 @@ bool expr_handler::fold_alu_op2(alu_node& n) {
> > > > n.bc.src[0].abs 

Re: [Mesa-dev] [PATCH] i965: disable BLORP color clears for gen 4-5

2017-11-09 Thread Emil Velikov
On 9 November 2017 at 17:23, Jason Ekstrand  wrote:
> This is a really rubbish solution.  Yes, it fixes a crash in MPV but unless
> we disable all blorp on gen4-5 (which I don't think is possible anymore), we
> haven't actually fixed it for real.
>
Fully agreed - it is nasty.

Skimming through the blockers for 17.3.0, I see that many of those are
directed your way.
I was looking for a way to mitigate some, so we don't stress/burn you out :-)

-Emil


Re: [Mesa-dev] [PATCH 1/4] r600: use min_dx10/max_dx10 instead of min/max

2017-11-09 Thread Nicolai Hähnle

On 09.11.2017 18:26, Roland Scheidegger wrote:

> On 09.11.2017 at 18:19, Jan Vesely wrote:
>> On Thu, 2017-11-09 at 03:58 +0100, srol...@vmware.com wrote:
>>> From: Roland Scheidegger
>>>
>>> I believe this is the safe thing to do, especially ever since the driver
>>> actually generates NaNs for muls too.
>>> Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not
>>> entirely sure what the non-dx10 versions do,
>>
>> non-dx10 version returns nan if one of the operands is nan (tested on
>> Turks).
>
> Yes, I've modified a piglit test and came to the same conclusion. (I
> will put that up for review shortly.)
> I don't know why you'd ever want that, though (ieee fmin/fmax also
> should return non-nan).

My guess is that DX9-level hardware had that behavior by virtue of just 
not caring about NaN at all, and the hardware folks were just being 
conservative in adding a new opcode rather than changing the behavior of 
the old one. I don't think GCN has the old-style min/max.

Cheers,
Nicolai

>
> Roland
>
>>
>> Jan
>>
>>>   but (as required by dx10)
>>> the dx10 versions should pick a non-nan source over a nan source.
>>> Other drivers presumably do the same (radeonsi, llvmpipe).
>>> This was shown to make some difference for bug 103544, albeit it is not
>>> required to fix it.
>>> ---
>>>   src/gallium/drivers/r600/r600_shader.c  | 12 ++--
>>>   src/gallium/drivers/r600/sb/sb_expr.cpp |  2 ++
>>>   2 files changed, 8 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/r600/r600_shader.c 
>>> b/src/gallium/drivers/r600/r600_shader.c
>>> index 188fbc9d47..6a755bb3fd 100644
>>> --- a/src/gallium/drivers/r600/r600_shader.c
>>> +++ b/src/gallium/drivers/r600/r600_shader.c
>>> @@ -8844,8 +8844,8 @@ static const struct r600_shader_tgsi_instruction 
>>> r600_shader_tgsi_instruction[]
>>> [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>> [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>> [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
>>> -   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
>>> -   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
>>> +   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
>>> +   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
>>> [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
>>> [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
>>> [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>> @@ -9042,8 +9042,8 @@ static const struct r600_shader_tgsi_instruction 
>>> eg_shader_tgsi_instruction[] =
>>> [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>> [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>> [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
>>> -   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
>>> -   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
>>> +   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
>>> +   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
>>> [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
>>> [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
>>> [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>> @@ -9265,8 +9265,8 @@ static const struct r600_shader_tgsi_instruction 
>>> cm_shader_tgsi_instruction[] =
>>> [TGSI_OPCODE_DP3]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>> [TGSI_OPCODE_DP4]   = { ALU_OP2_DOT4_IEEE, tgsi_dp},
>>> [TGSI_OPCODE_DST]   = { ALU_OP0_NOP, tgsi_opdst},
>>> -   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN, tgsi_op2},
>>> -   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX, tgsi_op2},
>>> +   [TGSI_OPCODE_MIN]   = { ALU_OP2_MIN_DX10, tgsi_op2},
>>> +   [TGSI_OPCODE_MAX]   = { ALU_OP2_MAX_DX10, tgsi_op2},
>>> [TGSI_OPCODE_SLT]   = { ALU_OP2_SETGT, tgsi_op2_swap},
>>> [TGSI_OPCODE_SGE]   = { ALU_OP2_SETGE, tgsi_op2},
>>> [TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD_IEEE, tgsi_op3},
>>> diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp 
>>> b/src/gallium/drivers/r600/sb/sb_expr.cpp
>>> index 3dd3a4815b..7a5d62c8e8 100644
>>> --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
>>> +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
>>> @@ -753,7 +753,9 @@ bool expr_handler::fold_alu_op2(alu_node& n) {
>>> n.bc.src[0].abs == n.bc.src[1].abs) {
>>> switch (n.bc.op) {
>>> case ALU_OP2_MIN: // (MIN x, x) => (MOV x)
>>> +   case ALU_OP2_MIN_DX10:
>>> case ALU_OP2_MAX:
>>> +   case ALU_OP2_MAX_DX10:
>>> convert_to_mov(n, v0, n.bc.src[0].neg, 
>>> n.bc.src[0].abs);
>>> return fold_alu_op1(n);
>>> case ALU_OP2_ADD:  // (ADD x, x) => (MUL x, 2)






--
Learn how the world really is,
but never forget how it should be.
___

  1   2   >